Jul 6, 2017 3:06:01 PM by Juan Carlos Olamendy
Today it's very common to store a lot of data in our MySQL databases. Common examples are: storing tracking information about your website to do analytics later, logging facts about your business, or just recording geolocation information provided by phones. As a database grows exponentially, then some queries (even the optimized one) are getting slower.
One technique used to speed up the performance when you deal with very big tables is partitioning.
The idea behind partitioning isn't to use multiple servers to host your data but to divide up the big tables/indexes into smaller segments (logical data spaces) stored in different physical data files, so that the queries access only a fraction of the data; thus there is less for them to scan, balancing the workload and potentially making queries faster.
In this article, I'll explain the concepts behind data partitioning. Later, these principles and concepts will be illustrated using real-world examples so that you can apply this technique in your own solutions.
Conceptually, there are two types of partitioning: horizontal and vertical. In horizontal partitioning, a partition function is applied to all rows in a table in order to store the rows in different partition segments in the same node. In vertical partitioning, a partition function is applied to all rows in a table in order to distribute the data in different nodes in the cluster.
At the time of this writing, MySQL supports only horizontal partitioning.
The main points to consider when partitioning your data horizontally in MySQL are:
The key benefits of data partitioning are:
Let’s start illustrating the partitioning concepts using some real-world examples.
There are several types of partitioning in MySQL such as:
In each section, I’ll cover each partitioning type.
Range Partitioning
In this partitioning scheme, we define several ranges (using the VALUES LESS THAN operator) and each row will be assigned to one of them. In this way, the ranges should be contiguous but they cannot overlap each other.
This scheme is very useful for representing ranges of time for tracking, logging, recording transactions, etc.
Let’s use this scheme for storing transactions related to sale orders, as shown in the listing 01.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 01
You can verify that MySQL has actually created a data file for each partition by listing the testing_db database directory as shown in the listing 02.
$ ls -alh /var/lib/mysql/testing_db |
Listing 02
Note the MAXVALUE clause used in the range definition. This is not mandatory, but it works as a catch all; in this case for orders equal to or greater than the max value (year 2017).
If we don’t include the MAXVALUE clause, and if we try to insert an order for the year 2020, then it would fail, as there wouldn’t be any partition definition for that value.
You can see the utility of the YEAR function for defining partitions. We can use any function that returns an INTEGER, such as UNIX_TIMESTAMP, as shown in listing 03.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 03
We can query a table indicating a particular partition to be used, in order not to use the whole table, as shown in listing 04.
MySQL> SELECT *,'part_2015' |
Listing 04
We can also see the number of rows per partition using the following query, shown in listing 05.
MySQL> SELECT PARTITION_NAME, TABLE_ROWS |
Listing 05
We can delete the rows of a partition without affecting the rest of the dataset in the table, as shown in listing 06.
MySQL> ALTER TABLE testing_db.sales_order TRUNCATE PARTITION part_2015;
|
Listing 06
We can also drop a partition, as shown in listing 07.
MySQL> ALTER TABLE testing_db.sales_order DROP PARTITION part_2015;
|
Listing 07
Another important concept in table partitioning is data pruning. If we want to show all the orders before 2017, then MySQL optimizer will leave unnecessary partitions out of the execution plan, dramatically improving the execution time (latency).
We can check using an EXPLAIN statement to visualize the execution plan and the involved partitions (marked in bold) as shown in listing 08.
MySQL> EXPLAIN partitions SELECT * FROM testing_db.sales_order WHERE created < '2017-01-01'; |
Listing 08
This scheme is similar to RANGE, but in this case, a particular partition is selected based on columns matching one of a set of discrete values, using the VALUES IN clause where the value list is a comma-separated list of integers, as shown in listing 09.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 09
Notice that in this partitioning schema, there is no catch-all expression (like the MAXVALUE expression in RANGE). We should cover all possible elements in the criteria list to prevent insertion errors.
In this scheme, a particular partition is selected based on the value returned by a user-defined expression. The expression can be any expression that yields a nonnegative integer value. This schema is used mainly to evenly distribute data among the number of partitions.
Let’s see an example in listing 10.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 10
The hash expression can also use a function, as shown in listing 11.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 11
This scheme is similar to the HASH scheme, with the difference that the hashing function is provided by MySQL. MySQL employs its own internal hashing function, which is based on the same algorithm as PASSWORD.
If we don’t explicitly specify the partitioning columns part of the key, then MySQL will automatically use the primary key or a unique key as the partitioning column.
If no unique keys are available, the statement will fail.
Let’s see an example in listing 12.
MySQL> CREATE TABLE testing_db.sales_order ( |
Listing 12
In this article, I've explained the idea and benefits related to transportable tablespace mechanism in InnoDB as an alternative for moving large datasets between MySQL instances.
I’ve also illustrated the partitioning concepts through common real-world examples. Now you can start using this amazing feature in your own database environment to speed up your queries and scale your systems.
Tags: MySQL
Written by Juan Carlos Olamendy
CEO and Founder of Nubisera Enterprise Software Architect Expert. Advisor. Entrepreneur. Oracle ACE. Microsoft MVP. External technical consultant for Microsoft, Oracle, HP and Dell. Prolific blogger on all subjects related to software technology and entrepreneurship. Graduate degree in Computer Science Masters of Science in Business Informatics(MBA)
We use cookies to improve your experience with our site. By continuing to use this site, you consent to our use of cookies. Learn more.