Introduction
A covering index is a special case of an index in InnoDB where all required fields for a query are included in the index; in other words, the index itself contains the required data to execute the queries without having to execute additional reads.
Notice that in InnoDB the data is stored clustered using the primary key index; in other words, the data is physically stored as a B+tree through the primary key index. Therefore, the primary key is included in the leaf node of all secondary indexes in order to establish a reference from the secondary indexes to the actual data row.
So, any lookup involving a secondary index must navigate starting from root node through the branch nodes to the correct leaf node to take the primary key value, and then execute a random IO read on the primary key index (once again navigating from the root node through the branch nodes to the correct leaf node) to get the data row.
Using a covering index, we avoid this extra random IO read on the primary key index (one B+tree less to traverse, avoiding expensive IO operations) to get the data row because all fields required by the query are included in the covering index.
In this article, I'll explain the concepts related to covering indexes using real world scenarios and how we can really take advantage of them to improve the performance of our enterprise solutions.
When to use covering indexes
The principal scenarios to use covering indexes are:
- To serve queries without additional IO reads on big tables. Sometimes, even avoiding extra physical random IO reads (the most expensive IOoperation).
We can use this particular index for queries that require:
- filtering rows matching a certain condition (WHERE clauses)
- grouping data (GROUP BY clauses)
- sorting data in the same order of the covering index (ORDER BY clauses)
- projecting data (SELECT clauses)
Demo
Let’s suppose that we have the following big table as shown in the listing 01.
CREATE TABLE big_table( |
Listing 01
Let’s suppose that we have the following query in our application as shown in listing 02. This is a sample query of one real-world scenario that I faced several months ago.
mysql> SELECT sum(field04) |
Listing 02
When the table is very big, a query like this can spend several seconds or even minutes to execute.
If we want to optimize this query, we can create a covering index for it, so there’s no need to hit the big table to fetch the rows; just fetching the data fields from the index it-self. This approach can speed up some of our queries by an order of magnitude.
In order to create a covering index for this query, the index must cover the WHERE/GROUP BY and SELECT clauses as shown in the listing 03.
mysql> ALTER TABLE big_table |
Listing 03
It's important to choose the correct order of the columns in the index to serve query correctly. The general rule is to choose the columns for filtering first (WHERE clause with equality conditions), then sorting/grouping (ORDER BY and GROUP BY clauses) and finally the data projection (SELECT clause).
Now when we ask for the execution path of the query using the EXPLAIN sentence, we can verify if InnoDB is really using the covering index by checking the “Extra” field.
For this case, we actually see the "Using Index” value in the “Extra” field, which means that InnoDB executes the query using the former index alone (see listing 03), so it’s confirmed as a covering index. The EXPLAIN sentence is shown in the listing 04.
mysql> EXPLAIN SELECT sum(field04) |
Listing 04
Notice that all secondary indexes in InnoDB automatically include the primary key values, so in some scenarios, they’re automatically covering indexes.
Let's suppose we have the following query as shown in the listing 05.
mysql> SELECT count(id) |
Listing 05
Now let's add an index on the `field05` column as shown below as shown in the listing 06.
mysql> ALTER TABLE big_table |
Listing 06
We can verify if this index is also a covering index, by executing the EXPLAIN statement as shown in the listing 07, and again checking if the “Extra” field has the “Using Index” value. In this case, it’s a covering index because the primary key is automatically part of the secondary index in InnoDB, so the `id`field is automatically part of the index in the right part. So, it’s a covering index for query in listing 05.
mysql> EXPLAIN SELECT count(id) |
Listing 07
Another common scenario where we can use a covering index is when the query is returning useless information. Let’s suppose that an ORM generates a query as shown in the listing 08.
mysql> SELECT * |
Listing 08
But in reality we only need to return data from the `field01` field. We can improve the response time significantly by re-writing the query as shown in the listing 09 and creating a covering index for it as shown in the listing 10.
mysql> SELECT field01 |
Listing 09
mysql> ALTER TABLE big_table |
Listing 10
Finally, let me say that although covering index can speed up our queries by an order of magnitude, because the index contains all the required data for executing the query, we also need to consider some drawbacks:
- Covering indexes duplicate data of the original table, so if the table is already large, the index will be also. In this case, space usage might become an issue.
- Covering indexes are effective as long as the queries don’t add additional columns (for projections, calculations, filtering, sorting, etc). This point might become an issue when the queries change quickly in the near future and MySQL decides to stop using the current covering indexes.
Conclusion
In this article, I've explained the concepts related to covering indexes in InnoDB. I’ve also illustrated the ideas through common real-world scenarios. Now you can analyze the queries of your applications and the underlying involved workload to see where covering indexes fits well to boost the performance of your enterprise solutions.
Start the discussion at forums.toadworld.com