A concatenated index, also known as multi-column, composite or combined index, is one index across multiple columns. In case of queries with multiple equality conditions, even though multiple single column indexes can result in very good performance, a single concatenated index solution is preferable as it
- saves storage space
- saves the maintenance overhead for multiple indexes
Moreover, the order of the columns in a concatenated index, if appropriately chosen, can further save storage and improve performance, too. Let us consider a query with two equality predicates; e.g.,
Where a = <> and b = <>
Here, we have the option of creating a concatenated index on (a,b) or (b,a). Both the indexes will
- occupy the same storage space
- perform identically for queries with a predicate like Where a = <> and b = <>
However, if columns a and b have contrasting selectivities, the order of columns in the index will impact
- compressibility of the index – If the leading column(s) has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.
- the possibility of using Index Skip Scan while searching for trailing column – In general, a database can use a concatenated index when searching for the leading (leftmost) columns. For example, an index on two columns; g., (a,b) can be used when searching for
- the first column (a = <>)
- both the columns together (a= <> and b = <>)
However, we might be able to use a concatenated index even when searching for only the trailing column, provided that the leading column is the one having low selectivity; i.e., if we have an index on (a,b) and column a has relatively few distinct values, we might be able to use an index skip scan on a query having a predicate like:
where b = <>;
In this article, I will demonstrate
- Concatenation of columns in an index
- Use of the less selective column as the leading column in a concatenated index to enable
- Use of Index skip scan to search for trailing column in the index
- Compression of Index to
- Save storage
- Further improve the performance of a queries that need to visit a large number of index blocks
Overview
- Create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs well
- Search for the leading column cust_id only using equality predicate and note that the index is used
- Search for the trailing column channel_id only using equality predicate and note that the index is not used
- Drop the concatenated index on ( CUST_ID , CHANNEL_ID)
- Create a concatenated index on (CHANNEL_ID, CUST_ID ) with less selective column channel_id as the leading column
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with the earlier index on (CUST_ID , CHANNEL_ID)
- Search for the trailing column cust_id only using equality predicate and note that index skip scan is used
- Search for the leading column channel_id only using equality predicate and note that the index is used
- Compress the index on (CHANNEL_ID, CUST_ID ). Note that
- Compression leads to savings in storage space
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with uncompressed index
- Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs as well as with uncompressed index
- Search for the leading column channel_id only using equality predicate and note that the index is used and the query performs better than with uncompressed index
Demonstration
Create an hr.sales table which is a copy of sh.sales, but the data is ordered by channel_id, cust_id. Note that column CHANNEL_ID has four distinct values, whereas column CUST_ID has 7059 distinct values. This means that column CHANNEL_ID is less selective than CUST_ID.
SYS>drop table hr.sales purge;
create table hr.sales as select * from sh.sales order by channel_id, cust_id;
SYS>select count(*) num_rows, count(distinct cust_id) distinct_cust_id,
count(distinct(cust_id))/count(*)cust_id_selectivity,
count(distinct channel_id)distinct_channel_id,
count(distinct(channel_id))/count(*)channel_id_selectivity
from hr.sales;
NUM_ROWS DISTINCT_CUST_ID CUST_ID_SELECTIVITY DISTINCT_CHANNEL_ID CHANNEL_ID_SELECTIVITY
———- —————- ——————- ——————- ———————-
918843 7059 .007682488 4 4.3533E-06
The data in column CHANNEL_ID is skewed and there are few rows for CHANNEL_ID = 9 as compared to the other values.
SYS>select distinct channel_id , count(*)
from hr.sales
group by channel_id order by channel_id;
CHANNEL_ID COUNT(*)
———- ———-
2 258025
3 540328
4 118416
9 2074
Gather optimizer statistics for hr.sales table so that we have a histogram on column channel_id and the optimizer knows about the skewed data distribution in the column.
SYS>exec dbms_stats.gather_table_stats('HR','SALES', method_opt => 'Forcolumns CHANNEL_ID size 4, cust_id size auto’,Cascade => true, no_invalidate => false);
SYS> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM dba_TAB_COL_STATISTICS
WHERE owner = 'HR' and TABLE_NAME='SALES';
TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM
————— —————————— ———— —————
SALES CUST_ID 7059 NONE
SALES CHANNEL_ID 4 FREQUENCY
- Let us create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column.
SYS>create index hr.cust_channel_id_idx on hr.sales( cust_id, channel_id);
- Search for a channel_id and cust_id using equality predicates and note that
- The index is used to search for both channel_id and cust_id
- Range scan of the index returns ROWID’s of rows having channel_id = 9 and CUST_ID = 100750 (Line 2)
- Rows corresponding to returned ROWID's are read from SALES table (Line 1) and finally returned (Line 0)
- As concatenated index on (CUST_ID , CHANNEL_ID) is highly selective; its scan (Line 2) returns a small number of rows (2), requiring just a few (maximum 2) index blocks to be visited. The subsequent TABLE ACCESS BY INDEX ROWID operation (Line 1) again requires visiting at most two table blocks without the need to filter out any rows.
- The query performs well with only six consistent gets as
- Very few index / table blocks need to be visited
- A filter operation has been completely avoided during table access
Note that the statistics shown are from second execution to eliminate parsing and other costs.
SYS>set autotrace trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 and s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 867189039
——————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CUST_CHANNEL_ID_IDX | 1 | | 3 (0)| 00:00:01 |
——————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CUST_ID"=100750 AND "S"."CHANNEL_ID"=9)
Statistics
———————————————————-
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the leading column cust_id only using equality predicate and observe that that the index CUST_CHANNEL_ID_IDX has been used as expected. The query performs well with only six consistent gets.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 867189039
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 130 | 2730 | 6 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CUST_CHANNEL_ID_IDX| 130 | | 3 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CUST_ID"=100750)
Statistics
———————————————————-
1 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the trailing column channel_id only using equality predicate and note that expensive Full table scan has been used, as the optimizer found it less costly as compared to using the index.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 ;
set autot off
Execution Plan
———————————————————-
Plan hash value: 781590677
—————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————
| 0 | SELECT STATEMENT | | 2248 | 47208 | 1231 (1)| 00:00:15 |
|* 1 | TABLE ACCESS FULL| SALES | 2248 | 47208 | 1231 (1)| 00:00:15 |
—————————————————————————
Predicate Information (identified by operation id):
—————————————————
1 – filter("S"."CHANNEL_ID"=9)
Statistics
———————————————————-
0 recursive calls
0 db block gets
4574 consistent gets
4431 physical reads
0 redo size
38498 bytes sent via SQL*Net to client
1937 bytes received via SQL*Net from client
140 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2074 rows processed
- Drop the concatenated index on (CUST_ID , CHANNEL_ID) and create a concatenated index on (CHANNEL_ID, CUST_ID ) with the less selective column channel_id as the leading column
SYS>drop index hr.CUST_CHANNEL_ID_IDX ;
create index hr.channel_cust_id_idx on hr.sales(channel_id, cust_id);
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the earlier index on (CUST_ID , CHANNEL_ID)
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 and s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 2906571686
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 1 | | 3 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)
Statistics
———————————————————-
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the trailing column cust_id only using equality predicate and note that although cust_id is the trailing column in the index, index skip scan has been used to search for cust_id.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 1839107692
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 130 | 2730 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 7 (0)| 00:00:01 |
|* 2 | INDEX SKIP SCAN | CHANNEL_CUST_ID_IDX | 130 | | 6 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CUST_ID"=100750)
filter("S"."CUST_ID"=100750)
Statistics
———————————————————-
0 recursive calls
0 db block gets
13 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the leading column channel_id only using equality predicate and note that the index is used as expected. The cost of searching a channel_id using index CHANNEL_CUST_ID_IDX is much, much lower than the cost of the FTS performed earlier. Note that physical I/O has been completely eliminated.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 ;
set autot off
Execution Plan
———————————————————-
Plan hash value: 2906571686
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 2248 | 47208 | 19 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 2248 | 47208 | 19 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 2248 | | 8 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CHANNEL_ID"=9)
Statistics
———————————————————-
0 recursive calls
0 db block gets
298 consistent gets
0 physical reads
0 redo size
38498 bytes sent via SQL*Net to client
1937 bytes received via SQL*Net from client
140 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2074 rows processed
Thus, it is beneficial to use the less selective channel_id as leading column in the concatenated index as it enables the index to be used even while searching for the trailing column cust_id.
Moreover, if the leading column in an index has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.
- In order to check the compressibility of the index CHANNEL_CUST_ID_IDX, let us first find out number of leaf blocks in the index before compression.
SYS>select leaf_blocks from dba_indexes where index_name = 'CHANNEL_CUST_ID_IDX';
LEAF_BLOCKS
———–
2334
- Let us now compress the index CHANNEL_CUST_ID_IDX and re-execute the query to find out the number of leaf blocks in the index after compression. Observe that the number of leaf blocks has considerably come down to 1442, a saving of around 38% storage space.
SYS>alter index hr.CHANNEL_CUST_ID_IDX rebuild compress;
SYS>select leaf_blocks from dba_indexes
where index_name = 'CHANNEL_CUST_ID_IDX';
LEAF_BLOCKS
———–
1442
Let us now examine the performance of various queries after the compression.
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the uncompressed index. Compression has not caused any improvement in the performance, as only two rows are returned, which require visiting at most two table / index blocks only.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 and s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 2906571686
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 1 | | 3 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)
Statistics
———————————————————-
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs equally well as with uncompressed index. Here again, compression has not caused any improvement in the performance, as only two rows are returned, which requires visiting at most two index blocks only.
SYS>set autot trace,
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.cust_id = 100750;
set autot off
Execution Plan
———————————————————-
Plan hash value: 1839107692
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 130 | 2730 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 7 (0)| 00:00:01 |
|* 2 | INDEX SKIP SCAN | CHANNEL_CUST_ID_IDX | 130 | | 6 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CUST_ID"=100750)
filter("S"."CUST_ID"=100750)
Statistics
———————————————————-
0 recursive calls
0 db block gets
13 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
- Search for the leading column channel_id only using equality predicates and note that the index is used. The compression of the index causes the query to perform slightly better (296 consistent gets) than with the uncompressed index (298 consistent gets), as a large number of index blocks need to be visited.
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 ;
set autot off
Execution Plan
———————————————————-
Plan hash value: 2906571686
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
| 0 | SELECT STATEMENT | | 2248 | 47208 | 17 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 2248 | 47208 | 17 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 2248 | | 6 (0)| 00:00:01 |
—————————————————————————————————
Predicate Information (identified by operation id):
—————————————————
2 – access("S"."CHANNEL_ID"=9)
Statistics
———————————————————-
0 recursive calls
0 db block gets
296 consistent gets
0 physical reads
0 redo size
38498 bytes sent via SQL*Net to client
1937 bytes received via SQL*Net from client
140 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2074 rows processed
Conclusion
- If a search predicate has various equality conditions Anded together, a filter operation can be completely avoided by creating a concatenated index containing all the columns in the predicate.
- Having least selective column as the leading column in the index is beneficial as it helps in index skip scans and compressibility, thereby saving space and improving performance of queries scanning a large number of leaf blocks in the index.
- Having the less selective column as leading column in a concatenated index enables
- Use of Index skip scan to search for trailing column in the index
- Compression of the Index to
- Save storage
- Further improve the performance of queries that need to visit a large number of index blocks
References:
- https://docs.oracle.com/database/121/TGSQL/tgsql_indc.htm#TGSQL854
- https://blogs.oracle.com/sql/how-to-create-and-use-indexes-in-oracle-database
- It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?
- Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out)
- Index Column Order – Impact On Index Branch Blocks Part II
- Index Skip Scan – Does Index Column Order Matter Any More ?
Start the discussion at forums.toadworld.com