Concatenated Indexes and Index column order

A concatenated index, also known as multi-column, composite or combined index, is one index across multiple columns. In case of queries with multiple equality conditions, even though multiple single column indexes can result in very good performance, a single concatenated index solution is preferable as it

saves storage space
saves the maintenance overhead for multiple indexes

Moreover, the order of the columns in a concatenated index, if appropriately chosen, can further save storage and improve performance, too. Let us consider a query with two equality predicates; e.g.,

Where a = <> and b = <>

Here, we have the option of creating a concatenated index on (a,b) or (b,a). Both the indexes will

occupy the same storage space
perform identically for queries with a predicate like Where a = <> and b = <>

However, if columns a and b have contrasting selectivities, the order of columns in the index will impact

compressibility of the index – If the leading column(s) has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.
the possibility of using Index Skip Scan while searching for trailing column – In general, a database can use a concatenated index when searching for the leading (leftmost) columns. For example, an index on two columns; g., (a,b) can be used when searching for
- the first column (a = <>)
- both the columns together (a= <> and b = <>)

However, we might be able to use a concatenated index even when searching for only the trailing column, provided that the leading column is the one having low selectivity; i.e., if we have an index on (a,b) and column a has relatively few distinct values, we might be able to use an index skip scan on a query having a predicate like:
where b = <>;

In this article, I will demonstrate

Concatenation of columns in an index
Use of the less selective column as the leading column in a concatenated index to enable
- Use of Index skip scan to search for trailing column in the index
- Compression of Index to
  - Save storage
  - Further improve the performance of a queries that need to visit a large number of index blocks

Overview

Create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs well
- Search for the leading column cust_id only using equality predicate and note that the index is used
- Search for the trailing column channel_id only using equality predicate and note that the index is not used
Drop the concatenated index on ( CUST_ID , CHANNEL_ID)
Create a concatenated index on (CHANNEL_ID, CUST_ID ) with less selective column channel_id as the leading column
- Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with the earlier index on (CUST_ID , CHANNEL_ID)
- Search for the trailing column cust_id only using equality predicate and note that index skip scan is used
- Search for the leading column channel_id only using equality predicate and note that the index is used
- Compress the index on (CHANNEL_ID, CUST_ID ). Note that
  - Compression leads to savings in storage space
  - Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with uncompressed index
  - Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs as well as with uncompressed index
  - Search for the leading column channel_id only using equality predicate and note that the index is used and the query performs better than with uncompressed index

Demonstration

Create an hr.sales table which is a copy of sh.sales, but the data is ordered by channel_id, cust_id. Note that column CHANNEL_ID has four distinct values, whereas column CUST_ID has 7059 distinct values. This means that column CHANNEL_ID is less selective than CUST_ID.

SYS>drop table hr.sales purge;

create table hr.sales as select * from sh.sales order by channel_id, cust_id;

SYS>select count(*) num_rows, count(distinct cust_id) distinct_cust_id,

count(distinct(cust_id))/count(*)cust_id_selectivity,

count(distinct channel_id)distinct_channel_id,

count(distinct(channel_id))/count(*)channel_id_selectivity

from hr.sales;

NUM_ROWS DISTINCT_CUST_ID CUST_ID_SELECTIVITY DISTINCT_CHANNEL_ID CHANNEL_ID_SELECTIVITY

———- —————- ——————- ——————- ———————-

918843 7059 .007682488 4 4.3533E-06

The data in column CHANNEL_ID is skewed and there are few rows for CHANNEL_ID = 9 as compared to the other values.

SYS>select distinct channel_id , count(*)

from hr.sales

group by channel_id order by channel_id;

CHANNEL_ID COUNT(*)

———- ———-

2 258025

3 540328

4 118416

9 2074

Gather optimizer statistics for hr.sales table so that we have a histogram on column channel_id and the optimizer knows about the skewed data distribution in the column.

SYS>exec dbms_stats.gather_table_stats('HR','SALES', method_opt => 'Forcolumns CHANNEL_ID size 4, cust_id size auto’,Cascade => true, no_invalidate => false);

SYS> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM

FROM dba_TAB_COL_STATISTICS

WHERE owner = 'HR' and TABLE_NAME='SALES';

TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM

————— —————————— ———— —————

SALES CUST_ID 7059 NONE

SALES CHANNEL_ID 4 FREQUENCY

Let us create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column.

SYS>create index hr.cust_channel_id_idx on hr.sales( cust_id, channel_id);

Search for a channel_id and cust_id using equality predicates and note that
- The index is used to search for both channel_id and cust_id
- Range scan of the index returns ROWID’s of rows having channel_id = 9 and CUST_ID = 100750 (Line 2)
- Rows corresponding to returned ROWID's are read from SALES table (Line 1) and finally returned (Line 0)
- As concatenated index on (CUST_ID , CHANNEL_ID) is highly selective; its scan (Line 2) returns a small number of rows (2), requiring just a few (maximum 2) index blocks to be visited. The subsequent TABLE ACCESS BY INDEX ROWID operation (Line 1) again requires visiting at most two table blocks without the need to filter out any rows.
- The query performs well with only six consistent gets as
  - Very few index / table blocks need to be visited
  - A filter operation has been completely avoided during table access

Note that the statistics shown are from second execution to eliminate parsing and other costs.

SYS>set autotrace trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 and s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 867189039

——————————————————————————

—————————————————————————| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CUST_CHANNEL_ID_IDX | 1 | | 3 (0)| 00:00:01 |

——————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CUST_ID"=100750 AND "S"."CHANNEL_ID"=9)

Statistics

———————————————————-

0 recursive calls

0 db block gets

6 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the leading column cust_id only using equality predicate and observe that that the index CUST_CHANNEL_ID_IDX has been used as expected. The query performs well with only six consistent gets.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 867189039

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 130 | 2730 | 6 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 6 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CUST_CHANNEL_ID_IDX| 130 | | 3 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CUST_ID"=100750)

Statistics

———————————————————-

1 recursive calls

0 db block gets

6 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the trailing column channel_id only using equality predicate and note that expensive Full table scan has been used, as the optimizer found it less costly as compared to using the index.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 ;

set autot off

Execution Plan

———————————————————-

Plan hash value: 781590677

—————————————————————————

—————————————————————————

| 0 | SELECT STATEMENT | | 2248 | 47208 | 1231 (1)| 00:00:15 |

|* 1 | TABLE ACCESS FULL| SALES | 2248 | 47208 | 1231 (1)| 00:00:15 |

—————————————————————————

Predicate Information (identified by operation id):

—————————————————

1 – filter("S"."CHANNEL_ID"=9)

Statistics

———————————————————-

0 recursive calls

0 db block gets

4574 consistent gets

4431 physical reads

0 redo size

38498 bytes sent via SQL*Net to client

1937 bytes received via SQL*Net from client

140 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2074 rows processed

Drop the concatenated index on (CUST_ID , CHANNEL_ID) and create a concatenated index on (CHANNEL_ID, CUST_ID ) with the less selective column channel_id as the leading column

SYS>drop index hr.CUST_CHANNEL_ID_IDX ;

create index hr.channel_cust_id_idx on hr.sales(channel_id, cust_id);

Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the earlier index on (CUST_ID , CHANNEL_ID)

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 and s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 2906571686

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 1 | | 3 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)

Statistics

———————————————————-

0 recursive calls

0 db block gets

6 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the trailing column cust_id only using equality predicate and note that although cust_id is the trailing column in the index, index skip scan has been used to search for cust_id.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 1839107692

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 130 | 2730 | 7 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 7 (0)| 00:00:01 |

|* 2 | INDEX SKIP SCAN | CHANNEL_CUST_ID_IDX | 130 | | 6 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CUST_ID"=100750)

filter("S"."CUST_ID"=100750)

Statistics

———————————————————-

0 recursive calls

0 db block gets

13 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the leading column channel_id only using equality predicate and note that the index is used as expected. The cost of searching a channel_id using index CHANNEL_CUST_ID_IDX is much, much lower than the cost of the FTS performed earlier. Note that physical I/O has been completely eliminated.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 ;

set autot off

Execution Plan

———————————————————-

Plan hash value: 2906571686

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 2248 | 47208 | 19 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 2248 | 47208 | 19 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 2248 | | 8 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CHANNEL_ID"=9)

Statistics

———————————————————-

0 recursive calls

0 db block gets

298 consistent gets

0 physical reads

0 redo size

38498 bytes sent via SQL*Net to client

1937 bytes received via SQL*Net from client

140 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2074 rows processed

Thus, it is beneficial to use the less selective channel_id as leading column in the concatenated index as it enables the index to be used even while searching for the trailing column cust_id.

Moreover, if the leading column in an index has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.

In order to check the compressibility of the index CHANNEL_CUST_ID_IDX, let us first find out number of leaf blocks in the index before compression.

SYS>select leaf_blocks from dba_indexes where index_name = 'CHANNEL_CUST_ID_IDX';

LEAF_BLOCKS

———–

2334

Let us now compress the index CHANNEL_CUST_ID_IDX and re-execute the query to find out the number of leaf blocks in the index after compression. Observe that the number of leaf blocks has considerably come down to 1442, a saving of around 38% storage space.

SYS>alter index hr.CHANNEL_CUST_ID_IDX rebuild compress;

SYS>select leaf_blocks from dba_indexes

where index_name = 'CHANNEL_CUST_ID_IDX';

LEAF_BLOCKS

———–

1442

Let us now examine the performance of various queries after the compression.

Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the uncompressed index. Compression has not caused any improvement in the performance, as only two rows are returned, which require visiting at most two table / index blocks only.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 and s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 2906571686

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 1 | 21 | 4 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 1 | | 3 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)

Statistics

———————————————————-

0 recursive calls

0 db block gets

6 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs equally well as with uncompressed index. Here again, compression has not caused any improvement in the performance, as only two rows are returned, which requires visiting at most two index blocks only.

SYS>set autot trace,

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.cust_id = 100750;

set autot off

Execution Plan

———————————————————-

Plan hash value: 1839107692

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 130 | 2730 | 7 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 130 | 2730 | 7 (0)| 00:00:01 |

|* 2 | INDEX SKIP SCAN | CHANNEL_CUST_ID_IDX | 130 | | 6 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CUST_ID"=100750)

filter("S"."CUST_ID"=100750)

Statistics

———————————————————-

0 recursive calls

0 db block gets

13 consistent gets

0 physical reads

0 redo size

600 bytes sent via SQL*Net to client

419 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2 rows processed

Search for the leading column channel_id only using equality predicates and note that the index is used. The compression of the index causes the query to perform slightly better (296 consistent gets) than with the uncompressed index (298 consistent gets), as a large number of index blocks need to be visited.

SYS>set autot trace

select s.channel_id, s.cust_id, s.prod_id

from hr.sales s

where s.channel_id = 9 ;

set autot off

Execution Plan

———————————————————-

Plan hash value: 2906571686

—————————————————————————————————

—————————————————————————————————

| 0 | SELECT STATEMENT | | 2248 | 47208 | 17 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| SALES | 2248 | 47208 | 17 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | CHANNEL_CUST_ID_IDX | 2248 | | 6 (0)| 00:00:01 |

—————————————————————————————————

Predicate Information (identified by operation id):

—————————————————

2 – access("S"."CHANNEL_ID"=9)

Statistics

———————————————————-

0 recursive calls

0 db block gets

296 consistent gets

0 physical reads

0 redo size

38498 bytes sent via SQL*Net to client

1937 bytes received via SQL*Net from client

140 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

2074 rows processed

Conclusion

If a search predicate has various equality conditions Anded together, a filter operation can be completely avoided by creating a concatenated index containing all the columns in the predicate.
Having least selective column as the leading column in the index is beneficial as it helps in index skip scans and compressibility, thereby saving space and improving performance of queries scanning a large number of leaf blocks in the index.
Having the less selective column as leading column in a concatenated index enables
- Use of Index skip scan to search for trailing column in the index
- Compression of the Index to
  - Save storage
  - Further improve the performance of queries that need to visit a large number of index blocks

References:

Overview

Demonstration

Conclusion

About the Author

Anju Garg

About the Author

Start the discussion at forums.toadworld.com