Toad World Blog

Concatenated Indexes and Index column order

Dec 10, 2018 1:23:47 PM by Anju Garg

A concatenated index, also known as multi-column, composite or combined index, is one index across multiple columns. In case of queries with multiple equality conditions, even though multiple single column indexes can result in very good performance, a single concatenated index solution is preferable as it

  • saves storage space
  • saves the maintenance overhead for multiple indexes

Moreover, the order of the columns in a concatenated index, if appropriately chosen, can further save storage and improve performance, too. Let us consider a query with two equality predicates; e.g.,

Where a = <> and b = <>

Here, we have the option of creating a concatenated index on (a,b) or (b,a). Both the indexes will

  • occupy the same storage space
  • perform identically for queries with a predicate like Where a = <> and b = <>

However, if columns a and b have contrasting selectivities, the order of columns in the index will impact

  • compressibility of the index – If the leading column(s) has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.
  • the possibility of using Index Skip Scan while searching for trailing column – In general, a database can use a concatenated index when searching for the leading (leftmost) columns. For example, an index on two columns; g., (a,b) can be used when searching for
    • the first column (a = <>)
    • both the columns together (a= <> and b = <>)

However, we might be able to use a concatenated index even when searching for only the trailing column, provided that the leading column is the one having low selectivity; i.e., if we have an index on (a,b) and column a has relatively few distinct values, we might be able to use an index skip scan on a query having a predicate like: 
where b = <>; 

In this article, I will demonstrate

  • Concatenation of columns in an index
  • Use of the less selective column as the leading column in a concatenated index to enable
    • Use of Index skip scan to search for trailing column in the index
    • Compression of Index to
      • Save storage
      • Further improve the performance of a queries that need to visit a large number of index blocks

Overview

  • Create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column
    • Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs well
    • Search for the leading column cust_id only using equality predicate and note that the index is used
    • Search for the trailing column channel_id only using equality predicate and note that the index is not used
  • Drop the concatenated index on ( CUST_ID , CHANNEL_ID)
  • Create a concatenated index on (CHANNEL_ID, CUST_ID ) with  less selective column channel_id as the leading column
    • Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with the earlier index on (CUST_ID , CHANNEL_ID)
    • Search for the trailing column cust_id only using equality predicate and note that index skip scan is used
    • Search for the leading column channel_id only using equality predicate and note that the index is used
    • Compress the index on (CHANNEL_ID, CUST_ID ). Note that
      • Compression leads to savings in storage space
      • Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs as well as with uncompressed index
      • Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs as well as with uncompressed index
      • Search for the leading column channel_id only using equality predicate and note that the index is used and the query performs better than with uncompressed index

Demonstration

Create an hr.sales table which is a copy of sh.sales, but the data is ordered by channel_id, cust_id. Note that column CHANNEL_ID has four distinct values, whereas column CUST_ID has 7059 distinct values. This means that column CHANNEL_ID is less selective than CUST_ID.

SYS>drop table hr.sales purge;

    create table hr.sales as select * from sh.sales order by channel_id, cust_id;

SYS>select count(*) num_rows, count(distinct cust_id) distinct_cust_id,         

           count(distinct(cust_id))/count(*)cust_id_selectivity,

           count(distinct channel_id)distinct_channel_id,

           count(distinct(channel_id))/count(*)channel_id_selectivity

     from hr.sales;

 

  NUM_ROWS DISTINCT_CUST_ID CUST_ID_SELECTIVITY DISTINCT_CHANNEL_ID CHANNEL_ID_SELECTIVITY

---------- ---------------- ------------------- ------------------- ----------------------

    918843             7059          .007682488                   4             4.3533E-06

 

The data in column CHANNEL_ID is skewed and there are few rows for CHANNEL_ID = 9 as compared to the other values.

SYS>select distinct channel_id , count(*)

    from hr.sales

    group by channel_id order by channel_id;  

   

CHANNEL_ID   COUNT(*)

---------- ----------

         2     258025

         3     540328

         4     118416

         9       2074

 

Gather optimizer statistics for hr.sales table so that we have a histogram on column channel_id and the optimizer knows about the skewed data distribution in the column.

SYS>exec dbms_stats.gather_table_stats('HR','SALES', method_opt => 'For columns CHANNEL_ID size 4, cust_id size auto',Cascade => true, no_invalidate => false);

SYS> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM

     FROM   dba_TAB_COL_STATISTICS

     WHERE  owner = 'HR' and TABLE_NAME='SALES';

 

 TABLE_NAME      COLUMN_NAME                    NUM_DISTINCT HISTOGRAM

--------------- ------------------------------ ------------ ---------------

SALES           CUST_ID                                7059 NONE

SALES           CHANNEL_ID                                4 FREQUENCY

 

  • Let us create a concatenated index on ( CUST_ID , CHANNEL_ID) with highly selective column cust_id as the leading column.

SYS>create index hr.cust_channel_id_idx on hr.sales( cust_id, channel_id);

 

  • Search for a channel_id and cust_id using equality predicates and note that
    • The index is used to search for both channel_id and cust_id
    • Range scan of the index returns ROWID’s of rows having channel_id = 9 and CUST_ID = 100750 (Line 2)
    • Rows corresponding to returned ROWID's are read from SALES table (Line 1) and finally returned (Line 0)
    • As concatenated index on (CUST_ID , CHANNEL_ID) is highly selective; its scan (Line 2) returns a small number of rows (2), requiring  just a few (maximum 2) index blocks to be visited. The subsequent  TABLE ACCESS BY INDEX ROWID operation (Line 1) again requires visiting at most two table blocks without the need to filter out any rows.
    • The query performs well with only six consistent gets as
      • Very few index / table blocks need to be visited
      • A filter operation has been completely avoided during table access

Note that the statistics shown are from second execution to eliminate parsing and other costs.

SYS>set autotrace trace

 

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

    where s.channel_id = 9 and s.cust_id = 100750;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 867189039

------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------|   0 | SELECT STATEMENT            |                     |     1 |    21 |     4   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |     1 |    21 |     4   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CUST_CHANNEL_ID_IDX |     1 |       |     3   (0)| 00:00:01 |

------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("S"."CUST_ID"=100750 AND "S"."CHANNEL_ID"=9)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          6  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the leading column cust_id only using equality predicate and observe that that the index CUST_CHANNEL_ID_IDX has been used as expected. The query performs well with only six consistent gets.

SYS>set autot trace

 

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

    where s.cust_id = 100750;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 867189039

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |   130 |  2730 |     6   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |   130 |  2730 |     6   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CUST_CHANNEL_ID_IDX |   130 |       |     3   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CUST_ID"=100750)

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

          6  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the trailing column channel_id only using equality predicate and note that expensive Full table scan has been used, as the optimizer found it less costly as compared to using the index.

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

    where s.channel_id = 9 ;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 781590677

 

---------------------------------------------------------------------------

| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       |  2248 | 47208 |  1231   (1)| 00:00:15 |

|*  1 |  TABLE ACCESS FULL| SALES |  2248 | 47208 |  1231   (1)| 00:00:15 |

---------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("S"."CHANNEL_ID"=9)

 

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       4574  consistent gets

       4431  physical reads

          0  redo size

      38498  bytes sent via SQL*Net to client

       1937  bytes received via SQL*Net from client

        140  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       2074  rows processed

 

  • Drop the concatenated index on (CUST_ID , CHANNEL_ID) and create a concatenated index on (CHANNEL_ID, CUST_ID ) with the less selective column channel_id as the leading column

SYS>drop index hr.CUST_CHANNEL_ID_IDX ;

    create index hr.channel_cust_id_idx on hr.sales(channel_id, cust_id);

 

  • Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the earlier index on (CUST_ID , CHANNEL_ID)

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.channel_id = 9 and s.cust_id = 100750;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2906571686

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |     1 |    21 |     4   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |     1 |    21 |     4   (0)| 00:00:01 |

|*  2   INDEX RANGE SCAN          | CHANNEL_CUST_ID_IDX |     1 |       |     3   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          6  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the trailing column cust_id only using equality predicate and note that although cust_id is the trailing column in the index, index skip scan has been used to search for cust_id.

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.cust_id = 100750;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1839107692

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |   130 |  2730 |     7   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |   130 |  2730 |     7   (0)| 00:00:01 |

|*  2 |   INDEX SKIP SCAN           | CHANNEL_CUST_ID_IDX |   130 |       |     6   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CUST_ID"=100750)

       filter("S"."CUST_ID"=100750)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

         13  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the leading column channel_id only using equality predicate and note that the index is used as expected. The cost of searching a channel_id using index CHANNEL_CUST_ID_IDX is much, much lower than the cost of the FTS performed earlier. Note that physical I/O has been completely eliminated.

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.channel_id = 9 ;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2906571686

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |  2248 | 47208 |    19   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |  2248 | 47208 |    19   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CHANNEL_CUST_ID_IDX |  2248 |       |     8   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CHANNEL_ID"=9)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        298  consistent gets

          0  physical reads

          0  redo size

      38498  bytes sent via SQL*Net to client

       1937  bytes received via SQL*Net from client

        140  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       2074  rows processed

 

Thus, it is beneficial to use the less selective channel_id as leading column in the concatenated index as it enables the index to be used even while searching for the trailing column cust_id.

Moreover, if the leading column in an index has low selectivity (relatively few distinct values), the same column values will be replicated a large number of times, resulting in higher compressibility of the index.

  • In order to check the compressibility of the index CHANNEL_CUST_ID_IDX, let us first find out number of leaf blocks in the index before compression.

SYS>select leaf_blocks from dba_indexes where index_name = 'CHANNEL_CUST_ID_IDX';

LEAF_BLOCKS

-----------

       2334

  • Let us now compress the index CHANNEL_CUST_ID_IDX and re-execute the query to find out the number of leaf blocks in the index after compression. Observe that the number of leaf blocks has considerably come down to 1442, a saving of around 38% storage space.

SYS>alter index hr.CHANNEL_CUST_ID_IDX rebuild compress;

SYS>select leaf_blocks from dba_indexes

    where index_name = 'CHANNEL_CUST_ID_IDX';

 LEAF_BLOCKS

-----------

       1442

 

Let us now examine the performance of various queries after the compression.

  • Search for a channel_id and cust_id using equality predicates and note that the index is used and the query performs equally well as with the uncompressed index. Compression has not caused any improvement in the performance, as only two rows are returned, which require visiting at most two table / index blocks only.

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.channel_id = 9 and s.cust_id = 100750;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2906571686

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |     1 |    21 |     4   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |     1 |    21 |     4   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CHANNEL_CUST_ID_IDX |     1 |       |     3   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CHANNEL_ID"=9 AND "S"."CUST_ID"=100750)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          6  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the trailing column cust_id only using equality predicate and note that index skip scan is used and the query performs equally well as with uncompressed index. Here again, compression has not caused any improvement in the performance, as only two rows are returned, which requires visiting at most two index blocks only.

SYS>set autot trace,

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.cust_id = 100750;

     set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1839107692

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |   130 |  2730 |     7   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |   130 |  2730 |     7   (0)| 00:00:01 |

|*  2 |   INDEX SKIP SCAN           | CHANNEL_CUST_ID_IDX |   130 |       |     6   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CUST_ID"=100750)

       filter("S"."CUST_ID"=100750)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

         13  consistent gets

          0  physical reads

          0  redo size

        600  bytes sent via SQL*Net to client

        419  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          2  rows processed

 

  • Search for the leading column channel_id only using equality predicates and note that the index is used. The compression of the index causes the query to perform slightly better (296 consistent gets) than with the uncompressed index (298 consistent gets), as a large number of index blocks need to be visited.

SYS>set autot trace

    select s.channel_id, s.cust_id, s.prod_id

    from hr.sales s

     where s.channel_id = 9 ;

 

    set autot off

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2906571686

 

---------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                     |  2248 | 47208 |    17   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| SALES               |  2248 | 47208 |    17   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CHANNEL_CUST_ID_IDX |  2248 |       |     6   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("S"."CHANNEL_ID"=9)

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        296  consistent gets

          0  physical reads

          0  redo size

      38498  bytes sent via SQL*Net to client

       1937  bytes received via SQL*Net from client

        140  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       2074  rows processed

 

Conclusion

  • If a search predicate has various equality conditions Anded together, a filter operation can be completely avoided by creating a concatenated index containing all the columns in the predicate.
  • Having least selective column as the leading column in the index is beneficial as it helps in index skip scans and compressibility, thereby saving space and improving performance of queries scanning a large number of leaf blocks in the index.
  • Having the less selective column as leading column in a concatenated index enables
    • Use of Index skip scan to search for trailing column in the index
    • Compression of the Index to
      • Save storage
      • Further improve the performance of queries that need to visit a large number of index blocks

References:

 

Tags: Oracle

Anju Garg

Written by Anju Garg

Anju Garg is an Oracle Ace with over 14 years of experience in IT Industry in various roles. Since 2010, she has been involved in teaching and has trained more than a hundred DBAs from across the world in various core DBA technologies like RAC, Data guard, Performance Tuning, SQL statement tuning, Database Administration etc.

She is a regular speaker at Sangam and OTNYathra. She writes articles about Oracle and is one of the reviewers of the following book published by Pearson Oracle Problem-Solving and Troubleshooting Handbook

She is certified for :

  • Oracle 9i Database Administration OCP
  • Oracle 11g Database Administration OCP
  • Oracle 11g Performance Tuning OCE
  • Oracle 11g R2 RAC OCE
  • Oracle 11g SQL Tuning OCE
  • Oracle 12c Database Administration OCP
  • Oracle Real Application Clusters 12c Certified Implementation Specialist

She is passionate about learning and has keen interest in RAC and Performance Tuning. She shares her knowledge via her technical blog at http://oracleinaction.com/