Nov 5, 2018 9:35:17 AM by Anju Garg
Performance of a query having multiple equality type predicates improves as it switches from using the index on an unselective column to a selective one.
It is very common to issue queries with multiple equality predicates, such as
Where col1 = <> and col2 = <> and …
It is highly likely that the columns being referenced in the predicates have different selectivities; i.e, whereas some (highly unselective) columns have very few distinct values, others (more selective) have a large number of distinct values. Some columns with the highest selectivity may have a unique constraint defined so that each row has a distinct value.
In such a scenario, if we have the option to create single column B-tree index, which index will perform better: the index on a column with more unique data (more selective) or more duplicate data (less selective)?
In this article I will explore above scenario by means of a simple query having two predicates referencing columns that have contrasting selectivities.
I will create a table hr.sales which is copy of sh.sales having data ordered by columns channel_id, cust_id. The column CHANNEL_ID has four distinct values, whereas column CUST_ID has 7059 distinct values. This means that an index on CHANNEL_ID will be less selective than that on CUST_ID.
We will issue a query having two equality predicates; e.g.,
Where CHANNEL_ID = < > And CUST_ID = < >
Subsequently we will examine performance of the query if we have
SYS>drop table hr.sales purge;
create table hr.sales as select * from sh.sales
order by channel_id, cust_id;
SYS>select count(*) num_rows, count(distinct cust_id) distinct_cust_id,
count(distinct(cust_id))/count(*)cust_id_selectivity,
count(distinct channel_id)distinct_channel_id,
count(distinct(channel_id))/count(*)channel_id_selectivity
from hr.sales;
NUM_ROWS DISTINCT_CUST_ID CUST_ID_SELECTIVITY DISTINCT_CHANNEL_ID CHANNEL_ID_SELECTIVITY
---------- ---------------- ------------------- ------------------- ----------------------
918843 7059 .007682488 4 4.3533E-06
The data in the column channel_id is skewed and there are few rows for channel_id = 9 as compared to the other values; i.e., the index on channel_id is selective with respect to certain values, whereas it is unselective with respect to others.
SYS>select distinct channel_id , count(*)
from hr.sales
group by channel_id order by channel_id;
CHANNEL_ID COUNT(*)
---------- ----------
2 258025
3 540328
4 118416
9 2074
SYS>exec dbms_stats.gather_table_stats('HR','SALES', method_opt => 'For columns CHANNEL_ID size 4, cust_id size auto',Cascade => true, no_invalidate => false);
SYS>SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM dba_TAB_COL_STATISTICS
WHERE owner = 'HR' and TABLE_NAME='SALES';
TABLE_NAME COLUMN_NAME NUM_DISTINCT HISTOGRAM
--------------- ------------------------------ ------------ ---------------
SALES PROD_ID 72 NONE
SALES CUST_ID 7059 NONE
SALES TIME_ID 1460 NONE
SALES CHANNEL_ID 4 FREQUENCY
SALES PROMO_ID 4 NONE
SALES QUANTITY_SOLD 1 NONE
SALES AMOUNT_SOLD 3586 NONE
7 rows selected.
SYS>Create index hr.channel_id_idx on hr.sales(channel_id);
SYS>select distinct channel_id , count(*)
from hr.sales
group by channel_id order by channel_id;
CHANNEL_ID COUNT(*)
---------- ----------
2 258025
3 540328
4 118416
9 2074
SYS>select count(*) from hr.sales where cust_id = 100750;
COUNT(*)
----------
2
Now, I will perform a lookup for CHANNEL_ID = 9 and CUST_ID = 100750 to check the index usage and the cost of execution. Note that the statistics shown are from the second execution to eliminate parsing and other costs.
SYS> set autotrace traceonly
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 and s.cust_id = 100750;
set autot off
Execution Plan
----------------------------------------------------------
Plan hash value: 3317832025
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 18 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 18 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CHANNEL_ID_IDX | 2248 | | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("S"."CUST_ID"=100750)
2 - access("S"."CHANNEL_ID"=9)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
20 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
Observe that
It is worth mentioning here that the rows and cost shown in the execution plan are optimizer estimates. These might vary from the actual figures. For example, there are 2074 rows having channel_id = 9 but the optimizer estimates 2248 rows to be returned.
SYS> Drop index hr.channel_id_idx;
Create index hr.cust_id_idx on hr.sales(cust_id);
SYS>set autot trace
select s.channel_id, s.cust_id, s.prod_id
from hr.sales s
where s.channel_id = 9 and s.cust_id = 100750;
set autot off
Execution Plan
----------------------------------------------------------
Plan hash value: 2826200558
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| SALES | 1 | 21 | 6 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | CUST_ID_IDX | 130 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("S"."CHANNEL_ID"=9)
2 - access("S"."CUST_ID"=100750)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
600 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
Observe that
Again note that the rows and cost shown in the execution plan, which are optimizer estimates, differ from the actual figures. There are 2 rows having CUST_ID =100750 but optimizer estimates 130 rows to be returned.
Hence, we saw that in the case of a query having multiple equality predicates, using the index on the column with the higher selectivity (CUST_ID) causes
Thus, use of a selective index reduces I/O and hence cost of the query. Performance of a query having multiple equality type predicates improves as it switches from using the index on an unselective column (CHANNEL_ID) to a selective one (CUST_ID).
Hence, in the case of a query having multiple equality predicates, it is advisable to create an index on the more selective column referenced in the predicate.
In the next article, I will explore the use of concatenated indexes to improve the performance of queries having multiple equality type predicates.
Tags: Oracle
Written by Anju Garg
Anju Garg is an Oracle Ace with over 14 years of experience in IT Industry in various roles. Since 2010, she has been involved in teaching and has trained more than a hundred DBAs from across the world in various core DBA technologies like RAC, Data guard, Performance Tuning, SQL statement tuning, Database Administration etc.
She is a regular speaker at Sangam and OTNYathra. She writes articles about Oracle and is one of the reviewers of the following book published by Pearson Oracle Problem-Solving and Troubleshooting Handbook
She is certified for :
She is passionate about learning and has keen interest in RAC and Performance Tuning. She shares her knowledge via her technical blog at http://oracleinaction.com/
We use cookies to improve your experience with our site. By continuing to use this site, you consent to our use of cookies. Learn more.