In this article you will learn how Oracle 12c dynamic sampling works. I will present cases where it helps the optimizer produce good estimates as well as cases where it does not. Only single-table selectivity and the group by clause are considered in this first part of a series of articles on Adaptive Dynamic Sampling; the effect of dynamic sampling on joins will be covered in Part II.
1. Single table estimation
This is the query I will be using throughout this article:
SELECT count(*)
FROM customers
WHERE cust_state_province = 'CA'
AND country_id=52790;
select segment_name, bytes/1024/1024 MB
from user_segments
where segment_name= 'CUSTOMERS';
SEGMENT_NAME         MB
------------ ----------
CUSTOMERS            13
The two columns (cust_state_province and country_id) involved in the predicate part of the above query have histograms. Below is the execution plan of this query when executed for the first time:
SQL_ID 303kpr2tf9x3j, child number 0
-------------------------------------

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1115 |   3341 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
As you can see, there is a significant difference between the estimated and the actual number of rows (Starts * E-Rows != A-Rows).
If we execute the same query again, we will get the following execution plan:
SQL_ID 303kpr2tf9x3j, child number 1
-------------------------------------
SELECT count(*) FROM sh.customers WHERE cust_state_province =
'CA' AND country_id=52790

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   3341 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Note
-----
   - statistics feedback used for this statement
The 12c statistics feedback feature (the equivalent of 11g cardinality feedback) kicked in and produced a very good estimation, creating a new child cursor n°1 with what is almost certainly an optimal execution plan.
This statistics feedback re-optimization was possible because, during the first execution, Oracle realized that the estimations it had made were not close to what the query actually generated at run time. This is why it set the is_reoptimizable property of child cursor n°0 to 'Y', as shown below:
SQL> select child_number ,is_reoptimizable
from v$sql
where sql_id = '303kpr2tf9x3j';

CHILD_NUMBER I
------------ -
           0 Y
As such, subsequent query executions will benefit from the information gleaned by the first execution.
However, there are many real-life cases where tables have no statistics at all and where dynamic statistics (dynamic sampling in 11g) kicks in. What estimations would the optimizer come up with if asked to optimize the above query against the customers table in such a situation? This is what we are going to check by setting the new 12c dynamic sampling level 11 and executing the initial query again:
SQL> alter session set optimizer_dynamic_sampling =11;
SQL_ID fvdd4sy2zrdkq, child number 2
-------------------------------------
SELECT count(*) FROM customers WHERE cust_state_province =
'CA' AND country_id=52790

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3424 |   3341 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
As you can notice, a new child cursor n°2 has been created. In this case the optimizer estimations are quite good, though not as perfect as with statistics feedback (cursor n°1). Of course, changing the optimizer parameter forces the CBO to compile a new execution plan using the new optimizer environment.
Now consider what estimations the CBO would have produced if the shared pool had been flushed and adequate statistics had been collected, including histograms and extended statistics on the columns involved in the predicate part, as shown below:
BEGIN
dbms_stats.gather_table_stats
('sh'
,'customers'
,method_opt=>'for all columns size skewonly
for columns(cust_state_province,country_id)
size skewonly'
);
END;
/
alter session set statistics_level=all;
alter system flush shared_pool;
SELECT count(*)
FROM customers
WHERE cust_state_province = 'CA'
AND country_id=52790;
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
SQL_ID 303kpr2tf9x3j, child number 0
-------------------------------------

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3341 |   3341 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))
The estimations made by the CBO are in this case perfect (E-Rows = A-Rows). This is thanks to the extended statistics (column group) collected on the cust_state_province and country_id columns (see the call to the dbms_stats package above).
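If you want to double-check that the column group has actually been created by the dbms_stats call above, the standard dictionary views can be queried; a quick sketch (run as the SH user):

-- the extension (column group) created implicitly by method_opt
select extension_name, extension
from   user_stat_extensions
where  table_name = 'CUSTOMERS';

-- its statistics and histogram, alongside the base columns
select column_name, num_distinct, histogram
from   user_tab_col_statistics
where  table_name = 'CUSTOMERS';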
Let's now enable automatic dynamic sampling (ADS) and see what the CBO reaction will be:
alter system flush shared_pool;
alter session set optimizer_dynamic_sampling=11;
SELECT count(*)
FROM customers
WHERE cust_state_province = 'CA'
AND country_id=52790;
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
SQL_ID 303kpr2tf9x3j, child number 0
-------------------------------------

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   3424 |   3341 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
One remark has to be emphasized here: despite the presence of adequate statistics, when we instructed the CBO to use dynamic sampling it obeyed the instruction and ignored those statistics. The optimizer probably considers that, with ADS activated, it has enough power and tools to produce very good estimations on its own. In my opinion, however, the optimizer should detect the presence of the column group statistics and use them instead of ADS. This behaviour resembles the working mechanism of SQL hints: when the optimizer is hinted to follow a certain path, it will follow that path as long as it is valid, whatever its cost.
Before digging into more details about this situation where both adequate statistics and dynamic sampling at level 11 are present, let me explain where the optimizer got this 3424 cardinality from.
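For those who want to reproduce this, the optimizer trace can be generated with the classic 10053 event before re-running the statement (one common way of enabling it at session level):

-- enable the optimizer (10053) trace, run the statement, then switch the trace off
alter session set events '10053 trace name context forever, level 1';

SELECT count(*)
FROM   customers
WHERE  cust_state_province = 'CA'
AND    country_id = 52790;

alter session set events '10053 trace name context off';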
As you may know, the 12c CBO trace file (event 10053) does not contain much information about dynamic sampling. Here is what the corresponding trace file contains:
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CUSTOMERS[CUSTOMERS]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
*** 2014-09-22 02:47:11.625
** Performing dynamic sampling initial checks. **
** Not using old style dynamic sampling since ADS is enabled.
Column (#11):
NewDensity:0.000144, OldDensity:0.000009 BktCnt:55500.000000, PopBktCnt:55500.000000, PopValCnt:145, NDV:145
Column (#11): CUST_STATE_PROVINCE(VARCHAR2)
AvgLen: 11 NDV: 145 Nulls: 0 Density: 0.000144
Histogram: Freq #Bkts: 145 UncompBkts: 55500 EndPtVals: 145 ActualVal: yes
Column (#13):
NewDensity:0.000676, OldDensity:0.000009 BktCnt:55500.000000, PopBktCnt:55500.000000, PopValCnt:19, NDV:19
Column (#13): COUNTRY_ID(NUMBER)
AvgLen: 5 NDV: 19 Nulls: 0 Density: 0.000676 Min: 52769.000000 Max: 52791.000000
Histogram: Freq #Bkts: 19 UncompBkts: 55500 EndPtVals: 19 ActualVal: yes
Table: CUSTOMERS Alias: CUSTOMERS
Card: Original: 55500.000000 >>Single Tab Card adjusted from 1114.870631 to 3423.832500 due to adaptive dynamic sampling
Rounded: 3424 Computed: 3423.832500 Non Adjusted: 1114.870631
Note how the trace file indicates that the adaptive dynamic sampling has been used to adjust the cardinality from 1114.870631 to 3423.832500. The following question then poses itself: how has this 3423.832500 value been determined?
When I enable SQL trace (event 10046) I get the following information:
SQL ID: 8z1xwcv87jxb2
Plan Hash: 2105012006
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUSTOMERS") */ 1 AS C1
  FROM "SH"."CUSTOMERS" SAMPLE BLOCK(51.5796, 8) SEED(1) "CUSTOMERS" WHERE
  ("CUSTOMERS"."CUST_STATE_PROVINCE"='CA') AND ("CUSTOMERS"."COUNTRY_ID"=
  52790)) innerQuery
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0        798          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0        798          0           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 105 (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  RESULT CACHE  7t2g6717d8zkq49mdxstnxqnax (cr=798 pr=0 pw=0 time=6509 us)
      1   SORT AGGREGATE (cr=798 pr=0 pw=0 time=6407 us)
   1766    TABLE ACCESS SAMPLE CUSTOMERS (cr=798 pr=0 pw=0 time=6236 us cost=304 size=48244 card=1723)
As you may have noticed from the appearance of the RESULT CACHE operation in the above row source execution plan, the result has been stored in the result cache for future executions (shared by all subsequent executions of the query). The Oracle sampling mechanism identifies a cardinality of 1766, as shown in the same row source execution plan. We can also point out that this sampling mechanism uses a second, unusual argument (8), which represents the sample group size used to sample the customers table (SAMPLE BLOCK(51.5796, 8)). This SAMPLE BLOCK(percent, sample_group_size) option is not yet documented by Oracle. It seems that when dynamic sampling is set to the new 12c level (11), Oracle uses a sample_group_size of 8, while for other dynamic sampling levels it uses a sample_group_size of 1.
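The documented form of the SAMPLE BLOCK clause (a single percentage, optionally with SEED for repeatability) can be tried manually to get a feel for what the recursive DS_SVC query does; a rough sketch, reusing the percentage seen in the trace:

-- block-sample roughly half of the table and count the rows matching the predicate;
-- without SEED the figure changes from one run to the next
SELECT count(*)
FROM   sh.customers SAMPLE BLOCK (51.5796) SEED (1)
WHERE  cust_state_province = 'CA'
AND    country_id = 52790;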
And, again, a new legitimate question poses itself: where does this 51.5796 value come from? In order to estimate the customers table cardinality, Oracle actually scans roughly half of the table (hence the 51.5796 value). This method will, of course, add an extra time penalty to the optimization phase of the underlying query, but subsequent executions will benefit from this sampling estimation without the parsing penalty it requires. This method is known as adaptive sampling. I will prepare another article about calculating the (optimal) sample size, the exact adaptive sampling algorithm and how Oracle has implemented it.
So we can see that with a sampling percentage of roughly 50% the row source cardinality found by Oracle is 1766. Oracle then extrapolated from this first finding that evaluating the query at 100% would lead to a cardinality of 1766 * 100 / 51.5796 = 3423.83, rounded to 3424.
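Put differently, the adjusted cardinality reported in the 10053 trace can be reproduced with a trivial query (the 1766 sampled rows and the 51.5796 sampling percentage being the values observed above):

-- extrapolate the sampled row count to 100% of the table
select 1766 * 100 / 51.5796        as extrapolated_card,
       round(1766 * 100 / 51.5796) as rounded_card
from   dual;

-- extrapolated_card ~ 3423.83, rounded_card = 3424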
Bear in mind that with dynamic statistics set at a level other than 11, the optimizer counts the actual sample size (ASS) together with the filtered sample cardinality (FSC) and then calculates the selectivity using:
selectivity = FSC / ASS
And the estimated cardinality via the following formula:
cardinality = num_rows * selectivity
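As a purely hypothetical illustration (these sample figures are invented, not taken from the trace above): if a classic dynamic sampling query read 10,000 rows (ASS) of which 620 passed the filter (FSC), the estimate against the 55,500-row customers table would be:

-- selectivity = FSC / ASS ; cardinality = num_rows * selectivity
select 620 / 10000                 as selectivity,
       round(55500 * 620 / 10000)  as estimated_cardinality
from   dual;

-- selectivity = 0.062, estimated_cardinality = 3441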
Starting with 12c, however, when dynamic statistics is set at the new level 11, the CBO does not calculate the selectivity of the predicate. Instead, it first determines the optimal sample size, then estimates a fraction (a subset) of the whole result set, and finally extrapolates the whole result set from that fraction.
At this stage of the investigation a new question arises: what will happen when the size of the customers table increases?
It seems obvious that increasing the table size implies a reduction of the sample size, because sampling a large segment is a time-consuming operation. Will this sample size reduction imply a reduction of the cardinality estimation accuracy? A picture being worth a thousand words, let's explore this plausible case via a concrete example:
create table t_s as select * from dba_objects;
insert into t_s select * from t_s;
insert into t_s select * from t_s;
insert into t_s select * from t_s;
insert into t_s select * from t_s;
insert into t_s select * from t_s;
insert into t_s select * from t_s;
select segment_name, bytes/1024/1024 MB
from user_segments
where segment_name='T_S';
SEGMENT_NAME         MB
------------ ----------
T_S              776.00
exec dbms_stats.gather_table_stats('sh','t_s',
method_opt=>'for all columns size skewonly');
Against this simple model I will issue two different queries and get their execution plans using dynamic statistics set at level 11 as shown below:
First query
select count(*) from t_s where owner='SYS' and object_type='TABLE'
SQL_ID ckutv7jm28ru9, child number 0
-------------------------------------

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |  57840 |  82048 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("OBJECT_TYPE"='TABLE' AND "OWNER"='SYS'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
Second query
select count(*) from t_s where owner='SH' and object_type='TABLE'
SQL_ID 2w2wwxaznd0kg, child number 0
-------------------------------------

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |    371 |   1600 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
As you can see from the above execution plans, when I used a table (T_S) with a large segment size (776MB), Oracle was unable to produce accurate estimations using dynamic statistics at level 11. This contrasts with the case where I used a table (CUSTOMERS) of a relatively small size (13MB), where the CBO estimation was approximately correct. In the case of the T_S table, Oracle scanned a very small part of the segment and then extrapolated the whole query cardinality from it.
I also want to emphasize something important concerning the relationship between dynamic statistics (at level 11) and the presence of skewed data (columns with histograms) in the sampled segment. People still wrongly believe that the CBO, particularly the new dynamic statistics sampling method, is able to compensate for missing statistics (such as histograms and extended statistics) and produce a correct cardinality. This, of course, is not possible using dynamic statistics at level 11. Moreover, we saw above that even when extended statistics and adequate histograms are present, if Oracle is instructed to use dynamic statistics at level 11 it bypasses these existing dictionary statistics and uses only the information it can gather during plan compilation, which obviously excludes obtaining histograms and extended statistics dynamically at parse time.
As a confirmation, if, for the above T_S table example, I disable automatic dynamic sampling and create extended statistics for the owner and object_type columns, then the execution plan for the second query looks like this:
SQL_ID 2w2wwxaznd0kg, child number 0
-------------------------------------
select count(*) from t_s where owner='SH' and object_type='TABLE'

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1600 |   1600 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))
Here the cardinality estimation is perfect thanks to the column group statistics on (owner, object_type).
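For reference, a sketch of the statements that reproduce this setup (assuming ADS is simply switched back to the default level 2, and that the column group is created through the same method_opt syntax used earlier for the CUSTOMERS table):

alter session set optimizer_dynamic_sampling = 2;

BEGIN
  dbms_stats.gather_table_stats
  ('sh'
  ,'t_s'
  ,method_opt=>'for all columns size skewonly
                for columns(owner,object_type) size skewonly'
  );
END;
/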
All the above observations and investigations concern dynamic statistics at level 11; what happens now if I drop the extended column statistics and instruct the CBO to use dynamic statistics at a lower level (9, for example)?
BEGIN
DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 't_s',
'("OWNER","OBJECT_TYPE")' );
END;
/
alter system flush shared_pool;
exec DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=9;
select count(*) from t_s where owner='SH' and object_type='TABLE'
SQL_ID avw6r9kqfvx9r, child number 0
-------------------------------------
select count(*) from t_s where owner='SH' and object_type='TABLE'

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1763 |   1600 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=9)
With dynamic statistics set at a lower level (9) we get a better estimation than with the dynamic statistics set at the new 12c level (11).
If now I re-create the extended statistics for owner and object_type columns, gather statistics with histograms and instruct the CBO to use dynamic statistics at level 9, then I will end up with the following execution plan:
exec dbms_stats.gather_table_stats('sh','t_s', method_opt=>'for all columns size skewonly for columns(owner, object_type) size skewonly');
alter system flush shared_pool;
exec DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=9;
select count(*) from t_s where owner='SH' and object_type='TABLE'
SQL_ID avw6r9kqfvx9r, child number 0
-------------------------------------
select count(*) from t_s where owner='SH' and object_type='TABLE'

Plan hash value: 2004207966

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS FULL| T_S  |      1 |   1600 |   1600 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("OWNER"='SH' AND "OBJECT_TYPE"='TABLE'))
As you can see, the Oracle optimizer did not use dynamic sampling at level 9 in the presence of statistics, even when instructed to do so. This means that when optimizer_dynamic_sampling is set to a level other than 11, the CBO can detect the presence of accurate (or unrepresentative) statistics and decide whether it is appropriate to use or bypass the dynamic statistics feature. This is not the case when dynamic statistics is set at the new level 11: there, dynamic statistics supersedes every other available statistic and, unfortunately wrongly, relies on dynamic sampling alone for its cardinality estimations.
2. Group By clause
I will be using the following query in order to investigate the behavior of dynamic statistics in the presence of a group by clause:
SELECT count(*)
FROM sh.customers
group by COUNTRY_ID,CUST_STATE_PROVINCE;
The execution plan of this query is given by:
SQL_ID gp0cpb1kutsr9, child number 0
-------------------------------------

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |    145 |
|   1 |  HASH GROUP BY     |           |      1 |   1949 |    145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |  55500 |  55500 |
-------------------------------------------------------------------
The optimizer, having no clue about the correlation between country_id and cust_state_province, produced a wrong estimation for the HASH GROUP BY operation n°1 above.
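The trace does not expose the arithmetic behind this 1949 figure, but a frequently reported heuristic for multi-column group by estimation (the product of the columns' NDVs, scaled down by sqrt(2) for each additional grouping column) happens to reproduce it; treat the following as an illustration rather than the official formula:

-- NDVs taken from the 10053 trace shown earlier:
-- 145 for cust_state_province and 19 for country_id
select ceil(145 * 19 / sqrt(2)) as estimated_group_count
from   dual;

-- returns 1949, matching the E-Rows of the HASH GROUP BY operation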
Let's then instruct the CBO to use the new 12c dynamic sampling level, run the query again and see what happens:
alter system flush shared_pool;
exec DBMS_RESULT_CACHE.FLUSH;
alter session set optimizer_dynamic_sampling=11;
SQL_ID gp0cpb1kutsr9, child number 0
-------------------------------------

-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |    145 |
|   1 |  HASH GROUP BY     |           |      1 |    145 |    145 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |  55500 |  55500 |
-------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
As you can observe in the above execution plan, HASH GROUP BY operation n°1 is now perfectly estimated by the CBO.
From the corresponding 10046 trace file we can isolate the following interesting information:
SQL ID: 7b9ma1hfgd25c
Plan Hash: 922166714
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") */ 1 AS C1 FROM (SELECT COUNT(*) "COUNT(*)"
  FROM "SH"."CUSTOMERS" "CUSTOMERS" GROUP BY "CUSTOMERS"."COUNTRY_ID",
  "CUSTOMERS"."CUST_STATE_PROVINCE") "VW_DIS_1") innerQuery
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.01          0       1522          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.01       0.01          0       1522          0           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 105 (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  RESULT CACHE  6rw3j6xwtuzga8umg05k5624x2 (cr=1522 pr=0 pw=0 time=18273 us)
      1   SORT AGGREGATE (cr=1522 pr=0 pw=0 time=18205 us)
    145    VIEW  VM_NWVW_0 (cr=1522 pr=0 pw=0 time=18183 us cost=591 size=5847 card=1949)
    145     HASH GROUP BY (cr=1522 pr=0 pw=0 time=18156 us cost=591 size=31184 card=1949)
  55500      TABLE ACCESS FULL CUSTOMERS (cr=1522 pr=0 pw=0 time=7690 us cost=586 size=888000 card=55500)
Having said that, I now want to extend this group by test to a larger table. For that, I created a CUST table as follows:
create table cust as select * from customers;
and did several inserts (using insert into cust select * from cust) until the table size reached 760MB.
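A sketch of that step (the number of repetitions is whatever it takes to reach the desired size; check it with user_segments after each round):

-- repeat until the segment is big enough
insert into cust select * from cust;
commit;

select segment_name, bytes/1024/1024 MB
from   user_segments
where  segment_name = 'CUST';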
SEGMENT_NAME         MB
------------ ----------
CUST             760.00
Below is the corresponding 10046 trace file for the same group by query, but this time against the new, bigger CUST table:
SQL ID: 5qa7dunz4udqc
Plan Hash: 3471111787
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") */ 1 AS C1 FROM (SELECT COUNT(*) "COUNT(*)"
  FROM "SH"."CUST" "CUST" GROUP BY "CUST"."COUNTRY_ID",
  "CUST"."CUST_STATE_PROVINCE") "VW_DIS_1") innerQuery
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      1.31       1.31      64914      72386          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      1.31       1.31      64914      72386          0           0
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  RESULT CACHE  8j9xaaqaqzg8q419n87dbp51vu (cr=0 pr=0 pw=0 time=1 us)
      0   SORT AGGREGATE (cr=0 pr=0 pw=0 time=31 us)
      0    VIEW  VM_NWVW_0 (cr=0 pr=0 pw=0 time=27 us cost=591 size=5847 card=1949)
      0     HASH GROUP BY (cr=0 pr=0 pw=0 time=26 us cost=591 size=31184 card=1949)
2524125      TABLE ACCESS FULL CUST (cr=72386 pr=64914 pw=0 time=849426 us cost=586 size=888000 card=55500)
You can easily see that this time Oracle did not fully execute the sampling query. The optimizer considered that this method would not be efficient because the object is quite big, so it decided to fall back to the old method to estimate the group by clause, with which it generated the following execution plan:
SQL_ID 1j4cppr9wz91r, child number 0
-------------------------------------
SELECT count(*) FROM sh.cust group by
COUNTRY_ID,CUST_STATE_PROVINCE

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |    145 |
|   1 |  HASH GROUP BY     |      |      1 |   1949 |    145 |
|   2 |   TABLE ACCESS FULL| CUST |      1 |  55500 |  3552K |
--------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
However, the above Note wrongly indicates that dynamic statistics at level 11 has been used. This is not completely true, because during parsing the CBO abandoned this sampling method on account of the huge table size, which, according to the CBO, would make sampling inefficient.
Notice also that the last 10046 trace file contains a few SQL statements that tried to estimate the cardinality of the CUST table and to determine its column statistics (num_distinct, etc.), as reproduced below:
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel */ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUST") */ 1 AS C1 FROM
"SH"."CUST" SAMPLE BLOCK(0.826182, 8) SEED(1) "CUST") innerQuery
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel */ sum(vsize(C1))/count(*) ,
COUNT(DISTINCT C2), sum(vsize(C3))/count(*) , COUNT(DISTINCT C4)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "CUST") */
"CUST"."CUST_STATE_PROVINCE" AS C1, "CUST"."COUNTRY_ID" AS C2,
"CUST"."COUNTRY_ID" AS C3, "CUST"."CUST_STATE_PROVINCE" AS C4 FROM
"SH"."CUST" SAMPLE BLOCK(0.826182, 8) SEED(1) "CUST") innerQuery
These sampling queries were, however, ignored during plan optimization.
3. Conclusion
Automatic dynamic sampling (ADS) at level 11 can be efficient and generate accurate cardinality estimations for small tables. The main principle of ADS at this particular level is to estimate a fraction of the query result (a subset of the result set) and to extrapolate the whole result set from it. Therefore, as table size increases, the CBO will start to lose estimation accuracy. This is why, in most real-life systems, this feature will probably not be very helpful, particularly when there is significant data skew. What also seems inconvenient is that, in contrast to dynamic statistics set at a level other than 11, when the CBO is instructed to use dynamic statistics at level 11 it ignores adequate and representative statistics, including histograms and extended column group statistics. In this situation, a different dynamic statistics level can be more accurate than the new ADS level. Finally, queries using a group by clause on big tables will not use dynamic sampling at level 11 even when instructed to do so, even though the Note accompanying the corresponding execution plan says the contrary.