Why my execution plan has not been shared - Part IV

    May 5, 2017 1:38:00 PM by Mohamed Houri

    As mentioned in the previous three articles of the series there are sixty-four reasons for which the Oracle Optimizer might decide to hard parse a new execution plan instead of sharing the existing one. Out of these sixty-four reasons, we have already examined and explained six: load_optimizer_stats and hash_match_failed in part I, roll_invalid_mismatch and top_level_rpi_cursor in part II, and bind_mismatch and stb_obj_mismatch in part III. The current article outlines two new supplementary reasons: pq_slave_mismatch and optimizer_mismatch. The former is not very popular in real-life applications. It occurs when a parallel slave(PX) of a parallelized query refuses to share the execution plan of its Query Coordinator (QC).  The latter is, however, very popular.  It kicks in whenever an optimizer property,stored in the child cursor metadata,is modified. Oracle has foreseen thirteen sub-reasons for an optimizer parameter mismatch, which we will try to examine in this article.

     

    PQ_SLAVE_MISMATCH

    Oracle defines this reason as follows:

    (Y|N) Top-level slave decides not to share cursor

    To say the least, Oracle has not made a big effort to define this reason with a great precision. As always, the best way to make its proper opinion about this reason for non-sharing is to figure out how to force it.  Here’s how I have managed to produce this reason in Oracle version 12.1.0.1.0; first the model and then the query:

    create table t1
    as
    with generator as (
         select --+ materialize
            rownum id
         from dual
         connect by
            level <= 1000
    )
    select
        trunc((rownum-1)/2) product_id
       ,mod(rownum-1,2)     flag1
       ,mod(rownum-1,3)     flag2
       ,rownum              n1
       ,lpad(rownum,30)     v1
    from
       generator v1
      ,generator v2
    where
       rownum <= 1e6;
      
    create table t2
    as select
         level id
        ,date '2012-06-07' + mod((level-1)*5,10) + interval '5' minute start_date
        ,rpad('xx',10) padding
    from dual
    connect by level <= 1e6;
     
    begin
       dbms_stats.gather_table_stats(user, 't1');
       dbms_stats.gather_table_stats(user, 't2');
    end;
    /

    The above script creates two simple heap tables and gathers their statistics without histogram. Let's run the following query, get its execution plan, and check its execution statistics:

    with got_my_dates as
     (select /*+ materialize */
          id
         ,padding
         ,start_date
      from t2
      where start_date > to_date('07/06/2012','dd/mm/yyyy')
      and  start_date <= to_date('10/06/2012','dd/mm/yyyy')
      )
    select
      /*+ parallel(4) */
       t1.*
      ,cte.padding
      ,cte.start_date
    from
       t1
      ,got_my_dates cte
    where
        t1.product_id = cte.id
    and t1.product_id <= 1e3;
     
    SQL_ID  c5y75njp34rmf, child number 0
    -----------------------------------------------------------------------------------------
    | Id  | Operation                  | Name                       | Rows  |    TQ  |IN-OUT|
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                            |       |        |    |
    |   1 |  TEMP TABLE TRANSFORMATION |                            |       |        |    |
    |   2 |   PX COORDINATOR           |                            |       |        |    |
    |   3 |    PX SEND QC (RANDOM)     | :TQ10000                   |   998K|  Q1,00 | P->S   |
    |   4 |     LOAD AS SELECT         |                            |       |  Q1,00 | PCWP       |
    |   5 |      PX BLOCK ITERATOR     |                            |   998K|  Q1,00 | PCWC   |
    |*  6 |       TABLE ACCESS FULL    | T2                         |   998K|  Q1,00 | PCWP   |
    |   7 |   PX COORDINATOR           |                            |       |        |    |
    |   8 |    PX SEND QC (RANDOM)     | :TQ20001                   |  1723 |  Q2,01 | P->S  |
    |*  9 |     HASH JOIN              |                            |  1723 |  Q2,01 | PCWP  |
    |  10 |      JOIN FILTER CREATE    | :BF0000                    |  1723 |  Q2,01 | PCWP  |
    |  11 |       PX RECEIVE           |                            |  1723 |  Q2,01 | PCWP  |
    |  12 |        PX SEND BROADCAST   | :TQ20000                   |  1723 |  Q2,00 | P->P  |
    |  13 |         PX BLOCK ITERATOR  |                            |  1723 |  Q2,00 | PCWC  |
    |* 14 |          TABLE ACCESS FULL | T1                         |  1723 |  Q2,00 | PCWP  |
    |* 15 |      VIEW                  |                            |   998K|  Q2,01 | PCWP   |
    |  16 |       JOIN FILTER USE      | :BF0000                    |   998K|  Q2,01 | PCWP   |
    |  17 |        PX BLOCK ITERATOR   |                            |   998K|  Q2,01 | PCWC   |
    |* 18 |         TABLE ACCESS FULL  | SYS_TEMP_0FD9D6628_1E139EE |   998K|  Q2,01 | PCWP   |
    -----------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       6 - access(:Z>=:Z AND :Z<=:Z)
           filter(("START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
                  AND "START_DATE"<=TO_DATE(' 2012-06-1000:00:00', 'syyyy-mm-dd hh24:mi:ss')))
       9 - access("T1"."PRODUCT_ID"="CTE"."ID")
      14 - access(:Z>=:Z AND :Z<=:Z)
           filter("T1"."PRODUCT_ID"<=1000)
      15 - filter("CTE"."ID"<=1000)
      18 - access(:Z>=:Z AND :Z<=:Z)
           filter(SYS_OP_BLOOM_FILTER(:BF0000,"C0"))
    Note
    -----

       - Degree of Parallelism is 4 because of hint

     As you can see, this query has been run parallely with a Degree of Parallelism of 4 as indicated by the Note at the bottom of its execution plan. But interestingly, it produced two execution plans (child number 0 and 1) at its very first execution, as shown below:

    SQL> @gv$sql
    Enter value for sql_id: c5y75njp34rmf
    SQL_ID        CHILD_NUMBER FIRST_LOAD_TIME     LAST_LOAD_TIME      EXECUTIONS  END_FETCH
    ------------- ------------ ------------------- ------------------- ---------- ----------
    c5y75njp34rmf            0 2017-04-17/10:10:55 2017-04-17/10:10:55          1          1
    c5y75njp34rmf            1 2017-04-17/10:10:55 2017-04-17/10:10:55          0          0

    According to the end_of_fetch_count column, it is the child number 0 that has been effectively used to run this query. Let's see now why Oracle has decided to compile two execution plans, using Tanel Poder’s nonshared script.

    SQL> @nonshared c5y75njp34rmf

    Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

    SQL_ID                        : c5y75njp34rmf
    ADDRESS                       : 00007FFC9157FAF0
    CHILD_ADDRESS                 : 00007FFC9157E640
    CHILD_NUMBER                  : 0
    REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>9</ID>
                                   <reason>PQSlavemismatch(5)</reason><size>2x4</size>
                                   <ctxpq_StmtId_pqctx>3455170</ctxpq_StmtId_pqctx>
                                   <fxu_kxfxutqidb>3389067</fxu_kxfxutqidb></ChildNode>
    CON_ID                        : 1
    -----------------
    SQL_ID                        : c5y75njp34rmf
    ADDRESS                       : 00007FFC9157FAF0
    CHILD_ADDRESS                 : 00007FFC9152B2C8
    CHILD_NUMBER                  : 1
    PQ_SLAVE_MISMATCH             : Y
    REASON                        : <ChildNode><ChildNumber>1</ChildNumber><ID>9</ID>
                                   <reason>PQSlavemismatch(5)</reason><size>2x4</size>
                                   <ctxpq_StmtId_pqctx>3389067</ctxpq_StmtId_pqctx>
                                   <fxu_kxfxutqidb>3455170</fxu_kxfxutqidb></ChildNode>
    CON_ID                        : 1
    -----------------

    As you can see, we have finally simulated this reason. Simply put, we can say that it happens inevitably with a parallel query and during which one of the parallel server refuses to share the execution plan of its query coordinator. In the model that I have engineered in this article it seems that this non-sharing reason is due to the materialization of the CTE (Common Table Expression, aka the WITH clause) which I have forced with the hint /*+ materialize */. Bear in mind, however, that I have been playing with the same model into different Oracle releases without always being able to reproduce this pq_slave_mismatch reason.

     

    OPTIMIZER_MISMATCH

    Oracle defines this reason as follows:

    (Y|N) The optimizer environment does not match the existing child cursor

    When Oracle compiles a new execution plan (child cursor) for a parent cursor it will store information that identifies as fully as possible the environment characterizing the compilation and the execution of this execution plan.  The optimizer parameters represent an important part of this stored cursor metadata. When the same parent cursor is launched again, its existing child cursor will be shared and reused provided all of its meta-ata optimizer parameters remain intact. If, however, Oracle realizes that one or many values of those stored optimizer parameters change, it will then invalidate the existing child cursor and hard parse a new one. It seems, too, that Oracle has implemented thirteen optimizer mismatch situations:

    • Optimizermismatch(1)
    • Optimizermismatch(2)
    • Optimizermismatch(3)
    • Optimizermismatch(4)  → points to a px_mismatch for serial plan (not clear)
    • Optimizermismatch(5)
    • Optimizermismatch(6)
    • Optimizermismatch(7)
    • Optimizermismatch(8)
    • Optimizermismatch(9)
    • Optimizermismatch(10) → optimizer mode (first_rows, all_rows) mismatch
    • Optimizermismatch(11)
    • Optimizermismatch(12) → optimizer parameter mismatch
    • Optimizermismatch(13) → cardinality feedback

    The best and easiest way to decipher the meaning of those 13 situations is to run the following query in different running systems and get details of the corresponding non-sharing reason using a Tanel Poder script:

    SQL> select distinct sql_id
        from gv$sql_shared_cursor
        where reason like '%Optimizer mismatch%'
        order by sql_id;
     
    SQL_ID
    -------------
    00ajuu5902429
    00zqy3yd0r3p3
    04kug40zbu4dm
    0b0wj2ykgnnzg
    0fr8zhn4ymu3v
    0gx3b09qrx9f5
    0m78skf1mudnb
    0v3dvmc22qnam
    10s3r3f17ccu3
    SQL> @nonshared <sql_id>

    Proceeding as such I realized that Optimizermismatch(13) is linked with cardinality feedback, 12 with a difference in the optimizer parameters, 10 with a difference in the optimizer mode, 4 with a not yet very clear px_mismatch linked with serial queries, and so on, as shown below:

    SQL> @nonshared 00ajuu5902429

    Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

    SQL_ID                        : 00ajuu5902429
    ADDRESS                       : 00007FFC9F7DC5D8
    CHILD_ADDRESS                 : 00007FFC9F7DB128
    CHILD_NUMBER                  : 0
    REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>3</ID>                                            
                                    <reason>Optimizermismatch(10)</reason>
                                    <size>3x4</size><optimizer_mode_hinted_cursor>0
                                    </optimizer_mode_hinted_cursor> <optimizer_mode_cursor>4
                                    </optimizer_mode_cursor>                                                
                                    <optimizer_mode_current>1</optimizer_mode_current>
                                    </ChildNode>
    CON_ID                        : 1
    -----------------
    SQL_ID                        : 00ajuu5902429
    ADDRESS                       : 00007FFC9F7DC5D8
    CHILD_ADDRESS                 : 00007FFC97D79000
    CHILD_NUMBER                  : 1
    OPTIMIZER_MODE_MISMATCH       : Y
    REASON                        : <ChildNode><ChildNumber>1</ChildNumber><ID>3</ID>                                        
                                    <reason>Optimizermismatch(10)</reason>
                                    <size>3x4</size> <optimizer_mode_hinted_cursor>0
                                    </optimizer_mode_hinted_cursor>
                                    <optimizer_mode_cursor>1</optimizer_mode_cursor                              
                                    <optimizer_mode_current>4</optimizer_mode_current>
                                    </ChildNode>
    CON_ID                        : 1
    -----------------
    SQL> @nonshared gd90ygn1j4026

    Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

    SQL_ID                        : gd90ygn1j4026
    ADDRESS                       : 00007FFCA113DC38
    CHILD_ADDRESS                 : 00007FFCA113C788
    CHILD_NUMBER                  : 0
    REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>3</ID>
                                    <reason>Optimizermismatch(12)</reason>
                                    <size>2x312</size>
                                    <_parallel_syspls_obey_force> true  false
                                    </_parallel_syspls_obey_force></ChildNode>
    CON_ID                        : 1
    -----------------
    SQL_ID                        : gd90ygn1j4026
    ADDRESS                       : 00007FFCA113DC38
    CHILD_ADDRESS                 : 00007FFC9AF3DDD0
    CHILD_NUMBER                  : 1
    OPTIMIZER_MISMATCH            : Y
    REASON                        : <ChildNode><ChildNumber>1</ChildNumber><ID>3</ID>                                          
                                    <reason>Optimizermismatch(12)</reason>
                                    <size>2x312</size>
                                    <_parallel_syspls_obey_force> false  true
                                    </_parallel_syspls_obey_force></ChildNode>
    CON_ID                        : 1
    -----------------

    What we should essentially remember here is that whatever the value x in Optimizermismatch(x), it inevitably indicates that a mismatch has been found in the optimizer parameters stored in the existing child cursor metadata. This optimizer parameter difference will almost always force Oracle to create a new child cursor with new corresponding metadata and store the non-sharing reason into the dedicated gv$sql_shared_cursor view.

     

    SUMMARY

    In this article we examined two new reasons from gv$sql_shared_cursor view: pq_slave_mismatch and optimizer_mismatch. Generally, watch out for the former when you are running a parallel query. We have shown an example of this non-sharing reason occurring when a parallel slave refuses to share the execution plan of its query coordinator because of the materialization of a subquery factoring. The latter is a very common reason in real-life applications. It happens whenever Oracle optimizer fails to find a hundred percent matching metadata of the current child cursor with that of the existing one(s).

     

    Tags: Oracle

    Mohamed Houri

    Written by Mohamed Houri