Quantcast
Viewing all articles
Browse latest Browse all 14

How to fix the skip scan bug in 18c

A recent Twitter thread had an interesting test case of the optimizer choosing sub-optimal skip scan over range scan. I’ve checked what’s going on, and it happens to be a known bug.

Here’s a slightly simplified test case:

drop table as_big_lookup_table cascade constraints purge;
create table as_big_lookup_table
as
select
  rownum pk_col,
  mod(rownum,30) type_col,
  mod(rownum,5000) join_col,
  rpad(‘x’,30,’x’) padding_col
from dual
connect by rownum <=10000 / create index as_big_lookup_table_good on as_big_lookup_table (type_col, join_col); create index as_big_lookup_table_bad on as_big_lookup_table (pk_col, type_col, join_col); explain plan for select pk_col, join_col from as_big_lookup_table where type_col = 10 and join_col = 10; select * from table(dbms_xplan.display(format=>’basic +rows +cost +predicate’));

We have a table with two similar indexes. One is a perfect match for the table access by (type_col, join_col) columns, and another is quite bad as the leading column is (essentially) primary key. But when we query the table by those two columns, for some reason Optimizer chooses to skip scan the bad index. What is going on?

Let’s compare the three plans:

  1. Bad plan in 18c
  2. Good plan in 18c – forced with an index_rs_asc hint
  3. Bad plan in 12.1.0.2 – forced with an index_ss hint
————————————————————————-
| Id  | Operation        | Name                    | Rows  | Cost (%CPU)|
————————————————————————-
|   0 | SELECT STATEMENT |                         |     1 |     2   (0)|
|*  1 |  INDEX SKIP SCAN | AS_BIG_LOOKUP_TABLE_BAD |     1 |     2   (0)|
————————————————————————-
Predicate Information (identified by operation id):
—————————————————
   1 – access("TYPE_COL"=10 AND "JOIN_COL"=10)
       filter("JOIN_COL"=10 AND "TYPE_COL"=10)
—
———————————————————————————————
| Id  | Operation                           | Name                     | Rows  | Cost (%CPU)|
———————————————————————————————
|   0 | SELECT STATEMENT                    |                          |     1 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE      |     1 |     2   (0)|
|*  2 |   INDEX RANGE SCAN                  | AS_BIG_LOOKUP_TABLE_GOOD |     1 |     1   (0)|
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
   2 – access("TYPE_COL"=10 AND "JOIN_COL"=10)
—
————————————————————————-
| Id  | Operation        | Name                    | Rows  | Cost (%CPU)|
————————————————————————-
|   0 | SELECT STATEMENT |                         |     1 |    32   (0)|
|*  1 |  INDEX SKIP SCAN | AS_BIG_LOOKUP_TABLE_BAD |     1 |    32   (0)|
————————————————————————-
Predicate Information (identified by operation id):
—————————————————
   1 – access("TYPE_COL"=10 AND "JOIN_COL"=10)
       filter("JOIN_COL"=10 AND "TYPE_COL"=10)

Comparing the costs of the first two plans, we see that they are identical, and CBO preferred the bad one. I thought it is done so due to alphabetical order, but no, it doesn’t seem so.

Comparing the costs of the first and third plans, we see that the skip scan cost is more reasonable in 12.1.0.2. It is close to the number of leaf blocks of the bad index.

Since we have a reproducible case, we can now compare 10053 parts of the first and third plans, and see if there’s an explanation for bad costing in 18c.

— 18.3
 ****** Costing Index AS_BIG_LOOKUP_TABLE_BAD
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SKIP_SCAN
  ColGroup Usage:: PredCnt: 2  Matches Full: #2  Partial:  Sel: 1.0000e-04
  Estimated selectivity: 0.033333 , col: #2
  Estimated selectivity: 2.0000e-04 , col: #3
  Access Path: index (skip-scan)
    SS scan sel: 1.0000e-04  SS filter sel: 1.0000e-04  ANDV (#skips): 1.000000
    SS io: 1.000000 vs. table scan io: 21.000000
    Skip Scan chosen
  Access Path: index (SkipScan)
    Index: AS_BIG_LOOKUP_TABLE_BAD
    resc_io: 2.000000  resc_cpu: 14443
    ix_sel: 1.0000e-04  ix_sel_with_filters: 1.0000e-04
    Cost: 2.000370  Resp: 2.000370  Degree: 1
— 12.1.0.2
 ****** Costing Index AS_BIG_LOOKUP_TABLE_BAD
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SKIP_SCAN
  Access Path: index (skip-scan)
    SS scan sel: 0.033400  SS filter sel: 0.033400  ANDV (#skips): 10000.000000
    SS io: 31.000000 vs. table scan io: 21.000000
    Skip Scan chosen
  Access Path: index (SkipScan)
    Index: AS_BIG_LOOKUP_TABLE_BAD
    resc_io: 32.000000  resc_cpu: 294686
    ix_sel: 0.033400  ix_sel_with_filters: 0.033400
    Cost: 32.029034  Resp: 32.029034  Degree: 1
  Best:: AccessPath: IndexRange
  Index: AS_BIG_LOOKUP_TABLE_BAD
         Cost: 32.029034  Degree: 1  Resp: 32.029034  Card: 334.000000  Bytes: 0.000000

Without knowing what those numbers mean, we can see the difference in the #skips – the number of times CBO estimates skip scan to do a range scan. In 18.3 it is 1 for some reason, and that error makes the bad index look just like the proper two-column index. It also seems like this is the result of SS selectivity miscalculation: 0.0001 vs 0.033400.

Is there a way to guess which bug is responsible for the skip scan costing change? Usually yes, and v$session_fix_control allows you to do that. I have a simple script called bug.sql to search it by bug descriptions, release or bug number. In this case, searching for “skip scan” shows a few hits:

SQL> @bug "skip scan"
                                                                                                                           Opt
     BUGNO VALUE SQL_FEATURE                              DESCRIPTION                                                      features
———- —– —————————————- —————————————————————- ———-
   6070954     1 QKSFM_ACCESS_PATH_6070954                No skip scan with contiguous leading equality index keys         10.2.0.4
   5714944     1 QKSFM_ACCESS_PATH_5714944                set IO cost for index skip scan to at least 1.0                  10.2.0.5
   7272039     1 QKSFM_ACCESS_PATH_7272039                use index cost adj when comparing skip scan with full table scan 10.2.0.5
   6086930     1 QKSFM_ACCESS_PATH_6086930                correct skip scan selectivity evaluation for BETWEEN predicate   11.2.0.2
   7277732     1 QKSFM_CBO_7277732                        allow skip scan costing for NL with non-join predicate           11.2.0.2
   8855396     1 QKSFM_ACCESS_PATH_8855396                sanity check for skip scan costing                               11.2.0.2
   8893626     1 QKSFM_ACCESS_PATH_8893626                apply index filter selectivity during skip scan costing          11.2.0.2
   9195582     1 QKSFM_ACCESS_PATH_9195582                leaf blocks as upper limit for skip scan blocks                  11.2.0.2
   9227576     1 QKSFM_CBO_9227576                        allow skip scan costing for semi/anti-join                       11.2.0.3
  10080014     1 QKSFM_CBO_10080014                       allow skip scan costing for PRIOR join in CONNECT BY query       11.2.0.3
  12839247     1 QKSFM_ACCESS_PATH_12839247               improve cost estimate for skip scan                              11.2.0.4
  13362020     1 QKSFM_CARDINALITY_13362020               fix selectivity for skip scan filter with multi column stats     11.2.0.4
  14107333     1 QKSFM_INDEX_SS_14107333                  improve cost estimate for skip scan due to last col              11.2.0.4
  14254052     1 QKSFM_CARDINALITY_14254052               amend accounting for nulls in skip scan selectivity calculation  11.2.0.4
  14254795     1 QKSFM_CARDINALITY_14254795               Consider scan and filter selectivity separately for skip scan    11.2.0.4
  16555865     1 QKSFM_INDEX_SS_16555865                  ignore filter selectivity on skip scan key columns               12.1.0.2
  20129763     1 QKSFM_INDEX_SS_20129763                  update number of skips for skip scan only when index keys remain 12.2.0.1
  20587527     1 QKSFM_INDEX_SS_20587527                  improve cost estimate for skip scan                              12.2.0.1
  20107874     1 QKSFM_INDEX_SS_20107874                  adjust number of skips by current skip scan selectivity          18.1.0
   4904838     1 QKSFM_CBO_4904838                        allow index skip scan with no index keys                         9.2.0.8

And testing if bug 20107874 is responsible for the change in skip scan costing gives a positive result, with the good index being picked up.

SQL> alter session set "_fix_control"=’20107874:off’;
—
———————————————————————————————
| Id  | Operation                           | Name                     | Rows  | Cost (%CPU)|
———————————————————————————————
|   0 | SELECT STATEMENT                    |                          |     1 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE      |     1 |     2   (0)|
|*  2 |   INDEX RANGE SCAN                  | AS_BIG_LOOKUP_TABLE_GOOD |     1 |     1   (0)|
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
   2 – access("TYPE_COL"=10 AND "JOIN_COL"=10)

Sometimes you don’t even need to look at the plans, and do this type of simple test straight after identifying that something is not right with some feature and/or release. Mauro Pagano has written a tool called Pathfinder which can do this for you automatically, brute forcing all possible optimizer environments.

Is there anything else to add? Yes. Two things:
1. This is a known issue and there’s a patch available on MOS.
2. As you can see, the test case doesn’t use dbms_stats call. That’s okay in 12c+ as table statistics is created with CTAS, and index stats is created on index creation by default since 10g. But let’s try to gather table stats just in case in 18c (after reverting fix control back to original setting):

exec dbms_stats.gather_table_stats(”, ‘AS_BIG_LOOKUP_TABLE’)
———————————————————————————————
| Id  | Operation                           | Name                     | Rows  | Cost (%CPU)|
———————————————————————————————
|   0 | SELECT STATEMENT                    |                          |     1 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE      |     1 |     2   (0)|
|*  2 |   INDEX RANGE SCAN                  | AS_BIG_LOOKUP_TABLE_GOOD |     1 |     1   (0)|
———————————————————————————————
Predicate Information (identified by operation id):
—————————————————
   2 – access("TYPE_COL"=10 AND "JOIN_COL"=10)

What?! Everything is great again, and skip scan is gone! This is really surprising to see. I tried to find if there is any difference in table/index stats after DBMS_STATS call, and couldn’t find any; all statistics were exactly the same. I don’t know how to explain this, and will be happy to hear an explanation.


Viewing all articles
Browse latest Browse all 14

Trending Articles