Wednesday, 9 August 2017

Dynamic Statistics in Oracle Database 12c Release 1

Oracle Database 10g introduced Dynamic Sampling to allow the optimizer to gather additional information at parse time if database statistics were missing, stale or insufficient to produce a good execution plan. Dynamic sampling was controlled by the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter at instance or session level, or for individual queries using the DYNAMIC_SAMPLING hint, with available values ranging between "0" (off) to "10" (aggressive sampling) with a default value of "2".

Dynamic sampling has been renamed to Dynamic Statistics in Oracle Database 12c. Much of the functionality is the same, but a new sample level of 11 has been added. The new setting allows the optimizer to decide if dynamic statistics should be sampled and if so, what sample level to use. Both the name change in the documentation and the additional sampling level have been back-ported to Oracle 11gR2 ( subsequently, so with the exception of the references to SQL plan directives, the functionality described here is common to 11gR2 ( also.

  • Why Use Dynamic Statistics

Dynamic statistics can be beneficial in the following cases.
  1. The sample time is small compared to the overall query execution time.
  2. The current database statistics alone would not create an optimal plan, so dynamic sampling results in a better performing query.
  3. The query may be executed multiple times, so a small delay in the initial parse phase will result in considerable savings overall.

  • When to Sample

The optimizer will attempt to use default database statistics in preference to dynamic statistics, but the following situations will trigger automatic sampling to gather dynamic statistics.
  1. Missing Statistics : Dynamic statistics are sampled if there are missing database statistics. They may be missing because they are newly created objects, or had their statistics locked before any statistics were gathered. Although the dynamic statistics should help the optimizer, these statistics are considered low quality compared to conventional database statistics.
  2. Stale Statistics : Statistics are considered stale when 10% or more of the rows in the table have changed since the statistics were last gathered. Stale statistic can affect cardinality estimates because of changes to the number of rows in the table and inaccuracies in column statistics, such as number of distinct values, high and low column values.
  3. Insufficient Statistics : Existing database statistics may not be sufficient to generate an optimal execution plan. In the short term, dynamic statistics can make up for the absence of extended statistics for column groups and expressions, as well as missing histograms that would identify data skew. Even when all the necessary statistics are present it may not be possible to correctly estimate cardinalities for some complex predicates, operations or joins, so dynamic sampling may still be necessary.
  4. Parallel Execution : Parallel execution is typically used to speed up long running processes. For a long running process, the time associated with sampling dynamic statistics is trivial compared to the query execution time, so it may be worth spending a little more time to make sure the execution plan is optimal.
  5. SQL Plan Directives : The presence of one or more usable SQL plan directives will trigger the sampling of dynamic statistics. SQL plan directives are created when the optimizer identifies misestimates in the cardinality of operations or degree of parallelism (DOP) from previous executions of the statement, or other statements using similar query expressions.
  6. In these cases, recursive SQL is used to sample the data and generate dynamic statistics, which are persisted and sharable between SQL statements having similar patterns.

  • Controlling Dynamic Statistics

As mentioned previously, dynamic statistics can be controlled by the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter and the DYNAMIC_SAMPLING hint. The functionality associated with the individual settings is described here. The following examples show how to control dynamic statistics at system, session and statement level.

-- System level. Don't do this!

-- Session level.

-- Statement level.
SELECT /*+ dynamic_sampling(emp 11) */
       empno, ename, job, sal
FROM   emp
WHERE  deptno = 30;

In the majority of cases you should not need to change the default value of "2". This is possibly more true in Oracle 12c because of the introduction of SQL plan directives and how they work. If the optimizer identifies misestimates in the cardinality of operations or degree of parallelism (DOP), it can create SQL plan directives to force dynamic sampling in the short term. The presence of SQL plan directives influence the way DBMS_STATS gathers statistics, which potentially fixes the root cause of the problems in the database statistics, making the SQL plan directives and therefore dynamic sampling no longer necessary.

  • Reuse of Dynamic Statistics

Dynamic Statistics can be stored in the SGA as OPT_ESTIMATE hints in the V$SQL_REOPTIMIZATION_HINTS view. Both the view and the OPT_ESTIMATE hint are undocumented. The storage of dynamic statistics mean resampling of the statistics is not necessary if the same statement is parsed again.