[12C] take you to understand the collection of extended statistics of Oracle 12C database extension

Time:2021-1-23

In Oracle database, the optimizer needs to collect the statistical information of related objects in order to produce the optimal execution plan. The database can collect the execution plan automatically or use the package DBMS_ Stats for manual collection, this article does not explain the collection of conventional statistical information, but focuses on the collection of extended statistics.

1 introduction of extended statistical information

When executing SQL, the statistical information of a single column after the where condition can easily determine the selectivity of the predicate of the corresponding column. However, if multiple columns of the same table are included as predicates after the where condition, the statistical information of a single column can not show the relationship between the columns. At this time, the execution plan obtained by the optimizer based on the statistical information of a single column may not be optimal It’s too late. This problem can be solved by column group. By collecting the statistical information of column group, the relationship of a group of columns can be reflected, so that the optimizer can choose the optimal execution plan. The statistical information of column group is called extended statistical information. In Oracle, the extended statistics include:

  • Column group statistics: when multiple columns of a table appear in one SQL statement at the same time, this type of extended statistics can improve cardinality estimation
  • Expression statistics: this type of statistics improves optimizer evaluation when using expressions on predicates.

As shown in the figure sh.customers Two columns cust in the table_ state_ Province and contry_ The statistical information of the ID column and the statistical information of the column group composed of the two columns. The name of the column group is generated by the system.

 

2 use general statistics

1) Experimental environment

SQL> select * from v$version;



BANNER CON_ID

-------------------------------------------------------------------------------- ----------

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0

PL/SQL Release 12.1.0.2.0 - Production 0

CORE 12.1.0.2.0 Production 0

TNS for Linux: Version 12.1.0.2.0 - Production 0

NLSRTL Version 12.1.0.2.0 - Production 0

2) Build test data

SQL> create table sh.customers_new as select * from sh.customers;



Table created.

SQL> exec dbms_stats.gather_table_stats('SH','CUSTOMERS_NEW',method_opt => 'for all columns size 1');



PL/SQL procedure successfully completed.

3) Query column statistics

SQL> select column_name,num_distinct,histogram

from dba_tab_col_statistics

where owner='SH' and table_name='CUSTOMERS_NEW' and column_name in('CUST_STATE_PROVINCE','COUNTRY_ID');



COLUMN_NAME NUM_DISTINCT HISTOGRAM

------------------------------ ------------ ---------------

CUST_STATE_PROVINCE 145 NONE

COUNTRY_ID 19 NONE

3) The sum of the total number of query tables satisfies cust_ state_ The amount of data in the province ‘Ca’

SQL> select count(1) from sh.customers_new;



COUNT(1)

----------

55500



SQL> select count(1) from sh.customers_new where cust_state_province='CA';



COUNT(1)

----------

3341

4) View the execution plan corresponding to a single column

SQL> explain plan for

2 select * from sh.customers_new where cust_state_province='CA';



Explained.

SQL> select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3410015392



-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 383 | 72387 | 423 (1)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 383 | 72387 | 423 (1)| 00:00:01 |

-----------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------



1 - filter("CUST_STATE_PROVINCE"='CA')



13 rows selected.

Look at the following calculation to match the number of rows estimated by the optimizer:

SQL> select 55500/145 from dual;

 

55500/145

———-

382.758621

5) View the execution plan corresponding to the combination condition

SQL> explain plan for

2 select * from sh.customers_new where cust_state_province='CA' and country_id='52790';



Explained.



SQL> select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3410015392



-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 20 | 3780 | 423 (1)| 00:00:01 |

|* 1 | TABLE ACCESS FULL| CUSTOMERS_NEW | 20 | 3780 | 423 (1)| 00:00:01 |

-----------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------



1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)



13 rows selected.

The result shows that the number of rows estimated by the optimizer is 20, which is the same as the following calculation, but the actual number of rows is 3341. Why is this?

SQL> select 55500/145/19 from dual;

 

55500/145/19

————

20.1451906

First of all, the optimizer does not know the relationship between the two columns and Ca in country 52790, which greatly underestimates the returned data rows;

3 use column group statistics

Through the above experiment, we can know that if there is a certain relationship between the columns, and when the where condition appears the combination condition, the optimizer can not accurately evaluate the number of returned rows, then in this case, we can collect the statistical information of the combination column.

1) Collect statistics of combined columns

begin
  dbms_stats.gather_table_stats('SH',
                                'CUSTOMERS_NEW',
                                method_opt => 'for all columns size 1,for columns (cust_state_province,country_id) size skewonly');
end;
 /



PL/SQL procedure successfully completed.

2) View the execution plan corresponding to the combination condition

explain plan for
  select * from sh.customers_new where cust_state_province='CA' and country_id='52790';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3410015392

-----------------------------------------------------------------------------------
| Id  | Operation	  | Name	  | Rows  | Bytes | Cost (%CPU)| Time	  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |		  |  3341 |   655K|   423   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS_NEW |  3341 |   655K|   423   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------

   1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)

13 rows selected.

It can be seen that the estimated number of rows is 3341, while the actual number of rows is 3341, which is the same as the real result.

SQL> select count(1) from sh.customers_new where cust_state_province=’CA’ and country_id=’52790′;

 

COUNT(1)

———-

3341

3) Delete extended statistics

begin

dbms_stats.drop_extended_stats('sh',

'CUSTOMERS_NEW',

'(cust_state_province, country_id)');

end;

/



PL/SQL procedure successfully completed.

The above is the demonstration of column group statistics in extended statistics.