On 19th October I wrote about a discussion with a customer on the importance on Statistics after Merging of Data. The scenario that I mentioned was that the customer planned to merge data from database C to H and to differentiate this data, some of the critical (queried) columns of C will be incremented by 900000000. Since, many of these columns are Unique / Primary keys, there ar possibilities that a column can have a value 1111 on both the databases. Therefore, on C Database, before the merger, this value will be changed to 900001111. The max value for this column in H database would be less than 200000000.
With this change, it was suggested to test the application post-merger and with new set of statistics. An example below demonstrates the change in optimizer behaviour since 10.1.0.3.
We all know that the predicted number of rows, also called as CARDINALITY, plays a crucial role in optimizer's calculation of a Join Order or an Access Path. Any error in the calculation of Cardinality can have an impact on the overall execution plan of a query.
The steps performed are : (to make the numbers and calculation small, I will use a table with less number of rows and with smaller values).
1. I first create a table that depicts H Database.
2. Execute a query. We will check for the Cardinality.
3. Then we merge the data into H from C, with incremented values.
4. Without gathering statistics, we shall execute the same query (as in step 2).
5. Without gathering statistics, we shall execute the same query but with a newer value.
After Step 4 & 5, we can conclude that statistics are very much important, especially, when the values in the frequently queried columns go out of range as it can have an impact in cardinality calculation.
.
.
Step 1 - Create Table and Gather Stats (H Database)
.
create table vivek_header as
with main as
(select * from all_objects),
main1 as
(select rownum ag_id, main.object_name, main.created, main.temporary
from main, main main1
where rownum<=1000000)
select * from main1
order by dbms_random.random;
.
Table created
.
.
Elapsed: 00:00:02.76
.
.
.
SQL> alter table vivek_header add constraint pk_ag_id primary key(ag_id);
.
Table altered.
.
Elapsed: 00:00:02.67
.
select count(*) from vivek_header;
.
COUNT(*)
----------
1000000
.
Elapsed: 00:00:00.09
.
.
select min(ag_id), max(ag_id) from vivek_header;
MIN(AG_ID) MAX(AG_ID)
---------- ----------
1 1000000
.
Elapsed: 00:00:00.12
.
exec dbms_stats.gather_table_stats(user,'VIVEK_HEADER',cascade=>true);
.
select column_name, vivek_number(low_value) "Low", vivek_number(high_value) "High"
from user_tab_columns
where table_name='VIVEK_HEADER'
and column_name='AG_ID';
.
.
COLUMN_NAME Low High
------------------------------ ---------- ----------
AG_ID 78 999594
.
Elapsed: 00:00:00.09
.
Step 2- Execute a Query that is in the Range. Cardinality calculation is more or less accurate.
.
select * from vivek_header
where ag_id>=900000;
.
100001 rows selected.
.
Elapsed: 00:00:00.79
.
Execution Plan
----------------------------------------------------------
Plan hash value: 1918031072
.
----------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
----------------------------------------------------------------------------------
0 SELECT STATEMENT 99240 2229K 895 (5) 00:00:11
* 1 TABLE ACCESS FULL VIVEK_HEADER 99240 2229K 895 (5) 00:00:11
----------------------------------------------------------------------------------
.
Step 3- Merging of C Data into H with New (Incremented) Values
insert into vivek_header
with main as
(select * from all_objects),
main1 as
(select 4000000+rownum ag_id, main.object_name, main.created, main.temporary
from main, main main1
where rownum<=50000)
select * from main1
order by dbms_random.random;
.
SQL> select min(ag_id), max(ag_id) from vivek_header;
.
MIN(AG_ID) MAX(AG_ID)
---------- ----------
1 4050000
.
Step 4 - Execute the query to get data for ag_id>=900000. The Cardinality Calculation is Wrong
.
select * from vivek_header
where ag_id>=900000;
.
150001 rows selected.
.
Elapsed: 00:00:01.07
.
Execution Plan
----------------------------------------------------------
Plan hash value: 1918031072
----------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
----------------------------------------------------------------------------------
0 SELECT STATEMENT 99240 2229K 895 (5) 00:00:11
* 1 TABLE ACCESS FULL VIVEK_HEADER 99240 2229K 895 (5) 00:00:11
----------------------------------------------------------------------------------
.
.
Step 5 - Execute the same Query for a new value ag_id. Cardinality Calculation is Surprisingly very low and this have an impact.
A point to note in the plan is that assuming a single row will be fetched, optimizer uses an Index Scan (PK)
.
.
select * from vivek_header
where ag_id>=4000000;
.
SQL> select * from vivek_header
2 where ag_id>=4000000;
50000 rows selected.
.
Elapsed: 00:00:00.50
.
Execution Plan
----------------------------------------------------------
Plan hash value: 3247409102
.
--------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 23 4 (0) 00:00:01
1 TABLE ACCESS BY INDEX ROWID VIVEK_HEADER 1 23 4 (0) 00:00:01
* 2 INDEX RANGE SCAN PK_AG_ID 1 3 (0) 00:00:01
--------------------------------------------------------------------------------------------
.
.
To sum up, optimizer statistics plays crucial role in cardinality calculation which is important to get thru a optimum execution plan, this includes, join order and proper choice of Indexes. Therefore, regular statistics are required on the objects with columns moving out of range, particularly, timebased, sequencebased columns. This change in 10g can surprise many dba's when they upgrade from 9i and 10g.

1 comments:
Vivek, Nice to see my project on your blog. Haven't met you during my visits to H in Mumbai! Will link to this post from my blog.
Best regards
Sridhar
Post a Comment