Blog Archive

I have consolidated, and will do so in future, all my Experiences for Oracle Community. All these are my own opinions and does not necessarily reflect the views of Oracle. There are many databases with different permutation and combination which I have not seen. Hence, with these limitations, my views might be wrong, before implementing do test these. Please feel free to comment on my views, be it a Criticism, as this will only improve my writing abilities.

Wednesday, 5 March 2008

Rowid Based Parallelism (Manual) v/s Oracle Parallesim....

Recently, I had come across an interesting challenge of purging data from a 500gb table. This was not a plain PURGE but was based on a join and filtering on two other tables. The reason I term it as a challenge is because the database was on a low end server with 12 CPU and 52 GB RAM.The Original Query used for purging is as under :


create table wia_values_new_1 nologging
parallel 24 tablespace A_TS_IDX as
select a.*
from wia_values a,
(select a.it_type ,a.it_key ,b.name
from wi a, wia b
where a.begin_date > '01-NOV-2007'
and a.it_type=b.it_type) vv
where a.it_type=vv.it_type
and a.it_key=vv.it_key
and a.name=vv.name

To accomplish this task, it was obvious to run the queries with parallel processes. On a 12 CPU Machine, the Nested Loop Join took more than 10-12 hours and had to be aborted as this much time was unacceptable. With a HASH Join, the query used to fail with UNABLE TO Extend Temporary Segments after 3-4 hours of execution. This means, restarting of the entire process.

Oracle 9i onwards, one can use dbms_resumable to take care of these errors. Once fixed, the process will continue from the point where it failed.

Finally, I recollected a very good methodology, called as ROWID Parallelism, introduced by Thomas Kyte. I read this long back in his (and one of my favourite) book "Effective Oracle by Design", but have practically used and implemented this for the first time. Believe me, with this parallelism, I got a the tremendous performance benefit and could accomplish this challenging task in less than 2 hours with great ease.

Unfortunately, I could not use these on production because due to the challenge involved, the customer provided us with additional CPU's and Memory, and a NESTED Loop query did the purge in 3 hours. But, based on the comparision, these rowid parallelism could have achieved the task in less than an hour and half.

The steps that I performed were as under :

  1. Split the table into multiple chunks (based on the number of rowid parallelism) that would be used. In my case, the extents of 500 GB table were scattered across 208 datafiles, hence, I splitted the table into 200 chunks.
  2. Create a Job_Table, to store the low and high rowids of each chunk. This table will store some additional data that can be used later.
  3. Spawn and Schedule parallel jobs using dbms_job. Each job will process its own chunk. 4. Merge the data into one single table.

The advantage of using this parallelism is :

  1. Easier to split the table into multiple chunks.
  2. Easier to manage, based on the CPU Utilization.
  3. If anyone of the script or job fails, it is easier to restart that single job rather than restarting the entire process all again.

These are the queries that were used to accomplish this purging task using ROWID Parallelism.






Job_Table
---------
create table job_table
(
job_id int primary key,
lo_rid rowid,
hi_rid rowid,
completed_yn varchar2(1),
rows_processed number(20),
start_time date,
end_time date);

----------------------------------------------------------------------

Spawning and storing of multiple chunks into job_table
------------------------------------------------------
insert into job_table(job_id, lo_rid, hi_rid)
select grp,
dbms_rowid.rowid_create(1,data_object_id, lo_fno, lo_block,0) min_rid,
dbms_rowid.rowid_create(1,data_object_id, hi_fno, hi_block,0) max_rid
from (
select distinct grp,
first_value(relative_fno) over
(partition by grp order by relative_fno, block_id rows
between unbounded preceding and unbounded following) lo_fno,
first_value(block_id) over
(partition by grp order by relative_fno, block_id rows
between unbounded preceding and unbounded following) lo_block,
last_value(relative_fno) over
(partition by grp order by relative_fno, block_id rows
between unbounded preceding and unbounded following) hi_fno,
last_value(block_id) over
(partition by grp order by relative_fno, block_id rows
between unbounded preceding and unbounded following) hi_block,
sum(blocks) over (partition by grp) sum_blocks from (
select relative_fno, block_id, blocks,
sum(blocks) over (order by relative_fno, block_id) cum_blocks,
sum(blocks) over () tot_blocks,
trunc((sum(blocks) over (order by relative_fno, block_id)-0.01) /
(sum(blocks) over ()/200)) grp
from dba_extents
where segment_name='WIA_VALUES'
and owner=user
order by relative_fno, block_id)),
(select data_object_id from dba_objects
where owner=user
and object_name='WIA_VALUES');
commit;

--------------------------------------------------------------------------

** Now I have 200 rows in this table with low and high rowids that
** take care of the entire table. This is the sample output.....

GRP MIN_RID MAX_RID
---------- ------------------ ------------------
0 AABDksAALAAAAAJAAA AABDksAB2AAB9f5AAA
1 AABDksAB2AAB9gZAAA AABDksACkAACSFJAAA
2 AABDksACkAACSFZAAA AABDksADSAAAsxJAAA
3 AABDksADSAAAsx5AAA AABDksADzAACT6pAAA
4 AABDksADzAACT7JAAA AABDksAEkAAAADJAAA
5 AABDksAEkAAAADpAAA AABDksAFMAAAKFZAAA
6 AABDksAFMAAAKFpAAA AABDksAGmAAAYVJAAA
7 AABDksAGmAAAYV5AAA AABDksAHTAAAnsJAAA
8 AABDksAHTAAAnsZAAA AABDksAH6AACS0pAAA
9 AABDksAH6AACS1JAAA AABDksAIeAABo65AAA
10 AABDksAIeAABo+5AAA AABDksAJRAAAwmZAAA
11 AABDksAJRAAAwnZAAA AABDksAJ1AACDppAAA

--------------------------------------------------------------------------

** I created 200 tables in which, a parallel job will insert required
** rows into.

declare
l_statement varchar2(200);
begin
for i in 0..199
loop
l_statement:='create table wia_new_'i' tablespace A_TS_IDX as
select * from wia_values where 1=2';
execute immediate l_statement;
end loop;
end;

--------------------------------------------------------------------------

** Then I created a Database Procedure that will be executed in parallel
** and the input value will be the job_id. Based on this input job_id,
** the insert script will pickup the low and high rowids and will do a rowid
** range scan the table. Based on the status (completion or failure), the
** job_table will be updated.

create or replace procedure p_job(p_job in number) as
l_job_table job_table%rowtype;
l_completed varchar2(1);
l_statement varchar2(2000);
l_rows number(20):=0;
l_date date;
begin
select * into l_job_table from job_table
where job_id=p_job
and nvl(completed_yn,'N')!='P'
and (completed_yn!='Y' or completed_yn is null);
l_rows:=sql%rowcount;
if l_rows>0 then
execute immediate 'alter session set db_file_multiblock_read_count=128';
select sysdate into l_date from dual;
update job_table set completed_yn='P', start_time=l_date where job_id=p_job;
commit;
l_statement:='insert /*+ append */ into wia_new_'p_job' nologging
select a.*
from wia_values a,
(select a.it_type ,a.it_key ,b.name
from wi a,wia b
where a.begin_date > ''''''01-NOV-2007''''''
and a.it_type=b.it_type) vv
where a.it_type=vv.it_type
and a.it_key=vv.it_key
and a.name=vv.name
and a.rowid between'''''l_job_table.lo_rid''''' and

'''''l_job_table.hi_rid'''';
begin
execute immediate l_statement;
l_rows:=sql%rowcount;
select sysdate into l_date from dual;
update job_table
set completed_yn='Y',
rows_processed=l_rows,
end_time=l_date
where job_id=p_job;
commit;
exception when others then
update job_table
set completed_yn='F',
start_time=null
where job_id=p_job;
commit;
end;
end if;
end;

--------------------------------------------------------------------------

** DBMS_JOB will be used to schedule jobs at regular interval and based
** on the CPU Utilization.

declare
l_jobno number;
BEGIN
for i in 0..30
loop
DBMS_JOB.SUBMIT(l_jobno,
'p_job('i');',
trunc(sysdate)+16/24+00/(24*60));
end loop;
COMMIT;
END;

--------------------------------------------------------------------------

** Based on the Utilization, I spawned 30 parallel jobs. Each Job took 5-
** 7 minutes to complete a rowid range scan of wia_values. This I observed
** from v$session_longops. Once the rowid range scan was complete, hash join
** and insertion took another 10 minutes. Hence, one job took almost 20-25
** minutes. Hence, I scheduled another set of 30 jobs at 10 minute interval.
** The overall start time and end time, as an example, I followed was

10:00 - 10:25 30 Jobs
10:10 - 10:35 30 Jobs
10:35 - 11:00 30 Jobs (After 100% Completion of Previous 60 Jobs)
10:45 - 11:10 30 Jobs
11:10 - 11:35 30 Jobs (After 100% Completion of Previous 60 Jobs)
11:20 - 11:45 30 Jobs
11:45 - 12:10 20 Jobs (Remaining)

--------------------------------------------------------------------------

** A Sample output after completion of all the scripts...

JOB_ID LO_RID HI_RID C ROWS_PROCESSED START_TIME END_TIME
---------- ------------------ ------------------ - -------------- ------------------- -------------------
0 AABDksAALAAAAAJAAA AABDksAB2AAB9f5AAA Y 29429239 02-03-2008 16:00:06 02-03-2008 16:19:26
1 AABDksAB2AAB9gZAAA AABDksACkAACSFJAAA Y 208119 02-03-2008 16:00:06 02-03-2008 16:15:51
2 AABDksACkAACSFZAAA AABDksADSAAAsxJAAA Y 199466 02-03-2008 16:00:06 02-03-2008 16:13:23
3 AABDksADSAAAsx5AAA AABDksADzAACT6pAAA Y 10776162 02-03-2008 16:00:06 02-03-2008 16:18:48
4 AABDksADzAACT7JAAA AABDksAEkAAAADJAAA Y 18897287 02-03-2008 16:00:06 02-03-2008 16:19:03




Conclusion : Oracle Parallelism uses same rowid parallelism mechanism to accomplish a task, but manual parallelism works well when you are short of
resources, like, in my case, CPU's and Temporary Space. Though, these require some manual work, it makes a task easier and faster.

7 comments:

Jan said...

...
...
select sysdate into l_date from dual;
...
...

Author of the metalink note: "Contention on SYS.DUAL table"

Anonymous said...

...
...
select sysdate into l_date from dual;
...
...

Author of the metalink note: "Contention on SYS.DUAL table"

Vivek Sharma said...

hmmm...:)

jan, here optimization of dual was not of that priority as dual table is used only 2 times in a procedure. DUAL table can cause performance issues when concurrent sessions query this.

Regards
Vivek

Jan said...

Yes, I know it has no performance penalty in this case.

It is just about typing 4 more words than l_date:=SYSDATE and doing an adverts for bad habit.

Multifuncional said...

Hello. This post is likeable, and your blog is very interesting, congratulations :-). I will add in my blogroll =). If possible gives a last there on my blog, it is about the Impressora e Multifuncional, I hope you enjoy. The address is http://impressora-multifuncional.blogspot.com. A hug.

Kelly said...

The initial split code might be wrong, I used similar and it produced incorrect ROWIDS. Compare to this one which is similar and worked for me: http://pages.videotron.com/orautils/pages/split_table_by_extent_range.htm

Look for last_value(block_id+blocks-1)

Ilya said...

Hi
I also was playing with very similar,including split to chunks and dbms_job.submit. Even splitting partitions to chunks by ROWID range. Also ispired by Tom. Worked very good in 9i. There is a known ( performance )problem with dba_extents in 10g. How did you override / handle this ?

About Me

My Photo
Vivek Sharma
Mumbai, Maharashtra, India
I am an Oracle Professional, mainly into Application and Database Tuning. I have been associated with Oracle technologies for more than 8 years and, in these 8 years, have seen Oracle Database growing at a fast pace. This is one of the primary reason to be associated with this technology. I initiated my association with Oracle as a FORMS Developer and then transitioned to a DBA. Development experience helped me a lot in performing my DBA responsibilities, especially, when it comes to Database and Application tuning. I am a technical speaker with a passion to share and learn from Oracle Community across the world.
View my complete profile

Number of Hits on this Page (Counter Initiated since 4th Nov 2007)

Visit online slots or online blackjack at casino games site.

Visitor Locations - Across Globe

Some of my Favourite Books

  • Oracle 10g Real Application Clusters Handbook by K. Gopalakrishnan
  • Oracle Wait Interface by K. Gopalakrishnan
  • Effective Oracle by Design, Author : Thomas Kyte