Apr 5, 2010

splitting big tables for parallel operations

What would you do, if you need to make parallel application-level operation on a big table? Using indexed column is not always an option, and it could also be slower if number of chunks for parallel operations are in order of 10 or so (index probably will not be used or used inefficiently).

It is native to Oracle to use ROWIDs for such an operations. Following stored procedure will split table's all extents to approximately equal parts.


CREATE OR REPLACE PROCEDURE get_table_chunk_rowids2 (p_schema VARCHAR2, p_table VARCHAR2,
p_chunks NUMBER, p_cur_chunk NUMBER,
p_min_rowid OUT VARCHAR2, p_max_rowid OUT VARCHAR2
)
AS
BEGIN
SELECT q.r1, q.r2
INTO p_min_rowid, p_max_rowid
FROM
(
SELECT rownum rn
, sys.DBMS_ROWID.rowid_create (1, d.oid, c.fid1, c.bid1, 0) r1
, sys.DBMS_ROWID.rowid_create (1, d.oid, c.fid2, c.bid2, 9999) r2
FROM (SELECT DISTINCT
b.rn,
FIRST_VALUE (a.fid)
OVER ( PARTITION BY b.rn
ORDER BY a.fid, a.bid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fid1,
LAST_VALUE (a.fid)
OVER ( PARTITION BY b.rn
ORDER BY a.fid, a.bid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fid2,
FIRST_VALUE (
DECODE (SIGN (range2 - range1),
1, a.bid + ( (b.rn - a.range1) * a.chunks1),
a.bid) )
OVER (
PARTITION BY b.rn
ORDER BY a.fid, a.bid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) bid1,
LAST_VALUE (
DECODE (
SIGN (range2 - range1),
1, a.bid + ( (b.rn - a.range1 + 1) * a.chunks1) - 1,
(a.bid + a.blocks - 1)))
OVER (
PARTITION BY b.rn
ORDER BY a.fid, a.bid
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) bid2
FROM (SELECT fid,
bid,
blocks,
chunks1,
TRUNC ( (sum2 - blocks + 1 - 0.1) / chunks1) range1,
TRUNC ( (sum2 - 0.1) / chunks1) range2
FROM (SELECT /*+ rule */
relative_fno fid,
block_id bid,
blocks,
SUM (blocks) OVER () sum1,
TRUNC ( (SUM (blocks) OVER ()) / p_chunks) chunks1,
SUM (blocks) OVER (ORDER BY relative_fno, block_id) sum2
FROM dba_extents
WHERE segment_name = UPPER (p_table) AND owner = UPPER(p_schema))
WHERE sum1 > p_chunks) a,
( SELECT ROWNUM - 1 rn
FROM DUAL
CONNECT BY LEVEL <= p_chunks) b
WHERE b.rn BETWEEN a.range1 AND a.range2) c,
(SELECT MAX (data_object_id) oid
FROM dba_objects
WHERE object_name = UPPER (p_table) AND owner = UPPER (p_schema)
AND data_object_id IS NOT NULL) d
ORDER BY d.oid, c.fid1, c.bid1
) q
WHERE q.rn = p_cur_chunk;
END;


Btw, if you are on the Oracle 11g R2, then you could consider using DBMS_PARALLEL_EXECUTE - http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php. E.g. it has DBMS_PARALLEL_EXECUTE.get_rowid_chunk procedure. HTH

No comments: