Serialised Bulk Operations
This is a demonstration of how to serialise bulk operations.
Problem:
A bulk operation takes data from a table and inserts or updates another, i.e. bulk insert or bulk update. The source table can, itself, receive updates or inserts from other processes so, in order to know what has been processed and what has not, a flag column is added to the source. This flag takes the values 'Y' (processed) and 'N' (not processed). The bulk operation identifies rows set to 'N', processes them and then flips the flag to 'Y' to indicated that it is done.
It is possible that, while the bulk process is running, other processes change or add data to the source. Should the bulk process be run by another session while one is already running, there is a chance that transactional integrity breaks.
Also, the source can contain a lot of data so performance needs to be considered.
Solution:
This solution uses SELECT ... FOR UPDATE to ensure transactional integrity, and BULK COLLECT ... LIMIT and FORALL for performance.
Example Procedure:
- This is presented as an anonymous block but can be implemented in a package procedure.
- Create an INDEX BY table type based on the primary key of the source table.
- Instantate an INDEX BY table based on the type.
- Create a local variable to hold the LIMIT (i.e. the number of rows processed in each run of the loop). This would be provisionally set to 1000 and later changed in the light of any tuning information.
- Create a cursor to fetch the primary keys of the source into the INDEX BY table where the flag is 'N' using FOR UPDATE. This locks the rows to be processed.
- Open the cursor. This applies the lock.
- Loop. Start of cursor loop - this is required to loop on the LIMIT clause in the FETCH.
- Fetch the cursor BULK COLLECT into the INDEX BY table with the LIMIT clause.
- *** Do the bulk process *** - At this point the primary bulk process is executed. The target table is inserted into or updated based on the set of primary keys in the INDEX BY table.
- Update the source flags. Use FORALL to update the flags based on the set of primary keys in the INDEX BY table.
- Exit the loop when there are no more rows fetched by the cursor.
- End; Loop. Start of cursor loop...
- Commit work.
- Close the cursor.
-- Anonymous block: |
Testing:
There are two test scripts to be run in different sessions - these simulate different processes operating on the same object at the same time. The first script contains a delay statement. This 5 second period is to give enough time to switch sessions and start the second script:
test_script_a.sql
DROP TABLE test |
test_script_b.sql
SET SERVEROUTPUT ON SIZE 1000000 |