Oracle7 Server Utilities
Maximizing Performance of Direct Path Loads
You can control the time and temporary storage used during direct path loads.
To minimize time:
- Pre-allocate storage space.
- Perform infrequent data saves.
- Disable archiving of redo log files.
To minimize space:
- When sorting data before the load, sort data on the index that requires the most temporary storage space.
- Drop indexes and recreate them after the load.
Allocating I/O Buffers
When doing a direct path load, it is a advisable to specify a large number of buffers, if your operating system provides for that capacity. Buffers are allocated with the I/O processing options string, described .
On some systems, as many as 200 buffers are needed to keep the CPU busy. Otherwise, the CPU spends most of its time idling, waiting for I/O to complete. If you can measure CPU utilization, you will achieve maximum performance of direct loads when the number of buffers allow the CPU to operate at 95% to 98% utilization.
Additional Information: The procedure for allocating additional I/O buffers depends on your operating system. See your Oracle operating system-specific documentation for more information.
Pre-allocating Storage for Faster Loading
SQL*Loader automatically adds extents to the table if necessary, but this process takes time. For faster loads into a new table, allocate the required extents when the table is created.
To calculate the space required by a table, see Chapter 8 "Managing Database Files" in the Oracle7 Server Administrator's Guide. Then use the INITIAL or MINEXTENTS clause in the SQL command CREATE TABLE to allocate the required space.
Pre-sorting Data for Faster Indexing
You can improve the performance of direct path loads by pre-sorting your data on indexed columns. Pre-sorting minimizes temporary storage requirements during the load. Pre-sorting also allows you to take advantage of high-performance sorting routines that are optimized for your operating system or application.
If the data is pre-sorted and the existing index is not empty, then pre-sorting minimizes the amount of temporary segment space needed for the new keys. The sort routine appends each new key to the key list. Instead of requiring extra space for sorting, only space for the keys is needed. To calculate the amount of storage needed, use a sort factor of 1.0 instead of 1.3. For more information on estimating storage requirements, see "Temporary Segment Storage Requirements" .
If pre-sorting is specified and the existing index is empty, then maximum efficiency is achieved. The sort routines are completely bypassed, with the merge phase of index creation. The new keys are simply inserted into the index. Instead of having a temporary segment and new index existing simultaneously with the empty, old index, only the new index exists. So, temporary storage is not required, and time is saved.
SORTED INDEXES Statement
The SORTED INDEXES statement identifies the indexes on which the data is presorted. This statement is allowed only for direct path loads. See Chapter 5, "SQL*Loader Control File Reference," for the syntax. See Case 6 for an illustration.
Generally, you specify only one index in the SORTED INDEXES statement because data that is sorted for one index is not usually in the right order for another index. When the data is in the same order for multiple indexes, however, all of the indexes can be specified at once.
All indexes listed in the SORTED INDEXES statement must be created before you start the direct path load.
Unsorted Data
If you specify an index in the SORTED INDEXES statement, and the data is not sorted for that index, then the index is left in direct load state at the end of the load. The data is present, but any attempt to use the index results in an error. Any index which is left in direct load state must be dropped and re-created after the load.
Multiple Column Indexes
If you specify a multiple-column index in the SORTED INDEXES statement, the data should be sorted so that it is ordered first on the first column in the index, next on the second column in the index, and so on.
For example, if the first column of the index is city, and the second column is last name; then the data should be ordered by name within each city, as in the following list:
Albuquerque Adams
Albuquerque Hartstein
Albuquerque Klein
... ...
Boston Andrews
Boston Bobrowski
Boston Heigham
... ...
Choosing the Best Sort Order
For the best overall performance of direct path loads, you should presort the data based on the index that requires the most temporary segment space. For example, if the primary key is one numeric column, and the secondary key consists of three text columns, then you can minimize both sort time and storage requirements by pre-sorting on the secondary key.
To determine the index that requires the most storage space, use the following procedure:
1. For each index, add up the widths of all columns in that index.
2. For a single-table load, pick the index with the largest overall width.
3. For each table in a multiple table load, identify the index with the largest, overall width for each table. If the same number of rows are to be loaded into each table, then again pick the index with the largest overall width. Usually, the same number of rows are loaded into each table.
4. If a different number of rows are to be loaded into the indexed tables in a multiple table load, then multiply the width of each index identified in step 3 by the number of rows that are to be loaded into that index. Multiply the number of rows to be loaded into each index by the width of that index and pick the index with the largest result.
Infrequent Data Saves
Frequent data saves resulting from a small ROWS value adversely affect the performance of a direct path load. Because direct path loads can be many times faster than conventional loads, the value of ROWS should be considerably higher for a direct load than it would be for a conventional load.
During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety. It is a good idea to determine the average time to load a row by loading a few thousand rows. Then you can use that value to select a good value for ROWS.
For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 10 minutes of work after an interruption, then set ROWS to be 200,000 (20,000 rows/minute * 10 minutes).
Minimizing Use of the Redo Log
One way to speed a direct load dramatically is to minimize use of the redo log. There are two ways to do this. You can disable archiving, or you can specify that the load is UNRECOVERABLE. This section discusses both methods.
Specifying UNRECOVERABLE
Use UNRECOVERABLE to save time and space in the redo log file. An UNRECOVERABLE load does not record loaded data in the redo log file.
Therefore, media recovery is disabled for the loaded table, although database changes by other users may continue to be logged.
Note: Because the data load is not logged, you may want to make a backup of the data after loading.
If media recovery becomes necessary on data that was loaded with the UNRECOVERABLE phrase, the data blocks that were loaded are marked as logically corrupted.
To recover the data, drop and re-create the data. It is a good idea to do backups immediately after the load to preserve the otherwise unrecoverable data.
By default, a direct path load is RECOVERABLE. See "Data Definition Language Syntax" for information on RECOVERABLE and UNRECOVERABLE.