Oracle7 Server Utilities
Using Direct Path Load
This section explains how to use SQL*Loader's direct path load.
Setting Up for Direct Path Loads
To prepare the database for direct path loads, you must run the setup script, CATLDR.SQL, to create the necessary views. You need to run this script only once for each database in which you plan to do direct loads. You can run the script during database installation if you know at that time that you will be doing direct loads.
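For example, assuming the script is in its usual location under $ORACLE_HOME/rdbms/admin and that you are connected as a suitably privileged user from an administrative tool such as Server Manager or SQL*DBA (the exact tool and pathname vary by platform and release), running it might look like this:
@?/rdbms/admin/catldr.sql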
Specifying a Direct Path Load
To start SQL*Loader in direct load mode, the parameter DIRECT must be set to TRUE on the command line or in the parameter file, if used, in the format:
DIRECT=TRUE
See Case 6 for an example.
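For instance, a direct path load might be started from the operating system prompt as follows (the username, password, and control file name are illustrative, and the name of the SQL*Loader executable varies by platform):
sqlldr USERID=scott/tiger CONTROL=emp.ctl DIRECT=TRUE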
Building Indexes
During a direct path load, performance is improved by using temporary storage. After the data is loaded into the table, the new keys are copied to a temporary segment and sorted. The old index and the new keys are then merged to create the new index. The old index, temporary segment, and new index all require storage until the merge is complete. Then the old index and temporary segment are removed.
Note that, during a conventional path load, every time a row is inserted the index is updated. This method does not require temporary storage space, but it does add processing time.
The SINGLEROW Option
Performance on systems with limited memory can also be improved by using the SINGLEROW option. For more information see .
Note: If, during a direct load, you have specified that the data is to be pre-sorted and the existing index is empty, a temporary segment is not required, and no merge occurs--the keys are put directly into the index. See "Maximizing Performance of Direct Path Loads" for more information.
When multiple indexes are built, the temporary segments corresponding to each index exist simultaneously, in addition to the old indexes. The new keys are then merged with the old indexes, one index at a time. As each new index is created, the old index and the corresponding temporary segment are removed.
Index Storage Requirements
The formula for calculating the amount of space needed for storing the index itself can be found in Chapter 8 "Managing Database Files" of the Oracle7 Server Administrator's Guide. Remember that two indexes exist until the load is complete: the old index and the new index.
Temporary Segment Storage Requirements
The amount of temporary segment space needed for storing the new index keys (in bytes) can be estimated using the following formula:
1.3 * key_storage
where:
key_storage = (number_of_rows) * (10 + sum_of_column_sizes + number_of_columns)
The columns included in this formula are the columns in the index. There is one length byte per column, and 10 bytes per row are used for a ROWID and additional overhead.
The constant 1.3 reflects the average amount of extra space needed for sorting. This value is appropriate for most randomly ordered data. If the data arrives in exactly opposite order, twice the key-storage space is required for sorting, and the value of this constant would be 2.0. That is the worst case.
If the data is fully sorted, only enough space to store the index entries is required, and the value of this constant reduces to 1.0. See "Pre-sorting Data for Faster Indexing" for more information.
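As a worked example with hypothetical figures, suppose you are loading 100,000 rows and the index covers three columns whose sizes total 30 bytes. Then:
key_storage = 100000 * (10 + 30 + 3) = 4,300,000 bytes
1.3 * 4,300,000 = approximately 5,600,000 bytes of temporary segment space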
Indexes Left in Direct Load State
SQL*Loader may leave indexes in direct load state if a direct path load does not complete successfully.
Any SQL statement that tries to use an index that is in direct load state returns an error. The following conditions cause the direct path option to leave an index in direct load state:
- SQL*Loader runs out of space for the index.
- The data is not in the order specified by the SORTED INDEXES clause.
- There is an instance failure while building the index.
- There are duplicate keys in a unique index.
To determine if an index is in direct load state, you can execute a simple query:
SELECT INDEX_NAME, STATUS
FROM USER_INDEXES
WHERE TABLE_NAME = 'tablename';
If you are not the owner of the table, then search ALL_INDEXES or DBA_INDEXES instead of USER_INDEXES.
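For example, if the table belongs to another schema (here the hypothetical schema SCOTT and table EMP), you might query:
SELECT OWNER, INDEX_NAME, STATUS
FROM ALL_INDEXES
WHERE TABLE_OWNER = 'SCOTT'
AND TABLE_NAME = 'EMP';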
Data Saves
You can use data saves to protect against loss of data due to instance or media failure. All data loaded up to the last data save is protected against instance failure. To continue the load after an instance failure, determine how many rows from the input file were processed before the failure, then use the SKIP option to skip those processed rows. If there were any indexes on the table, drop them before continuing the load, then re-create them after the load. See "Recovery" for more information on media and instance failure.
Note: Indexes are not protected by a data save, because SQL*Loader usually does not build indexes until after data loading completes. (The only time indexes are built during the load is when pre-sorted data is loaded into an empty table -- but these indexes are also unprotected.)
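For example, if you determine that 20,000 rows from the input file were fully processed before the failure, you might continue the load as follows (the names are illustrative):
sqlldr USERID=scott/tiger CONTROL=emp.ctl DIRECT=TRUE SKIP=20000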
Using the ROWS Parameter
The parameter ROWS determines when data saves occur during a direct path load. The value you specify for ROWS is the number of rows you want SQL*Loader to read from the input file before saving inserts in the database.
The number of rows you specify for a data save is an approximate number. Direct loads always act on full data buffers that match the format of Oracle database blocks. So, the actual number of data rows saved is rounded up to a multiple of the number of rows in a database block.
SQL*Loader always reads the number of rows needed to fill a database block. Discarded and rejected records are then removed, and the remaining records are inserted into the database. So the actual number of rows inserted before a save is the value you specify, rounded up to the number of rows in a database block, minus the number of discarded and rejected records.
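For example, suppose you specify ROWS=10000 and that, hypothetically, 120 rows of your data fit in one database block:
10000 / 120 = 83.3, rounded up to 84 blocks
84 * 120 = 10,080 rows inserted before the save, minus any discarded or rejected records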
Data Save Versus Commit
In a conventional load, ROWS is the number of rows to read before a commit. A direct load data save is similar to a conventional load commit, but it is not identical. The similarities are:
- A data save makes the rows visible to other users
- Rows cannot be rolled back after a data save
The major difference is that the indexes will be unusable (left in direct load state) until the load completes.
Recovery
SQL*Loader provides full support for data recovery when using the direct path option. There are two main types of recovery:
Media Recovery: Media recovery is recovering from the loss of a database file. You must operate in ARCHIVELOG mode to recover after a file has been lost.
Instance Recovery: Instance recovery is recovering from a system failure in which in-memory data was changed (but not written to disk) before the failure occurred. Oracle can always recover from instance failures, even if redo log files are not archived.
See the Oracle7 Server Administrator's Guide for more information about recovery.
Instance Recovery and Direct Path Loads
Because SQL*Loader writes directly to the database files, all rows inserted up to the last data save will automatically be present in the database files if the instance is restarted. Changes do not need to be recorded in the redo log file to make instance recovery possible.
If an instance failure occurs, the indexes being built may be left in direct load state. Drop and re-create any affected indexes before using the table or continuing the load. See "Indexes Left in Direct Load State" for more information on how to determine if an index has been left in direct load state.
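For example, assuming a hypothetical index EMP_EMPNO_IDX on table EMP was left in direct load state, you could drop and re-create it as follows:
DROP INDEX emp_empno_idx;
CREATE INDEX emp_empno_idx ON emp (empno);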
Media Recovery and Direct Path Loads
If redo log file archiving is enabled (you are operating in ARCHIVELOG mode), SQL*Loader logs loaded data when using the direct path, making media recovery possible. If redo log archiving is not enabled (you are operating in NOARCHIVELOG mode), then media recovery is not possible.
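To check which mode the database is running in, you can query the data dictionary; for example:
SELECT LOG_MODE
FROM V$DATABASE;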
To recover a database file that was lost while it was being loaded, use the same method that you use to recover data loaded with the conventional path:
1. Restore the most recent backup of the affected database file.
2. Recover the file by applying the archived redo log files.
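As a hedged illustration, assuming the lost file is the hypothetical datafile '/u01/oradata/users01.dbf', that it has been restored from backup, and that the database is mounted, the SQL might look like this:
ALTER DATABASE RECOVER DATAFILE '/u01/oradata/users01.dbf';
ALTER DATABASE DATAFILE '/u01/oradata/users01.dbf' ONLINE;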
Loading LONG Data Fields
Data that is longer than SQL*Loader's maximum buffer size can be loaded on the direct path either by using the PIECED option or by specifying the number of READBUFFERS. This section describes those two options.
Loading Data as PIECED
Data can be loaded in sections with the PIECED option if it occupies the last column of the logical record. The syntax for this specification is given .
Declaring a column as PIECED informs the direct path loader that the field may be processed in pieces, one buffer at a time.
The following restrictions apply when declaring a column as PIECED:
- This option is only valid on the direct path.
- Only one field per table may be PIECED.
- The PIECED field must be the last field in the logical record.
- The PIECED field may not be used in any WHEN, NULLIF, or DEFAULTIF clauses.
- The PIECED field's region in the logical record must not overlap with any other field's region.
- The database column corresponding to the PIECED field may not be part of an index.
- It may not be possible to load a rejected record from the bad file if it contains a PIECED field.
For example, a PIECED field could span three records. SQL*Loader loads the piece from the first record and then reuses the buffer for the second piece. After loading the second piece, the buffer is reused for the third record. If an error is then discovered, only the third record is placed in the bad file, because the first two pieces no longer exist in the buffer. As a result, the record in the bad file would not be valid.
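To illustrate, a control file fragment might declare a PIECED column as sketched below (the table, column names, positions, and sizes are hypothetical; consult the control file syntax diagrams for the full specification):
LOAD DATA
INFILE 'documents.dat'
INTO TABLE documents
(doc_id    POSITION(1:10)  INTEGER EXTERNAL,
 doc_text  POSITION(11)    CHAR(100000) PIECED)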
Using the READBUFFERS Keyword
For data that is not divided into separate sections, or that is not in the last column, you can specify READBUFFERS. With READBUFFERS, you can allocate a buffer transfer area large enough to hold the entire logical record at one time.
READBUFFERS specifies the number of buffers to use during a direct path load. (A LONG can span multiple buffers.) The default value is four buffers. If the number of read buffers is too small, the following error results:
ORA-02374 ... No more slots for read buffer queue
Note: Do not specify a value for READBUFFERS unless it becomes necessary, as indicated by ORA-02374. Values of READBUFFERS that are larger than necessary do not enhance performance. Instead, higher values unnecessarily increase system overhead.
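For example, to request ten read buffers you would include a clause of the following form in the control file (its placement among the other clauses follows the SQL*Loader control file syntax):
READBUFFERS 10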