Oracle7 Server Utilities
Data Loading Methods
SQL*Loader provides two methods for loading data:
Direct path loads can be significantly faster than conventional path loads. Direct path loads achieve this performance gain by eliminating much of the Oracle database overhead by writing directly to the database files. The direct load, therefore, does not compete with other users for database resources so it can usually load data at nearly disk speed. Certain considerations, inherent to this method of access to database files, such as security and backup implications, are discussed in this chapter.
Conventional Path Loads
Conventional path loads (the default) use the SQL command INSERT and a bind array buffer to load data into database tables. This method is used by all Oracle tools and applications.
When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL commands are generated, passed to Oracle, and processed.
Oracle looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.
When to Use a Conventional Path Load
Because the direct path is many times faster than the conventional path, it is highly desirable to use the direct path. But there are times when the conventional path is preferred. You should use the conventional path in the following situations:
- When accessing an indexed table concurrently with the load, or when applying inserts or updates to a non-indexed table concurrently with the load.
To use the direct path (excepting parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read-write access to any indexes.
- When loading data with SQL*Net.
You cannot load data through the direct path with SQL*Net; unless both systems belong to the same family of computers, and both are using the same character set. Even then, load performance can be significantly impaired by network overhead.
- When loading data into a clustered table.
Clustered tables cannot be loaded through the direct path.
- When loading a relatively small number of rows into a large indexed table.
On the direct path, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by loading the data with the direct path.
- When loading a relatively small number of rows into a large table with referential and column-check integrity constraints.
Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.
- When you want to apply SQL functions to data fields.
SQL functions are not available on the direct path. For more information on the SQL functions, see "Applying SQL Operators to Fields" .
Direct Path Loads
Direct path loads are optimized for maximum data loading capability. Like the conventional path method, SQL*Loader's direct path method provides full support for media recovery.
Instead of filling a bind array buffer and passing it to Oracle with a SQL INSERT command, the direct path option creates data blocks that are already in Oracle database block format. These database blocks are then written directly to the database.
Internally, multiple buffers are used for the formatted data. While one buffer is being filled, multiple buffers are being written if asynchronous I/O is available on the host platform. This parallelism increases load performance.
Figure 8 - 1 shows how conventional and direct path loads perform database writes.
Figure 8 - 1. Database Writes on Direct Path and Conventional Path
Although direct path loads minimize the necessity of database processing, a few, fast calls to Oracle are made at the beginning and end of the load. Tables are locked and the locks are released at the end. Also, during the load, space management routines are used to get new extents when needed and to adjust the high-water mark. The high-water mark is described in "Data Saves" .
Oracle calls are also used to sort the data and build the index.
SQL calls are not performed anytime during the load.
Advantages of Direct Path Loads
The direct path method is faster than the conventional path for the following reasons:
- Partial blocks are not used, so no reads are needed to find them and fewer writes are performed.
- SQL INSERT commands are not generated by SQL*Loader, and therefore, processing load on the Oracle database is reduced.
- The bind-array buffer is not used -- formatted database blocks are written directly.
- The direct path method calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. The conventional path calls Oracle once for each array of rows to process a SQL INSERT statement.
- Unlike conventional path loads, direct path loads use asynchronous I/O, if available, to perform these operations in parallel:
- writing to database files
- Processes using the direct path perform their own write I/O, instead of using Oracle's buffer cache in contention with other Oracle users. Therefore, the direct path does not contend for free buffers in the buffer cache.
- The direct path's pre-sorting option allows you to use high-performance sort routines that are native to your system or installation.
- When the table to be loaded is empty, the pre-sorting option eliminates the sort and merge phases of index-building -- the index is simply filled in as data arrives.
- Protection against instance failure does not require redo log file entries during direct path loads. Therefore, if Oracle is operating in NOARCHIVELOG mode, no time is required to log the load. See "Instance Recovery with the Direct Path" .
When to Use a Direct Path Load
You should use a direct path load in the following situations:
- You have a large amount of data to load quickly. A direct path can quickly load and index large amounts of data. It can also load data into either an empty or non-empty table.
- You want to load data in PARALLEL for maximum performance. See .
- You want to load data in a character set that cannot be supported in your current session, or when the conventional conversion to the database character set would cause errors.
Conditions for Using Direct PATH LOADS
In addition to the general load conditions described , the following conditions must be satisfied to use the direct path load method:
- Tables are not clustered.
- Tables to be loaded do not have any active transactions pending.
To check for this condition, use the SQL*DBA command MONITOR TABLE to find the object ID for the table(s) you want to load. Then use the command MONITOR LOCK to see if there are any locks on the table.
- SQL strings are not used in the control file.
- If the table(s) is indexed, there are no current SELECT statements on the table(s).
It may be necessary to log off users that have SELECT statements on the table(s). You can use the The SQL*DBA command MONITOR TABLE to see which Oracle users are accessing the table.
Integrity Constraints
All integrity constraints are enforced during direct path loads, although not necessarily at the same time. All constraints that can be checked without referring to other rows or tables, such as the NOT NULL constraint, are enforced during the load. Records that fail these constraints are rejected.
Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be re-enabled afterwards. If REENABLE is specified, SQL*Loader can re-enable them automatically at the end of the load. When the constraints are re-enabled, the entire table is checked. Any rows that fail this check are reported in the specified error log. See the section in this chapter called "Direct Loads, Integrity Constraints, and Triggers".
Field Defaults on the Direct Path
DEFAULT column specifications defined in the database are not available when loading on the direct path. Fields for which default values are desired must be specified with the DEFAULTIF clause, described . If a DEFAULTIF clause is not specified, and the field is NULL, then a NULL value is inserted into the database.
Loading into Synonyms
You can load data into a synonym for a table during a the direct path load, but the synonym must point directly to a table. It cannot be a synonym for a view or a synonym for another synonym.
Exact Version Requirement
A SQL*Loader direct load can only be done for a database of the same version. For example, you cannot do a SQL*Loader Version 7.1.2 direct path load to load into a Oracle Version 7.1.3 database.