Oracle7 Server Utilities

Contents Index Home Previous Next

Determining the Size of the Bind Array

The determination of bind array size pertains to SQL*Loader's conventional path option. It does not apply to the direct path load method. Because a direct path load formats database blocks directly, rather than using Oracle's SQL interface, it does not use a bind array.

SQL*Loader uses the SQL array-interface option to transfer data to the RDBMS. Multiple rows are read at one time and stored in the bind array. When SQL*Loader sends Oracle an INSERT command, the entire array is inserted at one time. After the rows in the bind array are inserted, a COMMIT is issued.

Minimum Requirements

The bind array has to be large enough to contain a single row. If the maximum row length exceeds the size of the bind array, as specified by the BINDSIZE parameter, SQL*Loader generates an error. Otherwise, the bind array contains as many rows as can fit within it, up to the limit set by the value of the ROWS parameter. The BINDSIZE parameter is described [*], the ROWS parameter [*].

Although the entire bind array need not be in contiguous memory, the buffer for each field in the bind array must occupy contiguous memory. If the operating system cannot supply enough contiguous memory to store a field, SQL*Loader generates an error.

Performance Implications

To minimize the number of calls to Oracle and maximize performance, large bind arrays are preferable. In general, you gain large improvements in performance with each increase in the bind array size up to 100 rows. Increasing the bind array size above 100 rows generally delivers more modest improvements in performance. So the size (in bytes) of 100 rows is typically a good value to use. The remainder of this section details the method for determining that size.

In general, any reasonably large size will permit SQL*Loader to operate effectively. It is not usually necessary to perform the detailed calculations described in this section. This section should be read when maximum performance is desired, or when an explanation of memory usage is needed.

Specifying Number of Rows vs. Size of Bind Array

When you specify a bind array size using the command-line parameter BINDSIZE (see [*]) or the OPTIONS clause in the control file (see [*]), you impose an upper limit on the bind array. The bind array never exceeds that maximum.

As part of its initialization, SQL*Loader determines the space required to load a single row. If that size is too large to fit within the specified maximum, the load terminates with an error.

SQL*Loader then multiplies that size by the number of rows for the load, whether that value was specified with the command-line parameter ROWS (see [*]) or the OPTIONS clause in the control file (see [*]). If that size fits within the bind array maximum, the load continues--SQL*Loader does not try to expand the number of rows to reach the maximum bind array size. That is, if the number of rows and the maximum bind array size are both specified, SQL*Loader always uses the smaller value for the bind array.

If the maximum bind array size is too small to accommodate the initial number of rows, SQL*Loader uses a smaller number of rows that fits within the maximum.

Calculations

The bind array's size is equivalent to the number of rows it contains times the maximum length of each row. The maximum length of a row is equal to the sum of the maximum field lengths, plus overhead.

bind array size = (number of rows) * (maximum row length) 

where:

(maximum row length) = SUM(fixed field lengths) + 
SUM(maximum varying field lengths) + 
SUM(overhead for varying length fields) 

Many fields do not vary in size. These fixed-length fields are the same for each loaded row. For those fields, the maximum length of the field is the field size, in bytes, as described in "Specifying Datatypes" [*]. There is no overhead for these fields.

The fields that can vary in size from row to row are

VARCHAR                VARGRAPHIC 
CHAR                   DATE 
numeric EXTERNAL 

The maximum length of these datatypes is described in "Specifying Datatypes" [*]. The maximum lengths describe the number of bytes, or character positions, that the fields can occupy in the input data record. That length also describes the amount of storage that each field occupies in the bind array, but the bind array includes additional overhead for fields that can vary in size.

When the character datatypes (CHAR, DATE, and numeric EXTERNAL) are specified with delimiters, any lengths specified for these fields are maximum lengths. When specified without delimiters, the size in the record is fixed, but the size of the inserted field may still vary, due to whitespace trimming. So internally, these datatypes are always treated as varying-length fields--even when they are fixed-length fields.

A length indicator is included for each of these fields in the bind array. The space reserved for the field in the bind array is large enough to hold the longest possible value of the field. The length indicator gives the actual length of the field for each row.

In summary:

bind array size = 
    (number of rows) * (  SUM(fixed field lengths) 
                        + SUM(maximum varying field lengths) 
                        + ( (number of varying length fields) 
                             * (size of length-indicator) ) 
                        ) 

Determining the Size of the Length Indicator

On most systems, the size of the length indicator is two bytes. On a few systems, it is three bytes. To determine its size, use the following control file:

OPTIONS (ROWS=1) 
LOAD DATA 
INFILE * 
APPEND 
INTO TABLE DEPT 
(deptno POSITION(1:1) CHAR) 
BEGINDATA 
a 

This control file "loads" a one-character field using a one-row bind array. No data is actually loaded, due to the numeric conversion error that occurs when "a" is loaded as a number. The bind array size shown in the log file, minus one (the length of the character field) is the value of the length indicator.

Note: A similar technique can determine bind array size without doing any calculations. Run your control file without any data and with ROWS=1 to determine the memory requirements for a single row of data. Multiply by the number of rows you want in the bind array to get the bind array size.

Calculating the Size of Field Buffers

The following tables summarize the memory requirements for each datatype. "L" is the length specified in the control file. "P" is precision. "S" is the size of the length indicator. For more information on these values, see "Specifying Datatypes" starting [*].

Datatype Size
INTEGER OS-dependent
SMALLINT
FLOAT
DOUBLE
Table 5 - 1. Invariant fields

Datatype Default Size Specified Size
(packed) DECIMAL None (P+1)/2, rounded up
ZONED None P
RAW None L
CHAR (no delimiters) 1 L+S
DATE (no delimiters) None
numeric EXTERNAL (no delimiters) None
MLSLABEL None
Table 5 - 2. Non-graphic fields

Datatype Default Size Length Specified with POSITION Length Specified with DATATYPE
GRAPHIC None L 2*L
GRAPHIC EXTERNAL None L - 2 2*(L-2)
VARGRAPHIC 4Kb*2 L+S (2*L)+S
Table 5 - 3. Graphic Fields

Datatype Default Size Maximum Length Specified (L)
VARCHAR 4Kb L+S
CHAR (delimited) DATE (delimited) numeric EXTERNAL (delimited) MLSLABEL (delimited) 255 L+S
Table 5 - 4. Variable-length fields

Minimizing Memory Requirements for the Bind Array

Pay particular attention to the default sizes allocated for VARCHAR, VARGRAPHIC, and the delimited forms of CHAR, DATE, and numeric EXTERNAL fields. They can consume enormous amounts of memory--especially when multiplied by the number of rows in the bind array. It is best to specify the smallest possible maximum length for these fields. For example:

CHAR(10) TERMINATED BY "," 

uses (10 + 2) * 64 = 768 bytes in the bind array, assuming that the length indicator is two bytes long. However:

CHAR TERMINATED BY "," 

uses (255 + 2) * 64 = 16,448 bytes, because the default maximum size for a delimited field is 255. This can make a considerable difference in the number of rows that fit into the bind array.

Multiple INTO TABLE Statements

When calculating a bind array size for a control file that has multiple INTO TABLE statements, calculate as if the INTO TABLE statements were not present. Imagine all of the fields listed in the control file as one, long data structure -- that is, the format of a single row in the bind array.

If the same field in the data record is mentioned in multiple INTO TABLE clauses, it requires additional space in the bind array each time it is mentioned. So, it is especially important to minimize the buffer allocations for fields like these.

Generated Data

Generated data is produced by the SQL*Loader functions CONSTANT, RECNUM, SYSDATE, and SEQUENCE. Such generated data does not require any space in the bind array.


Contents Index Home Previous Next