Oracle7 Server Utilities

Contents Index Home Previous Next

Specifying Datatypes

This section describes SQL*Loader's datatypes and explains how they are converted to Oracle datatypes.

Datatype Conversions

The datatype specifications in the control file tell SQL*Loader how to interpret the information in the datafile. The server defines the datatypes for the columns in the database. The link between these two is the column name specified in the control file.

SQL*Loader extracts data from a field in the input file, guided by the datatype specification in the control file. SQL*Loader then sends the field to the server to be stored in the appropriate column (as part of an array of row inserts). The server does any necessary data conversion to store the data in the proper internal format. The "Data Conversion and Datatype Specification" section beginning [*] contains diagrams that illustrate these points.

The datatype of the data in the file does not necessarily have to be the same as the datatype of the column in the Oracle table. Oracle automatically performs conversions, but you need to ensure that the conversion makes sense and does not generate errors. For instance, when a datafile field with datatype CHAR is loaded into a database column with datatype NUMBER, you must make sure that the contents of the character field represent a valid number.

Note: SQL*Loader does not contain datatype specifications for Oracle internal datatypes such as NUMBER or VARCHAR2. SQL*Loader's datatypes describe data that can be produced with text editors (character datatypes) and with standard programming languages (native datatypes). However, although SQL*Loader does not recognize datatypes like NUMBER and VARCHAR2, any data that Oracle is capable of converting may be loaded into these or other database columns.

Native Datatypes

Some datatypes consist entirely of binary data or contain binary data in their implementation. See [*] for a discussion of binary vs. character data. These non-character datatypes are the native datatypes:

INTEGER ZONED SMALLINT VARCHAR FLOAT GRAPHIC DOUBLE GRAPHIC EXTERNAL BYTEINT VARGRAPHIC (packed) DECIMAL RAW MLSLABEL (Trusted Oracle)

Since these datatypes contain binary data, most of them do not readily transport across operating systems. (See "Loading Data Across Different Operating Systems" [*].) RAW data and GRAPHIC data is the exceptions. SQL*Loader does not attempt to interpret these datatypes, but simply stores them "as is".

Additional Information: Native datatypes cannot be specified with delimiters. The size of the native datatypes INTEGER, SMALLINT, FLOAT, and DOUBLE are determined by the host operating system. Their size is fixed -- it cannot be overridden in the control file. (Refer to your Oracle operating system-specific documentation for more information.) The sizes of the other native datatypes may be specified in the control file.

INTEGER

The data is a fullword binary integer. If you specify start:end in the POSITION clause, end is ignored. The length of the field is the length of a fullword integer on your system. (Datatype LONG INT in C.) This length cannot be overridden in the control file.

INTEGER 

SMALLINT

The data is a half-word binary integer. If you specify start:end in the POSITION clause, end is ignored. The length of the field is a half-word integer is on your system.

SMALLINT 

Additional Information: This is the SHORT INT datatype in the C programming language. One way to determine its length is to make a small control file with no data and look at the resulting log file. This length cannot be overridden in the control file. See your Oracle operating system-specific documentation for details.

FLOAT

The data is a single-precision, floating-point, binary number. If you specify end in the POSITION clause, it is ignored. The length of the field is the length of a single-precision, floating-point binary number on your system. (Datatype FLOAT in C.) This length cannot be overridden in the control file.

FLOAT 

DOUBLE

The data is a double-precision, floating-point binary number. If you specify end in the POSITION clause, it is ignored. The length of the field is the length of a double-precision, floating-point binary number on your system. (Datatype DOUBLE or LONG FLOAT in C.) This length cannot be overridden in the control file.

DOUBLE 

BYTEINT

The decimal value of the binary representation of the byte is loaded. For example, the input character x"1C" is loaded as 28. The length of a BYTEINT field is always 1 byte. If POSITION(start:end) is specified, end is ignored.

The syntax for this datatype is

BYTEINT 

An example is

(column1 position(1) BYTEINT, 
 column2 BYTEINT, 
 ... 
) 

ZONED

ZONED data is in zoned decimal format: a string of decimal digits, one per byte, with the sign included in the last byte. (In COBOL, this is a SIGN TRAILING field.) The length of this field is equal to the precision (number of digits) that you specify.

The syntax for this datatype is:

where precision is the number of digits in the number, and scale (if given) is the number of digits to the right of the (implied) decimal point. For example:

sal  POSITION(32)  ZONED(8), 

specifies an 8-digit integer starting at position 32.

DECIMAL

DECIMAL data is in packed decimal format: two digits per byte, except for the last byte which contains a digit and sign. DECIMAL fields allow the specification of an implied decimal point, so fractional values can be represented.

The syntax for the this datatype is:

where:

precision Is the number of digits in the value. The character length of the field, as computed from digits, is (digits + 2) / 2, rounded down.
scale Is the scaling factor, or number of digits to the right of the decimal point. Default is zero (indicating an integer). May be greater than the number of digits but may not be negative.
For example,

sal DECIMAL (7,2) 

would load a number equivalent to +12345.67. In the data record, this field would take up 4 bytes, as shown in Figure 5 - 1. (The byte length of a DECIMAL field is equivalent to (N+1)/2, rounded up, where N is the number of digits in the value, and one is added for the sign.)

Figure 5 - 1. Packed Decimal Representation of +12345.67

RAW

The data is raw, binary data loaded "as is". It does not undergo character set conversion. If loaded into a RAW database column, it is not converted by Oracle. If it is loaded into a CHAR column, Oracle converts it to hexadecimal. It cannot be loaded into a DATE or number column.

The syntax for this datatype is

The length of this field is the number of bytes specified in the control file. This length is limited only by the length of the target column in the database and by memory resources.

GRAPHIC

The data is a string of double-byte characters (DBCS). Oracle does not support DBCS, however SQL*Loader reads DBCS as single bytes. Like RAW data, GRAPHIC fields are stored without modification in whichever column you specify.

The syntax for this datatype is

For both GRAPHIC and GRAPHIC EXTERNAL, if you specify POSITION(start:end) you give the exact location of the field in the logical record. If you specify the length after the GRAPHIC (EXTERNAL) keyword, however, then you give the number of double-byte graphic characters. That value is multiplied by 2 to find the length of the field in bytes. If the number of graphic characters is specified, then any length derived from POSITION is ignored.

GRAPHIC EXTERNAL

If the DBCS field is surrounded by shift-in and shift-out characters, use GRAPHIC EXTERNAL. This is identical to GRAPHIC, except that the first and last characters (the shift-in and shift-out) are not loaded.

The syntax for this datatype is:

where:

GRAPHIC Data is double-byte characters.
EXTERNAL First and last characters are ignored.
graphic_char_length Length in DBCS (see GRAPHIC, above)
For example, let [ ] represent shift-in and shift-out characters, and let # represent any double-byte character.

To describe ####, use "POSITION(1:4) GRAPHIC" or "POSITION(1) GRAPHIC(2)".

To describe [####], use "POSITION(1:6) GRAPHIC EXTERNAL" or "POSITION(1) GRAPHIC EXTERNAL(2)".

VARGRAPHIC

The data is a varying-length, double-byte character string. It consists of a length subfield followed by a string of double-byte characters (DBCS).

Additional Information: The size of the length subfield is the size of the SQL*Loader SMALLINT datatype on your system (C type SHORT INT). See SMALLINT [*] for more information.

The length of the current field is given in the first two bytes. This length is a count of graphic (double-byte) characters. So it is multiplied by two to determine the number of bytes to read.

The syntax for this datatype is

A maximum length specified after the VARGRAPHIC keyword does not include the size of the length subfield. The maximum length specifies the number of graphic (double byte) characters. So it is also multiplied by two to determine the maximum length of the field in bytes.

The default maximum field length is 4K graphic characters, or 8 Kb (2 * 4K). It is a good idea to specify a maximum length for such fields whenever possible, to minimize memory requirements. See "Determining the Size of the Bind Array" [*] for more details.

The POSITION clause, if used, gives the location of the length subfield, not of the first graphic character. If you specify POSITION(start:end), the end location determines a maximum length for the field. Both start and end identify single-character (byte) positions in the file. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any maximum length calculated from POSITION.

If a VARGRAPHIC field is truncated by the end of the logical record before its full length is read, a warning is issued. Because a VARCHAR field's length is embedded in every occurrence of the input data for that field, it is assumed to be accurate.

VARGRAPHIC data cannot be delimited.

VARCHAR

A VARCHAR field is a varying-length character string. It is considered a native datatype, rather than a character datatype because it includes binary data (a length). It consists of a length subfield followed by a character string of the given length.

Additional Information: The size of the length subfield is the size of the SQL*Loader SMALLINT datatype on your system (C type SHORT INT). See SMALLINT [*] for more information.

The syntax for this datatype is:

A maximum length specified in the control file does not include the size of the length subfield. If you specify the optional maximum length after the VARCHAR keyword, then a buffer of that size is allocated for these fields.

The default buffer size is 4 Kb. Specifying the smallest maximum length that is needed to load your data can minimize SQL*Loader's memory requirements, especially if you have many VARCHAR fields. See "Determining the Size of the Bind Array" [*] for more details.

The POSITION clause, if used, gives the location of the length subfield, not of the first text character. If you specify POSITION(start:end), the end location determines a maximum length for the field. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any length calculated from POSITION.

If a VARCHAR field is truncated by the end of the logical record before its full length is read, a warning is issued. Because a VARCHAR field's length is embedded in every occurrence of the input data for that field, it is assumed to be accurate.

VARCHAR data cannot be delimited.

Conflicting Native Datatype Field Lengths

There are several ways to specify a length for a field. If multiple lengths are specified and they conflict, then one of the lengths takes precedence. A warning is issued when a conflict exists. The following rules determine which field length is used:

For example, if the native datatype INTEGER is 4 bytes long and the following field specification is given:

column1 POSITION(1:6) INTEGER 

then a warning is issued, and the proper length (4) is used. In this case, the log file shows the actual length used under the heading "Len" in the column table:

Column Name             Position   Len  Term Encl Datatype 
----------------------- --------- ----- ---- ---- --------- 
COLUMN1                     1:6     4           INTEGER 

Character Datatypes

The character datatypes are CHAR, DATE, and the numeric EXTERNAL datatypes. These fields can be delimited and can have lengths (or maximum lengths) specified in the control file.

Additional Information: In addition, the MLSLABEL character datatype exists in Trusted Oracle. See the Trusted Oracle7 Server Administrator's Guide for more information on this datatype.

CHAR

The data field contains character data. The length is optional and is taken from the POSITION specification if it is not present here. If present, this length overrides the length in the POSITION specification. If no length is given, CHAR data is assumed to have a length of 1. The syntax is:

A field of datatype CHAR may also be variable-length delimited or enclosed. See "Specifying Delimiters" [*].

Attention: If the column in the database table is defined as LONG, you must explicitly specify a maximum length (maximum for a LONG is 2 gigabytes) either with a length specifier on the CHAR keyword or with the POSITION keyword. This guarantees that a large enough buffer is allocated for the value and is necessary even if the data is delimited or enclosed.

DATE

The data field contains character data that should be converted to an Oracle date using the specified date mask. The syntax is:

For example:

LOAD DATA 
INTO TABLE DATES (COL_A POSITION (1:15) DATE "DD-Mon-YYYY") 
BEGINDATA 
1-Jan-1991 
1-Apr-1991 28-Feb-1991 

Attention: Whitespace is ignored and dates are parsed from left to right unless delimiters are present.

The length specification is optional, unless a varying-length date mask is specified. In the example above, the date mask specifies a fixed-length date format of 11 characters. SQL*Loader counts 11 characters in the mask, and therefore expects a maximum of 11 characters in the field, so the specification works properly. But, with a specification such as

DATE "Month dd, YYYY" 

the date mask is 14 characters, while the maximum length of a field such as

September 30, 1991 

is 18 characters. In this case, a length must be specified. Similarly, a length is required for any Julian dates (date mask "J")--a field length is required any time the length of the date string could exceed the length of the mask (that is, the count of characters in the mask).

If an explicit length is not specified, it can be derived from the POSITION clause. It is a good idea to specify the length whenever you use a mask, unless you are absolutely sure that the length of the data is less than, or equal to, the length of the mask.

An explicit length specification, if present, overrides the length in the POSITION clause. Either of these overrides the length derived from the mask. The mask may be any valid Oracle date mask. If you omit the mask, the default Oracle date mask of "dd-mon-yy" is used. The length must be enclosed in parentheses and the mask in quotation marks. Case 3 [*] has an example of the DATE datatype.

A field of datatype DATE may also be specified with delimiters. For more information, see "Specifying Delimiters" [*].

A date field that consists entirely of whitespace produces an error unless NULLIF BLANKS is specified. For more information, see "Loading All-Blank Fields" [*].

MLSLABEL

This is a Trusted Oracle7 datatype that stores the binary format of an operating system label. For more information see the Trusted Oracle7 Server Administrator's Guide.

Numeric External Datatypes

The numeric external datatypes are the numeric datatypes (INTEGER, FLOAT, DECIMAL, and ZONED) specified with the EXTERNAL keyword with optional length and delimiter specifications. These datatypes are the human-readable, character form of numeric data.

The data is a number in character form (not binary representation). As such, these datatypes are identical to CHAR and are treated identically, with one exception: the use of DEFAULTIF. If you want the default to be null, use CHAR. If you want it to be zero, use EXTERNAL.

Numeric EXTERNAL may be specified with lengths and delimiters, just like CHAR data. Length is optional, but if specified, overrides POSITION.

The syntax for this datatype is:

FLOAT EXTERNAL Data Values

FLOAT EXTERNAL data can be given in either scientific or regular notation. Both "5.33" and "533E-2" are valid representations of the same value.

Specifying Delimiters

The boundaries of CHAR, DATE, MLSLABEL, or numeric EXTERNAL fields may also be marked by specific delimiter characters contained in the input data record. You indicate how the field is delimited by using a delimiter specification after specifying the datatype.

Delimited data can be TERMINATED or ENCLOSED.

TERMINATED Fields

TERMINATED fields are read from the starting position of the field up to, but not including, the first occurrence of the delimiter character. If the terminator delimiter is found in the first column position, the field is null.

TERMINATED BY WHITESPACE

If TERMINATED BY WHITESPACE is specified, data is read until the first occurrence of a whitespace character (space, tab, newline). Then the current position is advanced until no more adjacent whitespace characters are found. This allows field values to be delimited by varying amounts of whitespace.

Enclosed Fields

Enclosed fields are read by skipping whitespace until a non-whitespace character is encountered. If that character is the delimiter, then data is read up to the second delimiter. Any other character causes an error.

If two delimiter characters are encountered next to each other, a single occurrence of the delimiter character is used in the data value. For example, 'DON''T' is stored as DON'T. However, if the field consists of just two delimiter characters, its value is null. You may specify a TERMINATED BY clause, an ENCLOSED BY clause, or both. If both are used, the TERMINATED BY clause must come first.

The syntax for delimiter specifications is:

where:

TERMINATED Data is read until first occurrence of a delimiter.
BY This is an optional keyword for readability.
WHITESPACE Delimiter is any whitespace character, including linefeed, formfeed, or carriage return. (Only used with TERMINATED, not with ENCLOSED.)
OPTIONALLY Data may be enclosed by the indicated character. If SQL*Loader finds a first occurrence of the character, it reads the data value until it finds the second occurrence. If the data is not enclosed, the data is read as a terminated field. If optional enclosure is specified, there must be a TERMINATED BY clause--either locally in the field definition, or globally, in the FIELDS clause.
ENCLOSED The data value is found between two delimiters.
char Delimiter is the single character char.
X'hex-byte' Delimiter is the single character that has the value specified by hex-byte in the character encoding scheme, such as X`1F' (equivalent to 31 decimal). "X" must be uppercase.
Note: Due to National Language Support (NLS) requirements, hex 00 cannot be used as a separator. All other hex values are supported.
AND This keyword specifies a trailing enclosure delimiter, which may be different from the initial enclosure delimiter. If the AND clause is not present, then the initial and trailing enclosure delimiters are the same.
Here are some examples, with samples of the data they describe:

TERMINATED BY ','			 a data string, 
ENCLOSED BY '"'			"a data string" 
TERMINATED BY ',' ENCLOSED BY '"'	"a data string", 
ENCLOSED BY "(" AND ')'		(a data string) 

Delimiter Marks in the Data

Sometimes the same punctuation mark that is a delimiter also needs to be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data. For example, this data:

(The delimiters are left paren's, ((, and right paren's, )).) 

with this field specification:

ENCLOSED BY "(" AND ")" 

puts the following string into the database:

The delimiters are left paren's, (, and right paren's, ). 

For this reason, problems can arise when adjacent fields use the same delimiters. For example, the following specification:

field1 TERMINATED BY "/" 
field2 ENCLOSED by "/" 

the following data will be interpreted properly:

This is the first string/      /This is the second string/ 

But if field1 and field2 were adjacent, then the results would be incorrect, because

This is the first string//This is the second string/ 

would be interpreted as a single character string with a "/" in the middle of it, and that string would belong to field1.

Maximum Length of Delimited Data

The default maximum length of delimited data is 255 bytes. So delimited fields can require significant amounts of storage for the bind array. It is a good idea to specify the smallest possible maximum value. See "Determining the Size of the Bind Array" [*].

Loading Trailing Blanks with Delimiters

Trailing blanks can only be loaded with delimited datatypes. If a data field is nine characters long and contains the value DANIELbbb, where bbb is three blanks, it is loaded into Oracle as "DANIEL" if declared as CHAR(9). If you want the trailing blanks, you could declare it as CHAR(9) TERMINATED BY `:', and add a colon to the datafile so that the field is DANIELbbb:. This field is loaded as "DANIEL ", with the trailing blanks. For more discussion on whitespace in fields, see "Trimming of Blanks and Tabs" [*].

Conflicting Character Datatype Field Lengths

A control file can specify multiple lengths for the character-data fields CHAR, DATE, MLSLABEL, and numeric EXTERNAL. If conflicting lengths are specified, one of the lengths takes precedence. A warning is also issued when a conflict exists. This section explains which length is used.

Predetermined Size Fields

If you specify a starting position and ending position for one of these fields, then the length of the field is determined by these specifications. If you specify a length as part of the datatype and do not give an ending position, the field has the given length. If starting position, ending position, and length are all specified, and the lengths differ; then the length given as part of the datatype specification is used for the length of the field.

For example, if

position(1:10) char(15) 

is specified, then the length of the field is 15.

Delimited Fields

If a delimited field is specified with a length, or if a length can be calculated from the starting and ending position, then that length is the maximum length of the field. The actual length can vary up to that maximum, based on the presence of the delimiter. If a starting and ending position are both specified for the field and if a field length is specified in addition, then the specified length value overrides the length calculated from the starting and ending position.

If the expected delimiter is absent and no maximum length has been specified, then the end of record terminates the field. If TRAILING NULLCOLS is specified, remaining fields are null. If either the delimiter or the end of record produce a field that is longer than the specified maximum, SQL*Loader generates an error.

Date Field Masks

The length of a date field depends on the mask, if a mask is specified. The mask provides a format pattern, telling SQL*Loader how to interpret the data in the record. For example, if the mask is specified as:

"Month dd, yyyy" 

then "May 3, 1991" would occupy 11 character positions in the record, while "January 31, 1992" would occupy 16.

If starting and ending positions are specified, however, then the length calculated from the position specification overrides a length derived from the mask. A specified length such as "DATE (12)" overrides either of those. If the date field is also specified with terminating or enclosing delimiters, then the length specified in the control file is interpreted as a maximum length for the field.

MLSLABEL Field Masks

These are Trusted Oracle7 masks. For more information, see the Trusted Oracle7 Server Administrator's Guide.


Contents Index Home Previous Next