Oracle7 Server Utilities

Contents Index Home Previous Next

Handling Different Character Encoding Schemes

This section describes the features that allow SQL*Loader to operate with different character encoding schemes (called character sets, or code pages). SQL*Loader uses Oracle's NLS (National Language Support) features to handle the different single-byte and multi-byte character encoding schemes used on different computers and in different countries.

Multi-Byte (Asian) Character Sets

Multi-byte character sets support Asian languages. Data can be loaded in multi-byte format, and database objects (fields, tables, and so on) can be specified with multi-byte characters. In the control file, comments and object names may also use multi-byte characters.

Input Character Conversion

SQL*Loader also has the capacity to convert data from the datafile character set to the database character set, when they are different. When using the conventional path, data is converted into the session character set specified by the NLS_LANG parameter for that session. Then the data is loaded using SQL INSERT statements. The session character set is the character set supported by your terminal.

During a direct path load, data converts directly into the database character set. As a consequence, the direct path load method allows data in a character set that is not supported by your terminal to be loaded.

When data conversion occurs, it is essential that the target character set contains a representation of all characters that exist in the data. Otherwise, characters that have no equivalent in the target character set are converted to a default character, with consequent loss of data. When using the direct path, load method the database character set should be a superset of, or equivalent to, the datafile character sets. Similarly, when using the conventional path, the session character set should be a superset of, or equivalent to, the datafile character sets.

The character set used in each input file is specified with the CHARACTERSET keyword.

CHARACTERSET Keyword

The CHARACTERSET definition tells SQL*Loader what character set is used in each datafile. Different datafiles can be specified with different character sets. Only one character set can be specified for each datafile.

Using the CHARACTERSET keyword causes character data to be automatically converted when it is loaded into Oracle. Only CHAR, DATE, and numeric EXTERNAL fields are affected. If the CHARACTERSET keyword is not specified, then no conversion occurs.

The syntax for this option is:

CHARACTERSET character_set_spec 

where character_set_spec is the acronym used by Oracle to refer to your particular encoding scheme.

Additional Information: For more information on supported character sets, code pages, and the NLS_LANG parameter, see the National Language Support section of the Oracle7 Server Reference.

Control File Characterset

The SQL*Loader control file itself is assumed to be in the character set specified for your session by the NLS_LANG parameter. However, delimiters and comparison clause values must be specified to match the character set in use in the datafile. To ensure that the specifications are correct, it may be preferable to specify hexadecimal strings, rather than character string values.

Any data included after the BEGINDATA statement is also assumed to be in the character set specified for your session by the NLS_LANG parameter. Data that uses a different character set must be in a separate file.


Contents Index Home Previous Next