Oracle7 Server Utilities

Contents Index Home Previous Next

Mapping the Data to Oracle Format

Data to be loaded into the Oracle database must exist in files on disk or on tape. These datafiles require mapping (translation to Oracle format) to be loaded by SQL*Loader. You specify how SQL*Loader interprets the data via data definitions contained in control files. The control file also is the repository for certain file management information.

The Concept of Mapping Data

SQL*Loader must be told where and in what format the data to be loaded is and how to map the data to Oracle format. Definitions in the control file will include:

To define the specifications listed above you will use the SQL*Loader Data Definition Language (DDL).

The Data Definition Language (DDL)

The SQL*Loader data definition language (DDL) is used to specify exactly how SQL*Loader should interpret the data you are loading into the Oracle database. DDL is used to create DDL definitions which are the map that SQL*Loader uses to translate the loaded data into Oracle format.

The syntax and semantics of DDL is explained in detail in Chapter 5.

SQL*Loader DDL is upwardly compatible with the DB2 Load Utility from IBM. If you have a control file for the DB2 Load Utility, you can also use it with SQL*Loader. See Appendix C, "Notes for DB2/DXT Users," for differences in syntax.

DDL Definitions

DDL definitions can serve several purposes. Certain definitions specify data location or format. Other DDL definitions specify how SQL*Loader should map specific objects in the loaded data to comparable objects in an Oracle database. Other definitions deal with column definitions, datatype mapping and field specifications. DDL definitions are stored in control files which are read by SQL*Loader on startup.

A single DDL definition is composed of one or more keywords and the arguments and options that modify that keyword's functionality. An example of a control file containing several definitions defining how SQL*Loader should interpret a simple datafile might look like this:

	LOAD DATA
	INFILE 'example.dat'
	INTO TABLE emp
	(empno         POSITION(01:04)  INTEGER EXTERNAL,
	ename          POSITION(06:15)  CHAR,
	job            POSITION(17:25)  CHAR,
	mgr            POSITION(27:30)  INTEGER EXTERNAL,
	sal            POSITION(32:39)  DECIMAL EXTERNAL,
	comm           POSITION(41:48)  DECIMAL EXTERNAL,
...

The keywords are LOAD DATA, INFILE, INTO TABLE, POSITION, etc. (shown in all capital letters).

The Control File

Control files contain DDL definitions. You can create a control file using your system text editor.

How you store control files depends on how your operating system organizes data. For example, in UNIX environments, control files are stored in files; in MVS environments, they can be stored as members in a partitioned dataset. They must be located where SQL*Loader has access to them.

Some DDL definitions are mandatory, such as where to find the data and how it corresponds to the database tables. However, many options are also available to describe and manipulate the file data. For example, the instructions can include directions on how to format or filter the data, or to generate unique ID numbers.

A control file can also contain the data itself as well as the DDL definitions, as shown in Case 1 [*], or in separate files, as shown in Case 2 [*]. Detailed information on creating control files using DDL definitions is found [*].

Control File Guidelines

		--This is a comment

The Data

SQL*Loader can load data stored in various formats.

Binary versus Character Format Data

SQL*Loader can load numeric data in binary or character format. Character format is sometimes referred to as numeric external format.

Binary data is one example of native datatypes -- datatypes that are implemented differently on different operating systems. For more information on these and other native datatypes, see "Native Datatypes" [*]. SQL*Loader cannot handle binary data in variable record format. See also the section called "Loading Data Across Different Operating Systems" [*].

Data in character format can be included in both fixed-format and variable-format files. For more information on the character datatypes, see "Character Datatypes" [*].

Fixed versus Variable Format

Data records may be in fixed or variable format. In fixed format, the data is contained in records that all have the same (fixed) format. That is, the records have a fixed length, and the data fields in those records have fixed length, type, and position, as shown in Figure 3 - 1.

Figure 3 - 1. Fixed Format Records

In this example, columns 1 to 6 contain a character variable while columns 7 to 10 contain an integer for all the records. The fields are the same size in each record, regardless of the length of the data. The fields are fixed length, rather than variable length. In consequence, the record size is also fixed at 10 characters for each of the records.

In variable format (sometimes called stream format), each record is only as long as necessary to contain the data. Figure 3 - 2 shows variable length records containing one varying-length character fields and one fixed length integer field.

Figure 3 - 2. Variable Format Records

In addition, the type of data in each record may vary. One record may contain a character string, the next may contain seven integers, the third may contain three decimals and a float, and so on. Operating systems use a record terminator character (such as newline) to mark where variable records end.

Data Fields

Data in records is divided into fields. Fields can be specified with specific positions and lengths, or their position and length can vary based on delimiters.

There are two types of delimited fields: terminated and enclosed. Terminated fields are followed by a specified character (called a termination delimiter), such as the commas in the following example:

1,1,2,3,5,8,13

Enclosed fields are both preceded and followed by specified characters (called enclosure delimiters), such as the quotation marks in the following example:

"BUNKY"

Case 2 [*] shows fixed-length records. Case 1 [*] shows delimited fields. For more details on delimited data, see "Specifying Delimiters" [*].

Logical versus Physical Records

A final distinction concerns the difference between logical and physical records. A record or line in a file (either of fixed length or terminated) is referred to as a physical record. An operating system-dependent file/record management system, such as DEC's Record Management System (RMS) or IBM's Sequential Access Method (SAM) returns physical records.

Logical records, on the other hand, correspond to a row in a database table. Sometimes the logical and physical records are equivalent. Such is the case when only a few short columns are being loaded. However, sometimes several physical records must be combined to make one logical record. For example, you could have a file containing 24 10-character columns in a format of 80-character, fixed-length records. In this case, three physical records would constitute a single logical record.

SQL*Loader allows you to compose logical records from multiple physical records using continuation fields. Physical records are combined into a single, logical record when some condition of the continuation field is true. You can specify that a logical record should be composed of multiple, physical records in the following ways:

Case 4 [*] uses continuation fields to form one logical record from multiple physical records.


Contents Index Home Previous Next