Tuesday, September 4, 2007

The SQL* Loader in Oracle 10g

The SQL* Loader in Oracle 10g

Learning objective

After completing this topic, you should be able to identify the function of the SQL* Loader and the files it uses.

1. SQL*Loader

The SQL*Loader tool can read data in a variety of formats and write it into Oracle database objects, such as tables.

Several files are involved in the operation of the SQL*Loader.

  • Input data files
  • Control file
  • Log file
  • Bad files
  • Discard files
Input data files
INFILE clauses in the control file can specify input data files.

The SQL*Loader recognizes the contents of an input data file as a collection of records.

The input data file uses one of these formats:

  • fixed record format
  • variable record format
  • stream record format


The control file's INFILE clause can specify the input data file's format as well as its name.

If the format is not specified, the SQL*Loader assumes that stream record format is being used.
Control file
The SQL*Loader's control file is written in a language that the loader understands.

It specifies information such as

  • the location of the data
  • how to parse and interpret the data
  • where to write the data in the database


Control files are loosely organized into three sections:

The first section of the file contains session-wide information, such as bind size and which records to skip. It also includes INFILE clauses that tell the loader where input data is located. It can also contain input data itself.

The second section of the file contains at least one INTO TABLE block, which specifies details about the table into which data should be loaded. This information includes the table's name and its column names.

The optional third section of the file contains input data.
Log file
When the SQL*Loader executes, it creates a log file and writes summary information to it about the load.

This information includes descriptions of any errors that occur during the load.

The load session terminates if log-file creation fails, so a log file is mandatory.
Bad files
The SQL*Loader rejects a record when it interprets its input format as invalid.

It stores such a record in a bad file.

Even when the SQL*Loader recognizes a record as valid, the database itself may interpret it as being invalid.

Such a record is stored by the SQL*Loader in a bad file.
Discard files
Discard files, which are optional, contain records filtered out of the input data because they don't match the record-selection criteria specified in the control file.

Using the input data files and the control file, the SQL*Loader writes to the log file, creates both bad files and discard files, and populates the database.

Question

Identify the characteristics of the SQL*Loader.

Options:

  1. It can only deal with data organized in a unique, SQL*Loader-specific format
  2. It can deal with data in various formats
  3. It only loads data in a single data type
  4. It loads data into different database objects

Answer

The SQL*Loader can deal with data in various formats and load data into various database objects.

Option 1 is incorrect. The SQL*Loader can parse data organized in various ways.

Option 2 is correct. The control file can specify the format of the input data.

Option 3 is incorrect. The SQL*Loader can load data into various data types that Oracle supports.

Option 4 is correct. Compatible object types include tables and indices.

Question

Match each file to its SQL*Loader use.

Options:

  1. Bad file
  2. Discard file
  3. Log file

Targets:

  1. Stores summary information about the load
  2. Stores records that don't match the record-selection criteria specified in the control file
  3. Stores records interpreted by the SQL*Loader as invalid

Answer

The bad file stores records interpreted by the SQL*Loader as invalid. The discard file stores records that don't match the record-selection criteria specified in the control file. The log file stores summary information about the load.

The bad file also stores records the database itself interprets as invalid.

The discard file is optional - the loader will work correctly without it.

The log file is mandatory - the loader can't be used unless one is created.

2. The SQL*Loader control file

The SQL*Loader's control file contains data-description language (DDL) code that specifies

  • the input data location
  • the input data format
  • the loader configuration
  • how the loader should manipulate the input data

The control file contains free-format, case-insensitive code.

Note

You should never name a table or column CONSTANT – this word has special meaning for the SQL*Loader.

A typical control file begins with an explanatory comment.

To mark a particular line of the file as a comment line, you begin the line with two hyphens. The parser ignores all text on that line after the hyphens.

-- This is a sample control file

Note

You can add comments to the command section of the file, but not to the input data section.

Question

Identify considerations relating to control-file syntax.

Options:

  1. A comment must be preceded by two hyphens
  2. The code is case-insensitive
  3. The code is data-manipulation language (DML)
  4. The code is free-format

Answer

Control-file code is free-format and case-insensitive, and a comment must be preceded by two hyphens.

Option 1 is correct. All text after the hyphens is ignored by the parser.

Option 2 is correct. As with all SQL code, you can change the case of characters without changing the meaning of the code.

Option 3 is incorrect. The code is data-description language (DDL).

Option 4 is correct. As with all SQL code, a statement can be spread out over multiple lines or can extend over one line.

The LOAD DATA command tells the SQL*Loader that a new data load is about to start.

LOAD DATA

Note

If the control file is meant to tell the loader to continue a previous data load, you should type the CONTINUE LOAD DATA statement on this line.

You use the INFILE keyword to specify the name of the input data file.

Similarly, you use the BADFILE keyword to specify the name of the bad file.

And you use the DISCARDFILE keyword to specify the name of the discard file.

INFILE 'SAMPLE.DAT'
BADFILE 'SAMPLE.BAD'
DISCARDFILE 'SAMPLE.DSC'

If the table you want to load data into is not empty, you need to use the APPEND keyword to specify this to the SQL*Loader.

Then you specify the name of the table to which the data should be added using an INTO TABLE statement.

APPEND
INTO TABLE emp

Note

In the case of an empty table, you use the INSERT keyword.

You can use the WHEN keyword to include conditions that each record must match before it is loaded.

In this example, the WHEN statement specifies that a record should be loaded into the table only when its fifty-seventh character is a decimal point.

WHEN (57) = '.'

The TRAILING NULLCOLS statement tells the SQL*Loader to treat relatively positioned columns missing from a record as null columns.

TRAILING NULLCOLS

The rest of the control file comprises information about the table's column formats.

In this example, the columns are HIREDATE, DEPTNO, JOB, MGR, ENAME, EMPNO, SAL, and COMM.

(hiredate SYSDATE,
deptno POSITION(1:2) INTEGER EXTERNAL(3)
NULLIF deptno=BLANKS,
job POSITION(7:14) CHAR TERMINATED BY WHITESPACE
NULLIF job=BLANKS "UPPER(:job)",
mgr POSITION(28:31) INTEGER EXTERNAL
TERMINATED BY WHITESPACE, NULLIF mgr=BLANKS,
ename POSITION(34:41) CHAR
TERMINATED BY WHITESPACE "UPPER(:ename)",
empno POSITION(45) INTEGER EXTERNAL
TERMINATED BY WHITESPACE,
sal POSITION(51) CHAR TERMINATED BY WHITESPACE
"TO_NUMBER(:sal,'$99,999.99')",
comm INTEGER EXTERNAL ENCLOSED BY '(' AND '%' ":comm * 100")

Supplement

Selecting the link title opens the resource in a new browser window.

Launch window

View the complete control-file script.

Question

Match each statement to the line of control-file code that performs that purpose.

Options:

  1. Signifies that a new load is about to start
  2. Specifies where the input data will come from
  3. Provides an explanatory comment

Targets:

  1. -- Sample control file
  2. LOAD DATA
  3. INFILE 'SAMPLE.DAT'

Answer

The line -- Sample control file is an explanatory comment. The line LOAD DATA signifies that a new load is about to start. The line INFILE 'SAMPLE.DAT' specifies where the input data will come from.

To continue a previous load, you use the statement CONTINUE LOAD DATA.

You can also specify the input-file format in the INFILE statement.

The parser ignores any line that begins with two hyphens.

3. Data file formats

An input data file has one of three record formats.

  • Fixed record format
  • Variable record format
  • Stream record format
Fixed record format
An input data file has fixed record format when every record in the file has the same byte length.

While this format is rather inflexible, it provides the best loading performance of all formats.

Suppose that your input data file, which is called myfile.dat, has fixed record format, with each record eight bytes in length.

This is how you specify the format in your control file.

The control-file code is as follows:
INFILE 'myfile'.dat "fix 8"

Suppose that your input data file, example.dat, contains three records, each of which contains two fields separated by a comma.

Each record contains ten bytes in total.

This control file script provides the SQL*Loader with the correct load settings for the input data file.

The script is as follows:
load data
infile 'example.dat' "fix 10"
into table example
fields terminated by ','
(col1, col2)
example.dat:
0001,abcd
0002,fghi
0003,klmn
Variable record format
The length of each record in a variable record format file is specified at the beginning of the record.

While this provides more flexibility than fixed record format, it provides inferior loading performance.

However, its loading performance is superior to that of stream record format.

Suppose that your input data file, example.dat, is in variable record format, and the field at the beginning of the record that specifies the record's length is three bytes in length.

You need to specify the record format in the control file. Optionally, you can specify the length of the record-length field in the same statement.

If you don't specify the length of the record-length field, the default length is 5. The maximum permitted length is 40.

This is the required control-file statement.
INFILE 'example.dat' "var 3"
Stream record format
In the case of stream record format, the SQL*Loader distinguishes between records by searching the input-data file for record terminators.

While this provides a flexible way of storing data, it provides relatively slow loading.

This is the syntax for the specification of a data file that is in stream record format:

INFILE <datafilename> ["str terminatorstring"]

The code is:

INFILE 'myfile.dat' "str 'l\n'"

If your session's character set differs from that of the input data file, character strings in the control file are changed to match the input data file's character set.

However, hexadecimal values specified in the control file are assumed to be in the input data file's character set, so they are not converted.

The default terminator string, when none is specified, is the newline character.

In this example, the terminator string is specified using a character string - '|\n'.

The backslash character followed by n signifies that the string specifies a nonprintable line-feed character.

The script is as follows:
load data
infile 'example.dat' "str '|\n'"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
col2 char(7))
example.dat:
hello,world,|
james,bond,|

Question

Identify the characteristic of variable record format.

Options:

  1. All records are of the same byte length
  2. The maximum byte length of the record-length field is 40
  3. Records are differentiated using record terminators

Answer

In variable record format, the maximum byte length of the record-length field is 40.

Option 1 is incorrect. This is a characteristic of fixed record format.

Option 2 is correct. The default length is 5.

Option 3 is incorrect. This is a characteristic of stream record format.

Question

Identify the characteristic of stream record format.

Options:

  1. All records are of the same byte length
  2. A field at the start of each record specifies the byte length of the record
  3. Records are differentiated using record terminators
  4. SQL*Loader does not recognize this format

Answer

Stream record format records are differentiated using record terminators.

Option 1 is incorrect. This is a characteristic of fixed record format.

Option 2 is incorrect. This is a characteristic of variable record format.

Option 3 is correct. The record terminator can comprise both printable characters and nonprintable characters.

Option 4 is incorrect. The parser can recognize stream record format.

Summary

The SQL*Loader takes input data and loads it into the specified Oracle database object. It uses several files during the course of its operation: input data files, the control file, the log file, the bad file, and the discard file.

The control file specifies information such as where the input data comes from, the data storage format, the loader configuration, and how the loader should manipulate the input data.

SQL*Loader's parser recognizes three input data record formats: fixed, variable, and stream. You can specify the input data's format in the control file's INFILE statement.

No comments: