The SQL* Loader in Oracle 10g
Learning objective
After completing this topic, you should be able to identify the function of the SQL* Loader and the files it uses.
1. SQL*Loader
The SQL*Loader tool can read data in a variety of formats and write it into Oracle database objects, such as tables.
Several files are involved in the operation of the SQL*Loader.
- Input data files
- Control file
- Log file
- Bad files
- Discard files
- Input data files
INFILE
clauses in the control file can specify input data files.
The SQL*Loader recognizes the contents of an input data file as a collection of records.
The input data file uses one of these formats:
- fixed record format
- variable record format
- stream record format
The control file'sINFILE
clause can specify the input data file's format as well as its name.
If the format is not specified, the SQL*Loader assumes that stream record format is being used.- Control file
- The SQL*Loader's control file is written in a language that the loader understands.
It specifies information such as
- the location of the data
- how to parse and interpret the data
- where to write the data in the database
Control files are loosely organized into three sections:
The first section of the file contains session-wide information, such as bind size and which records to skip. It also includesINFILE
clauses that tell the loader where input data is located. It can also contain input data itself.
The second section of the file contains at least oneINTO TABLE
block, which specifies details about the table into which data should be loaded. This information includes the table's name and its column names.
The optional third section of the file contains input data. - Log file
- When the SQL*Loader executes, it creates a log file and writes summary information to it about the load.
This information includes descriptions of any errors that occur during the load.
The load session terminates if log-file creation fails, so a log file is mandatory. - Bad files
- The SQL*Loader rejects a record when it interprets its input format as invalid.
It stores such a record in a bad file.
Even when the SQL*Loader recognizes a record as valid, the database itself may interpret it as being invalid.
Such a record is stored by the SQL*Loader in a bad file. - Discard files
- Discard files, which are optional, contain records filtered out of the input data because they don't match the record-selection criteria specified in the control file.
Using the input data files and the control file, the SQL*Loader writes to the log file, creates both bad files and discard files, and populates the database.
Question
Identify the characteristics of the SQL*Loader.
Options:
- It can only deal with data organized in a unique, SQL*Loader-specific format
- It can deal with data in various formats
- It only loads data in a single data type
- It loads data into different database objects
Answer
The SQL*Loader can deal with data in various formats and load data into various database objects.
Option 1 is incorrect. The SQL*Loader can parse data organized in various ways.
Option 2 is correct. The control file can specify the format of the input data.
Option 3 is incorrect. The SQL*Loader can load data into various data types that Oracle supports.
Option 4 is correct. Compatible object types include tables and indices.
Question
Match each file to its SQL*Loader use.
Options:
- Bad file
- Discard file
- Log file
Targets:
- Stores summary information about the load
- Stores records that don't match the record-selection criteria specified in the control file
- Stores records interpreted by the SQL*Loader as invalid
Answer
The bad file stores records interpreted by the SQL*Loader as invalid. The discard file stores records that don't match the record-selection criteria specified in the control file. The log file stores summary information about the load.
The bad file also stores records the database itself interprets as invalid.
The discard file is optional - the loader will work correctly without it.
The log file is mandatory - the loader can't be used unless one is created.
2. The SQL*Loader control file
The SQL*Loader's control file contains data-description language (DDL) code that specifies
- the input data location
- the input data format
- the loader configuration
- how the loader should manipulate the input data
The control file contains free-format, case-insensitive code.
Note
You should never name a table or column CONSTANT
– this word has special meaning for the SQL*Loader.
A typical control file begins with an explanatory comment.
To mark a particular line of the file as a comment line, you begin the line with two hyphens. The parser ignores all text on that line after the hyphens.
-- This is a sample control file
Note
You can add comments to the command section of the file, but not to the input data section.
Question
Identify considerations relating to control-file syntax.
Options:
- A comment must be preceded by two hyphens
- The code is case-insensitive
- The code is data-manipulation language (DML)
- The code is free-format
Answer
Control-file code is free-format and case-insensitive, and a comment must be preceded by two hyphens.
Option 1 is correct. All text after the hyphens is ignored by the parser.
Option 2 is correct. As with all SQL code, you can change the case of characters without changing the meaning of the code.
Option 3 is incorrect. The code is data-description language (DDL).
Option 4 is correct. As with all SQL code, a statement can be spread out over multiple lines or can extend over one line.
The LOAD DATA
command tells the SQL*Loader that a new data load is about to start.
LOAD DATA
Note
If the control file is meant to tell the loader to continue a previous data load, you should type the CONTINUE LOAD DATA
statement on this line.
You use the INFILE
keyword to specify the name of the input data file.
Similarly, you use the BADFILE
keyword to specify the name of the bad file.
And you use the DISCARDFILE
keyword to specify the name of the discard file.
INFILE 'SAMPLE.DAT'
BADFILE 'SAMPLE.BAD'
DISCARDFILE 'SAMPLE.DSC'
If the table you want to load data into is not empty, you need to use the APPEND
keyword to specify this to the SQL*Loader.
Then you specify the name of the table to which the data should be added using an INTO TABLE
statement.
APPEND
INTO TABLE emp
Note
In the case of an empty table, you use the INSERT
keyword.
You can use the WHEN
keyword to include conditions that each record must match before it is loaded.
In this example, the WHEN
statement specifies that a record should be loaded into the table only when its fifty-seventh character is a decimal point.
WHEN (57) = '.'
The TRAILING NULLCOLS
statement tells the SQL*Loader to treat relatively positioned columns missing from a record as null columns.
TRAILING NULLCOLS
The rest of the control file comprises information about the table's column formats.
In this example, the columns are HIREDATE, DEPTNO, JOB, MGR, ENAME, EMPNO, SAL, and COMM.
(hiredate SYSDATE,
deptno POSITION(1:2) INTEGER EXTERNAL(3)
NULLIF deptno=BLANKS,
job POSITION(7:14) CHAR TERMINATED BY WHITESPACE
NULLIF job=BLANKS "UPPER(:job)",
mgr POSITION(28:31) INTEGER EXTERNAL
TERMINATED BY WHITESPACE, NULLIF mgr=BLANKS,
ename POSITION(34:41) CHAR
TERMINATED BY WHITESPACE "UPPER(:ename)",
empno POSITION(45) INTEGER EXTERNAL
TERMINATED BY WHITESPACE,
sal POSITION(51) CHAR TERMINATED BY WHITESPACE
"TO_NUMBER(:sal,'$99,999.99')",
comm INTEGER EXTERNAL ENCLOSED BY '(' AND '%' ":comm * 100")
Supplement
Selecting the link title opens the resource in a new browser window.
View the complete control-file script.
Question
Match each statement to the line of control-file code that performs that purpose.
Options:
- Signifies that a new load is about to start
- Specifies where the input data will come from
- Provides an explanatory comment
Targets:
--
Sample control file
LOAD DATA
INFILE 'SAMPLE.DAT'
Answer
The line -- Sample control file
is an explanatory comment. The line LOAD DATA
signifies that a new load is about to start. The line INFILE 'SAMPLE.DAT'
specifies where the input data will come from.
To continue a previous load, you use the statement CONTINUE LOAD DATA
.
You can also specify the input-file format in the INFILE
statement.
The parser ignores any line that begins with two hyphens.
3. Data file formats
An input data file has one of three record formats.
- Fixed record format
- Variable record format
- Stream record format
- Fixed record format
- An input data file has fixed record format when every record in the file has the same byte length.
While this format is rather inflexible, it provides the best loading performance of all formats.
Suppose that your input data file, which is called myfile.dat, has fixed record format, with each record eight bytes in length.
This is how you specify the format in your control file.
The control-file code is as follows:
INFILE 'myfile'.dat "fix 8"
Suppose that your input data file, example.dat, contains three records, each of which contains two fields separated by a comma.
Each record contains ten bytes in total.
This control file script provides the SQL*Loader with the correct load settings for the input data file.
The script is as follows:
load data
infile 'example.dat' "fix 10"
into table example
fields terminated by ','
(col1, col2)
example.dat:
0001,abcd
0002,fghi
0003,klmn - Variable record format
- The length of each record in a variable record format file is specified at the beginning of the record.
While this provides more flexibility than fixed record format, it provides inferior loading performance.
However, its loading performance is superior to that of stream record format.
Suppose that your input data file, example.dat, is in variable record format, and the field at the beginning of the record that specifies the record's length is three bytes in length.
You need to specify the record format in the control file. Optionally, you can specify the length of the record-length field in the same statement.
If you don't specify the length of the record-length field, the default length is 5. The maximum permitted length is 40.
This is the required control-file statement.
INFILE 'example.dat' "var 3" - Stream record format
- In the case of stream record format, the SQL*Loader distinguishes between records by searching the input-data file for record terminators.
While this provides a flexible way of storing data, it provides relatively slow loading.
This is the syntax for the specification of a data file that is in stream record format:
INFILE <datafilename> ["str terminatorstring"]
The code is:
INFILE 'myfile.dat' "str 'l\n'"
If your session's character set differs from that of the input data file, character strings in the control file are changed to match the input data file's character set.
However, hexadecimal values specified in the control file are assumed to be in the input data file's character set, so they are not converted.
The default terminator string, when none is specified, is the newline character.
In this example, the terminator string is specified using a character string -'|\n'
.
The backslash character followed by n signifies that the string specifies a nonprintable line-feed character.
The script is as follows:
load data
infile 'example.dat' "str '|\n'"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
col2 char(7))
example.dat:
hello,world,|
james,bond,|
Question
Identify the characteristic of variable record format.
Options:
- All records are of the same byte length
- The maximum byte length of the record-length field is 40
- Records are differentiated using record terminators
Answer
In variable record format, the maximum byte length of the record-length field is 40.
Option 1 is incorrect. This is a characteristic of fixed record format.
Option 2 is correct. The default length is 5.
Option 3 is incorrect. This is a characteristic of stream record format.
Question
Identify the characteristic of stream record format.
Options:
- All records are of the same byte length
- A field at the start of each record specifies the byte length of the record
- Records are differentiated using record terminators
- SQL*Loader does not recognize this format
Answer
Stream record format records are differentiated using record terminators.
Option 1 is incorrect. This is a characteristic of fixed record format.
Option 2 is incorrect. This is a characteristic of variable record format.
Option 3 is correct. The record terminator can comprise both printable characters and nonprintable characters.
Option 4 is incorrect. The parser can recognize stream record format.
Summary
The SQL*Loader takes input data and loads it into the specified Oracle database object. It uses several files during the course of its operation: input data files, the control file, the log file, the bad file, and the discard file.
The control file specifies information such as where the input data comes from, the data storage format, the loader configuration, and how the loader should manipulate the input data.
SQL*Loader's parser recognizes three input data record formats: fixed, variable, and stream. You can specify the input data's format in the control file's INFILE
statement.
No comments:
Post a Comment