LOAD DATA
Important
SingleStore only supports LOAD DATA with the LOCAL option. LOAD DATA LOCAL must be run from a SQL client running on a computer that can access your SingleStore instance, such as the MySQL client (mysql-client) or the SingleStore client.
Important
SingleStore clusters can be integrated with many third-party ETL and CDC tools.
Import data stored in a CSV, JSON, or Avro file into a SingleStore table (referred to as the destination table in this topic).
Remarks
The syntax and semantics of loading data from a CSV, JSON, or Avro file are detailed below. REPLACE, SKIP CONSTRAINT ERRORS, and SKIP DUPLICATE KEY ERRORS are supported with non-CSV pipelines.
During the import of data stored in any of these files, you can optionally apply operations to the data as follows:

- Use the WHERE clause to filter on incoming data. Only rows that satisfy the expression in the WHERE clause will be loaded into SingleStore. For an example of how to use the WHERE clause, see the examples section.
- Use the SET clause to set columns using specific values or expressions with variables (see the sketch after this list). For example, if your input file has 9 columns but the table has a 10th column called foo, you can add SET foo=0 or SET foo=@myVariable. Note that column names may only be used on the left side of SET expressions.
- Use the CHARACTER SET clause to import files with any supported character set into SingleStore. For more information, see Character Encoding.
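For illustration, the following is a minimal sketch that combines a column-to-variable mapping, the SET clause, and the WHERE clause. The file name orders.csv, the table orders, and the column names (including status) are assumptions for this example only:

LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
(id, customer_id, item_description, @order_date)
SET order_time = @order_date,
    status = 'imported'
WHERE customer_id IS NOT NULL;

Here @order_date is a LOAD DATA assignment to a variable (see the note below), not a user-defined variable.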
Refer to the Permission Matrix for the required permission.
Important
If a query uses @ in a LOAD DATA statement, SingleStore interprets it as a reference to a LOAD DATA assignment to a variable, not as a reference to a user-defined variable.
The behavior of SingleStore’s LOAD DATA command has several functional differences from MySQL’s command:

- LOAD DATA will load the data into SingleStore in parallel to maximize performance. This makes LOAD DATA in SingleStore much faster on machines with a larger number of processors.
- LOAD DATA supports loading compressed .gz files.
- The only supported charset_name is utf8.
- While it is possible to execute LOAD DATA on leaf nodes, it should only be run on master aggregator or child aggregator node types. See Cluster Management Commands for more information. Note that when running this command on reference tables you must connect to the master aggregator.
The mysqlimport utility can also be used to import data into SingleStore. mysqlimport uses LOAD DATA internally.
The LOAD DATA operation does not fail when it encounters an error; instead, it stores the error and continues ingesting data. These errors do not fail the LOAD DATA operation, but the number of errors is limited to 1000 for each partition by default. For example, if a table has 16 partitions, the total could be up to 16,000 errors (1000 per partition) before the LOAD DATA operation fails. This prevents out-of-memory issues when unintentionally loading large files with incorrect formatting or an incorrect LOAD DATA statement. Specify MAX_ERRORS at the end of the statement to change the limit on the number of errors (per partition) before LOAD DATA fails. If LOAD DATA should fail when the first error is encountered, set MAX_ERRORS to 0.
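For example, the following hypothetical statement (the file and table names are assumptions) fails on the first error it encounters:

LOAD DATA INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
MAX_ERRORS 0;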
Writing to multiple databases in a transaction is not supported.
CSV LOAD DATA
Syntax
LOAD DATA [LOCAL] INFILE '<file_name>'
[REPLACE | IGNORE | SKIP { ALL | CONSTRAINT | DUPLICATE KEY | PARSER } ERRORS]
INTO TABLE <table_name>
[CHARACTER SET <character_set_name>]
[{FIELDS | COLUMNS}
  [TERMINATED BY '<string>']
  [[OPTIONALLY] ENCLOSED BY '<char>']
  [ESCAPED BY '<char>']
]
[LINES
  [STARTING BY '<string>']
  [TERMINATED BY '<string>']
]
[TRAILING NULLCOLS]
[NULL DEFINED BY <string> [OPTIONALLY ENCLOSED]]
[IGNORE <number> LINES]
[ ({<column_name> | @<variable_name>}, ...) ]
[SET <column_name> = <expression>,...]
[WHERE <expression>,...]
[MAX_ERRORS <number>]
[ERRORS HANDLE <string>]
JSON LOAD DATA
Syntax
LOAD DATA [LOCAL] INFILE 'file_name'
[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
INTO TABLE tbl_name
FORMAT JSON
subvalue_mapping
[SET col_name = expr,...]
[WHERE expr,...]
[MAX_ERRORS number]
[ERRORS HANDLE string]

subvalue_mapping:
( {col_name | @variable_name} <- subvalue_path [DEFAULT literal_expr], ...)

subvalue_path:
{% | [%::]ident [::ident ...]}
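As a minimal sketch of the syntax above (the file name, table, columns, and JSON shape are assumptions), suppose each line of orders.json is an object such as {"id": 1, "customer": {"id": 372}, "item": "Apples"}:

LOAD DATA LOCAL INFILE 'orders.json'
INTO TABLE orders
FORMAT JSON
(id <- %::id,
 customer_id <- %::customer::id,
 item_description <- %::item DEFAULT NULL);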
BSON LOAD DATA
The LOAD DATA command supports loading BSON data from files using the FORMAT BSON clause. The LOAD DATA ... FORMAT BSON SQL statement is similar to LOAD DATA ... FORMAT JSON, with the following exceptions:
- The FORMAT BSON clause does not support default values.
- The subvalue_mapping clause must be specified in the LOAD DATA ... FORMAT BSON SQL statement.
- The target columns in the subvalue_mapping clause must be BSON type columns. If the target columns are non-BSON type, they must be mapped to a user-defined variable and then assigned to the column using the SET clause.
Refer to JSON LOAD DATA for more information.
Syntax
LOAD DATA [LOCAL] INFILE 'file_name'
[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
INTO TABLE tbl_name
FORMAT BSON
subvalue_mapping
[SET col_name = expr,...]
[WHERE expr,...]
[MAX_ERRORS number]
[ERRORS HANDLE string]
subvalue_mapping:
( {col_name | @variable_name} <- subvalue_path, ...)
subvalue_path:
{% | [%::]ident [::ident ...]}
Loading BSON Data from a File
The following example restores a MongoDB® backup into SingleStore.
This example uses the following sample data set.
use dbm
db.bsonExport.insertMany( [
   { _id: 1, Code: "xv1f", Qty: 45 },
   { _id: 2, Code: "nm3w", Qty: 30 },
   { _id: 3, Code: "qoma", Qty: 20 },
   { _id: 4, Code: "hr3k", Qty: 15 } ] )
{ acknowledged: true,
insertedIds: { '0': 1, '1': 2, '2': 3, '3': 4 } }
Create a binary export of the MongoDB® data using the mongodump tool:
mongodump --uri="mongodb://<username>:<password>@<mongodb-endpoint>:27017/?authMechanism=PLAIN&tls=true&loadBalanced=true" --db="dbm" --collection="bsonExport" --out="<path_to_output_directory>"
This command creates a bsonExport.bson file in the target output directory.
Create a table in your SingleStore database to store the BSON data:
CREATE TABLE bsonExport (
  _id BSON NOT NULL,
  _more BSON NOT NULL COMMENT 'KAI_MORE',
  `$_id` AS BSON_NORMALIZE_NO_ARRAY(`_id`) PERSISTED LONGBLOB COMMENT 'KAI_AUTO',
  SHARD KEY (`$_id`), PRIMARY KEY (`$_id`));
Load the bsonExport.bson file into SingleStore using the following command:
LOAD DATA INFILE '<path_to_output_directory>/bsonExport.bson'
INTO TABLE bsonExport FORMAT BSON
(_id <- %::_id, @V1 <- %)
SET _more = BSON_EXCLUDE_MASK(@V1,'{"_id":1}');
The BSON data has been ingested and is now stored in your SingleStore database.
SELECT _id:>JSON AS "_id", _more:>JSON AS "_more" FROM bsonExport;
+------+--------------------------+
| _id | _more |
+------+--------------------------+
| 4 | {"Code":"hr3k","Qty":15} |
| 3 | {"Code":"qoma","Qty":20} |
| 2 | {"Code":"nm3w","Qty":30} |
| 1 | {"Code":"xv1f","Qty":45} |
+------+--------------------------+
Avro LOAD DATA
Syntax for LOAD DATA Local Infile
LOAD DATA [LOCAL] INFILE 'file_name'
[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
INTO TABLE tbl_name
FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
subvalue_mapping
[SET col_name = expr,...]
[WHERE expr,...]
[MAX_ERRORS number]
[ERRORS HANDLE string]
[SCHEMA 'avro_schema']

subvalue_mapping:
( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
{% | [%::]ident [::ident ...]}
See the associated GitHub repo.
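As a rough, hedged sketch only (the file name, table, field names, and inline Avro schema are assumptions, and no schema registry is used), raw Avro records with id and name fields might be loaded as follows:

LOAD DATA LOCAL INFILE 'users.avro'
INTO TABLE users
FORMAT AVRO
(id <- %::id,
 name <- %::name)
SCHEMA '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]}';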
Loading Parquet Data
The LOAD DATA command supports loading Parquet files from AWS S3 or local files. Parquet files can also be loaded using a LOAD DATA clause in a CREATE PIPELINE ... statement.
Syntax for LOAD DATA AWS S3 or Local File Source
Parquet-formatted data stored in an AWS S3 bucket or the local filesystem can be loaded via a LOAD DATA query without a pipeline. Other LOAD DATA clauses (SET, WHERE, etc.) are also supported.
For S3:
LOAD DATA S3 '<bucket name>'
CONFIG '{"region" : "<region_name>"}'
CREDENTIALS '{"aws_access_key_id" : "<key_id>",
              "aws_secret_access_key": "<access_key>"}'
INTO TABLE <table_name>
(`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL ,
) FORMAT PARQUET;
This data can also be loaded from S3 by using a connection link.
LOAD DATA LINK <link_name> '<bucket name>/<path>'
INTO TABLE <table_name>
(`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL ,
) FORMAT PARQUET;
For local file:
LOAD DATA INFILE '<path_to_file/file_name>'
INTO TABLE <table_name>
(val1 <- source1,
 val2 <- source2
 [ ... ])
[COMPRESSION { AUTO | NONE | LZ4 | GZIP }]
[ ... ]
FORMAT PARQUET;
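For illustration, a concrete local-file sketch (the path, table, column names, and the Parquet field names source_id and event_time are assumptions):

LOAD DATA INFILE '/data/events.parquet'
INTO TABLE events
(id <- source_id,
 ts <- event_time)
FORMAT PARQUET;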
Handling Data Compression
The COMPRESSION clause specifies how LOAD DATA handles the compression of an input file.
Syntax for LOAD DATA Local Infile
LOAD DATA INFILE 'filename' COMPRESSION { AUTO | NONE | LZ4 | GZIP } INTO TABLE ...

LOAD DATA INFILE 'filename' INTO TABLE `tablename` COMPRESSION { AUTO | NONE | LZ4 | GZIP } ...
Arguments
- AUTO: This is the default setting; it tells LOAD DATA to identify the compression type from the input file’s extension.
- NONE: Specifies that the input file is uncompressed.
- LZ4: Specifies that the input file is compressed with the LZ4 compression algorithm.
- GZIP: Specifies that the input file is compressed with the GZIP compression algorithm.
Remarks
- If COMPRESSION is set to NONE, LZ4, or GZIP, LOAD DATA will not use the extension of the input file to determine the type of compression. For example, if you load a file test.gz and specify the COMPRESSION as NONE, then LOAD DATA will handle test.gz as an uncompressed file.
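For instance, a minimal sketch (the file and table names are assumptions) that states the compression explicitly instead of relying on the file extension:

LOAD DATA INFILE 'orders.csv.gz'
COMPRESSION GZIP
INTO TABLE orders
FIELDS TERMINATED BY ',';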
LOCAL
affects the expected file location, the search behavior for relative path names, and Error Handling behavior.
When you specify LOCAL, the client reads file_name and sends it to the server. If file_name is a relative path, it is relative to the current working directory of the client.
When LOCAL is not specified, the file is read by the server, and needs to be located on the related server host.
Because files need to be sent from the client to the server, specifying LOCAL can be slower. When LOCAL is not specified, the server needs access to the full data directory, meaning that any user who has the permissions to LOAD DATA or CREATE PIPELINE can read the directory. See the FILE READ permission in the Permissions Matrix.
LOCAL does not support globbing (such as using wildcards in directory or filenames).

An example of using LOCAL follows:
LOAD DATA LOCAL INFILE '/example-directory/foo.csv'
INTO TABLE foo
COLUMNS TERMINATED BY ',';
Error Logging
When you run the LOAD DATA command and use the ERRORS HANDLE clause, LOAD DATA logs errors to the information_schema.LOAD_DATA_ERRORS table. This table stores the errors that LOAD DATA encountered as it processed the input file.

See the next section for example data that LOAD DATA populates in the information_schema.LOAD_DATA_ERRORS table.

Use the CLEAR LOAD ERRORS command to remove errors from information_schema.LOAD_DATA_ERRORS.
Error Handling
LOAD DATA has several options to handle errors that it encounters as it processes the input file. When you write a LOAD DATA statement, you can decide which option to use.

- By default, LOAD DATA returns errors to the client application. Errors are returned one at a time.
- To ignore duplicate key/index value errors in the input file, use the REPLACE clause to replace existing rows with input rows. This clause first deletes the existing rows that have the same value for a primary key or unique index as the input rows, and then inserts the new row.
- To skip errors in the input file, use the SKIP ... ERRORS clause. Data in the erroneous lines will not be inserted into the destination table.
- To ignore errors in the input file, use the IGNORE clause. This clause replaces invalid values with their defaults, discards extra fields, or discards erroneous lines completely.
- When LOCAL is specified, duplicate-key and data interpretation errors do not stop the operation. When LOCAL is not specified, duplicate-key and data interpretation errors stop the operation.
Warning
In most cases, use SKIP ... ERRORS instead of IGNORE. If you use IGNORE without understanding how it behaves, LOAD DATA may produce unexpected results as it inserts data into the destination table.
The four error handling options are discussed in the following topics.
Default Error Handling
By default, LOAD DATA returns errors to the client application.
Error Handling Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note that line 2 contains an extra column and line 4 contains a duplicate primary key value of 2:
1,372,Apples,2016-05-09
3,307,Oranges,2016-07-31,1000
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',';
ERROR 1262 (01000): Row 2 was truncated; it contained more data than there were input columns
After removing the extra column from row 2 and running the LOAD DATA statement again:
ERROR 1062 (23000): Leaf Error (127.0.0.1:3308): Duplicate entry '2' for key 'PRIMARY'
After removing the duplicate primary key entry, the LOAD DATA statement is successful and the input file is loaded into the table.
REPLACE Error Handling
SingleStore’s REPLACE behavior allows you to replace the existing rows with the new rows; only those rows that have the same value for a primary key or unique index as the input rows are replaced.
LOAD DATA inserts source file rows into the destination table in the order in which the rows appear in the source file. When REPLACE is specified, source files that contain duplicate unique or primary key values will be handled in the following way:
- If the destination table’s schema specifies a unique or primary key column, and
- The source file contains a row with the same primary or unique key value as the destination table, then
- The row in the destination table that has the same unique or primary key value as the row in the source file will be deleted, and a new row from the source file that matches the primary key value will be inserted into the destination table.

Note: If the source file contains multiple rows with the same primary or unique key value as the destination table, then only the last row in the source file with the same primary or unique key value (as the destination table) replaces the existing row in the destination table.
Note: REPLACE cannot be combined with SKIP DUPLICATE KEY ERRORS. Both REPLACE and SKIP DUPLICATE KEY ERRORS avoid throwing a duplicate key error; REPLACE replaces the old row with the new row, while SKIP DUPLICATE KEY ERRORS discards the new row and retains the old row.
REPLACE Error Handling Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL);
A row with a primary key of 4 is inserted as follows:

INSERT INTO orders VALUES(4,236,"Bananas",'2016-06-23');
The following CSV file is loaded into the table as orders.csv. Note the duplicate primary key value of 4 in line 2:
1,372,Apples,2016-05-09
4,138,Pears,2016-07-14
3,307,Oranges,2016-07-31
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
REPLACE
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Line 2 in the source file contained a duplicate primary key of 4. The REPLACE error handler deletes the row 4,236,"Bananas",2016-06-23 in the destination table and replaces it with the value 4,138,Pears,2016-07-14 from the source file.
SKIP ... ERRORS Error Handling
SingleStore’s SKIP ... ERRORS behavior allows you to specify an error scenario that, when encountered, discards an offending row.
- SKIP DUPLICATE KEY ERRORS: Any row in the source data that contains a duplicate unique or primary key will be discarded. If the row contains invalid data other than a duplicate key, an error will be generated. See SKIP DUPLICATE KEY ERRORS below.
- SKIP CONSTRAINT ERRORS: Inclusive of SKIP DUPLICATE KEY ERRORS. If a row violates a column’s NOT NULL constraint, or the row contains invalid JSON or Geospatial values, the row will be discarded. If the row contains invalid data outside the scope of constraint or invalid value errors, an error will be generated. See SKIP CONSTRAINT ERRORS below.
- SKIP ALL ERRORS: Inclusive of SKIP DUPLICATE KEY ERRORS and SKIP CONSTRAINT ERRORS. Also includes any parsing errors in the row caused by issues such as an invalid number of fields. See SKIP ALL ERRORS below.
SKIP DUPLICATE KEY ERRORS
When SKIP DUPLICATE KEY ERRORS is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

- If the destination table’s schema specifies a unique or primary key column, and
- The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then
- Every duplicate row in the source file will be discarded and will not be inserted into the destination table.

SKIP DUPLICATE KEY ERRORS cannot be combined with REPLACE.
SKIP DUPLICATE KEY ERRORS Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note the duplicate primary key value of 2 in line 3:
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
SKIP DUPLICATE KEY ERRORS
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Note that only 3 rows were inserted even though 4 rows were present in the source file. To see the discarded row, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+---------------------------+--------------------------------+
| 3 | 2,236,Bananas,2016-06-23 | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+
SKIP CONSTRAINT ERRORS
SKIP CONSTRAINT ERRORS is inclusive of SKIP DUPLICATE KEY ERRORS if REPLACE is not specified. It also applies to rows that violate a column’s NOT NULL constraint and to fields that contain invalid JSON or Geospatial values, and handles the offending rows in the following ways:
NOT NULL Constraint
- If a column in the destination table specifies a NOT NULL constraint, and
- The source file contains one or more rows with a null value for the constraint column, then
- The offending row(s) will be discarded and will not be inserted into the destination table.
Invalid JSON or Geospatial Data
- If a column in the destination table specifies a JSON, GEOGRAPHYPOINT, or GEOGRAPHY data type, and
- The source file contains one or more rows with invalid values for fields of these types, then
- The offending row(s) will be discarded and will not be inserted into the destination table.
SKIP CONSTRAINT ERRORS can also be combined with the REPLACE clause.
SKIP CONSTRAINT ERRORS Example
Create a new table with a JSON column type that also has a NOT NULL constraint:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note the invalid JSON in line 2 and the null value (\N) for JSON in line 4:
1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears,{"order-date"}
3,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,\N
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
SKIP CONSTRAINT ERRORS
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Note that only 2 rows were inserted even though 4 rows were present in the source file. To see the discarded rows, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+-----------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+-----------------------------+--------------------------------------------------------------+
| 2 | 2,138,Pears,{"order-date"} | Invalid JSON value for column 'order_properties' |
| 4 | 4,307,Oranges,\N | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+-----------------------------+--------------------------------------------------------------+
SKIP ALL ERRORS
SKIP ALL ERRORS is inclusive of SKIP DUPLICATE KEY ERRORS and SKIP CONSTRAINT ERRORS, in addition to any parsing error.
- If one or more rows in the source file cause ... DUPLICATE KEY ... or ... CONSTRAINT ... errors, or
- If one or more rows in the source file cause parsing errors such as invalid delimiters or an invalid number of fields, then
- The offending row(s) will be discarded and will not be inserted into the destination table.
SKIP ALL ERRORS can also be combined with REPLACE.
SKIP ALL ERRORS Example
Create a new table with a JSON column type that also has a NOT NULL constraint:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note that:

- Line 2 contains only 3 fields
- Line 3 has a duplicate primary key
- Line 4 has a null value for a NOT NULL constraint
1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears
1,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,\N
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
SKIP ALL ERRORS
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Only 1 row was written, despite the source file containing 4 rows. Line 2 contained a parsing error, line 3 contained a duplicate primary key, and line 4 violated the NOT NULL constraint. To see the discarded rows, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
| 2 | 2,138,Pears | Row 2 doesn't contain data for all columns |
| 3 | 1,236,Bananas,{"order-date":"2016-06-23"}. | Duplicate entry for unique key |
| 4 | 4,307,Oranges,\N | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
IGNORE Error Handling
SingleStore’s IGNORE behavior is identical to MySQL’s IGNORE behavior, and exists only to support backwards compatibility with applications written for MySQL. IGNORE either discards malformed rows, discards extra fields, or replaces invalid values with default data type values. If an error would normally be generated when IGNORE was not specified, it is converted to a warning instead.
Consequences of Using IGNORE Instead of SKIP ERRORS
Unlike SKIP ... ERRORS, which discards offending rows, IGNORE may change the inserted row’s data to ensure that it adheres to the table schema.
In a best case scenario where a malformed row uses the proper delimiters and contains the correct number of fields, the row can be partially salvaged.
However, the worst case scenario can be severe.
Due to the potential consequences of using IGNORE, in most cases SKIP ... ERRORS is a better option. To understand IGNORE’s behavior for each error scenario, continue reading the sections below.
Duplicate Unique or Primary Key Values
When IGNORE is specified, source files that contain duplicate unique or primary key values will be handled in the following way:
- If the destination table’s schema specifies a unique or primary key column, and
- The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then
- Every duplicate row in the source file will be discarded (ignored) and will not be inserted into the destination table.
Duplicate Unique or Primary Key Values Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note the duplicate primary key value of 2 in line 3:
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
IGNORE
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Note that only 3 rows were inserted even though 4 rows were present in the source file. To see the discarded row, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+---------------------------+--------------------------------+
| 3 | 2,236,Bananas,2016-06-23 | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+
Line 3 in the source file contained a duplicate primary key and was discarded because line 2 was inserted first.
Values with Invalid Types According to the Destination Table’s Schema
When IGNORE is specified, source files that contain rows with invalid types that violate the destination table’s schema will be handled in the following way:
- If the source file contains one or more rows with values that do not adhere to the destination table’s schema,
- Each value of an invalid type in a row will be replaced with the default value of the appropriate type, and
- The modified row(s) will be inserted into the destination table.
IGNORE behaves in a potentially unexpected way for columns that have a DEFAULT value specified: the column’s DEFAULT value is ignored, and the default value for the column’s data type is inserted instead.
Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Line 4 contains a NULL value for order_time, whereas the table schema does not allow NULL values for this field.
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
3,236,Bananas,2016-06-23
4,307,Oranges,\N
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
IGNORE
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Note that 4 rows were inserted despite the fact that line 4 in the source file contained a null value for a NOT NULL column. To see the warning, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+------------------+--------------------------------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+------------------+--------------------------------------------------------+
| 4 | 4,307,Oranges,\N | NULL supplied to NOT NULL column 'order_time' at row 4 |
+-----------------------+------------------+--------------------------------------------------------+
To see what was inserted by replacing the invalid DATETIME value with a default value, query the table:
SELECT * FROM orders ORDER BY 1;
+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time |
+----+-------------+------------------+---------------------+
| 1 | 372 | Apples | 2016-05-09 00:00:00 |
| 2 | 138 | Pears | 2016-07-14 00:00:00 |
| 3 | 236 | Bananas | 2016-06-23 00:00:00 |
| 4 | 307 | Oranges | 0000-00-00 00:00:00 |
+----+-------------+------------------+---------------------+
In this example, the invalid null DATETIME value was replaced with its default value: 0000-00-00 00:00:00.
Rows That Contain an Invalid Number of Fields
When IGNORE is specified, source files that contain rows with an invalid number of fields will be handled in one of two ways:
Too Few Fields
- If the source file contains one or more rows with too few fields according to the destination table’s schema,
- Each row’s empty field(s) will be updated with default values, and
- The row will be inserted into the destination table.
Too Many Fields
- If the source file contains one or more rows with too many fields according to the destination table’s schema,
- Each extra field in the row(s) will be discarded (ignored), and
- The row will be inserted into the destination table.
Example
Create a new table with a PRIMARY KEY column:

CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL);
The following CSV file will be loaded into this table as orders.csv. Note that:

- Line 2 contains only 3 fields instead of 4 and does not have an order_time value
- Line 4 contains an extra field, for a total of 5
1,372,Apples,2016-05-09
2,138,Pears
3,236,Bananas,2016-06-23
4,307,Oranges,2016-07-31,Berries
Load the data into the table:
LOAD DATA INFILE 'orders.csv'
IGNORE
INTO TABLE orders
FIELDS TERMINATED BY ','
ERRORS HANDLE 'orders_errors';
Note that 4 rows were inserted despite the invalid number of fields in two of the rows. To see the warnings, query the information_schema.LOAD_DATA_ERRORS table:
SELECT load_data_line_number, load_data_line, error_message
FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
WHERE handle = 'orders_errors'
ORDER BY load_data_line_number;
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
| load_data_line_number | load_data_line | error_message |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
| 2 | 2,138,Pears | Row 2 doesn't contain data for all columns |
| 4 | 4,307,Oranges,2016-07-31,Berries | Row 4 was truncated; it contained more data than there were input columns |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
Note that there is a warning for the missing value in row 2 and the extra value in row 4.
SELECT * FROM orders ORDER BY 1;
+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time |
+----+-------------+------------------+---------------------+
| 1 | 372 | Apples | 2016-05-09 00:00:00 |
| 2 | 138 | Pears | 0000-00-00 00:00:00 |
| 3 | 236 | Bananas | 2016-06-23 00:00:00 |
| 4 | 307 | Oranges | 2016-07-31 00:00:00 |
+----+-------------+------------------+---------------------+
Line 2 did not have a DATETIME value, so the default value for its type was inserted instead.
Performance Considerations
Shard Keys
Loading data into a table with a shard key requires reading the necessary columns on the aggregator to compute the shard key before sending data to the leaves.
Keyless Sharding
Loading data into a keylessly sharded table (no shard key is declared, or shard() is specified) will result in batches of data loaded into different partitions, in a round-robin fashion. A sketch of both kinds of table definitions follows.
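As a hedged illustration (both table definitions below are assumptions, not from the original page), the first table declares an explicit shard key, so the aggregator must read id to route each row; the second is keylessly sharded, so LOAD DATA distributes batches round-robin:

-- Explicitly sharded: the aggregator reads `id` to route rows to partitions.
CREATE TABLE events_sharded (
  id BIGINT,
  payload JSON,
  SHARD KEY (id)
);

-- Keylessly sharded: empty shard key, so batches are distributed round-robin.
CREATE TABLE events_keyless (
  id BIGINT,
  payload JSON,
  SHARD KEY ()
);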
Retrieve loading status
The information_schema.LMV_LOAD_DATA_STATUS table reports information about rows and bytes read by in-progress LOAD DATA queries.
It also reports activity and database names, which you can use to find corresponding rows in workload profiling tables.
Important
Result sets will only be returned if LMV_LOAD_DATA_STATUS is queried on the same aggregator as the in-progress LOAD DATA queries.
information_schema.LMV_LOAD_DATA_STATUS Table Schema
| Column Name | Description |
|---|---|
| ID | The connection ID. |
| ACTIVITY_NAME | The name of the database activity. |
| DATABASE_NAME | The name of the database associated with the file being loaded into the cluster. |
| BYTES_READ | Bytes read from the input file stream. |
| ROWS_READ | A count of rows read in from the source file (including skipped rows). |
SELECT * FROM information_schema.LMV_LOAD_DATA_STATUS;
+------+------------------------------------+---------------+------------+-----------+
| ID | ACTIVITY_NAME | DATABASE_NAME | BYTES_READ | ROWS_READ |
+------+------------------------------------+---------------+------------+-----------+
| 2351 | load_data_company_0e8dec6d07d9cba5 | trades | 94380647 | 700512 |
+------+------------------------------------+---------------+------------+-----------+