Load CSV Files Examples

Note

For more information on the LOAD DATA command, refer to LOAD DATA.

Examples

Loading Data When the Order of the Columns in the Destination Table and Source File Is Different

If the order of columns in the table is different from the order in the source file, you can name them explicitly. In this example, the columns are loaded in reverse order:

LOAD DATA INFILE 'foo.tsv'
INTO TABLE foo (fourth, third, second, first);

Skipping Columns in the Source File

You can skip columns in the source file using the @ sign. In this example only the first and fourth columns are imported into table foo:

LOAD DATA INFILE 'foo.tsv'
INTO TABLE foo (bar, @, @, baz);

Specifying the Column Delimiter

The default column delimiter is the tab (\t) character, ASCII code 9. You can specify a different delimiter, including multi-character delimiters, with the COLUMNS TERMINATED BY clause:

LOAD DATA INFILE 'foo.csv'
INTO TABLE foo
COLUMNS TERMINATED BY ',';

In the following example, field and line delimiters are used to read a file that contains fields separated by commas and lines terminated by carriage return/newline pairs:

LOAD DATA INFILE 'foo.csv' INTO TABLE foo FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';

Source File with Unusual Column Separators

The following example demonstrates loading a file that has unusual column separators (|||):

LOAD DATA INFILE 'foo.oddformat'
INTO TABLE foo
COLUMNS TERMINATED BY '|||';

Loading Data from Multiple Files

Using globbing, you can load data from multiple files in a single LOAD DATA query.

The following query loads data from all the .csv files with names starting with a digit:

LOAD DATA INFILE "[0-9]*.csv"
INTO TABLE cust(ID,NAME,ORDERS);

The following query loads data from all the .csv files whose names consist of exactly four characters followed by the .csv extension:

LOAD DATA INFILE "????.csv"
INTO TABLE cust(ID,NAME,ORDERS);

The following query loads data from all the .csv files with filenames not starting with a digit:

LOAD DATA INFILE "[!0-9]*.csv"
INTO TABLE cust(ID,NAME,ORDERS);

Note

LOAD DATA LOCAL INFILE does not support globbing.

LOAD DATA INFILE supports globbing in filenames, but not in directory names.

CREATE PIPELINE contains a LOAD DATA clause; in that context, LOAD DATA supports globbing in both directory names and filenames.
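For example, a filesystem pipeline can glob over both directories and filenames. The following is a minimal sketch, assuming a hypothetical pipeline named cust_pipeline and data files laid out under /data:

CREATE PIPELINE cust_pipeline AS
LOAD DATA FS '/data/region_*/cust_[0-9]*.csv'
INTO TABLE cust
FIELDS TERMINATED BY ',';

START PIPELINE cust_pipeline;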

Using the TRAILING NULLCOLS Clause

The following example demonstrates how to use the TRAILING NULLCOLS clause with the file numbers.csv, which has the following content:

1,2,3
4,5
6

Run the following commands:

CREATE TABLE foo(a INT, b INT, c INT);
LOAD DATA INFILE 'numbers.csv' INTO TABLE foo COLUMNS TERMINATED BY ',' TRAILING NULLCOLS;
SELECT * FROM foo;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 |
|    4 |    5 | NULL |
|    6 | NULL | NULL |
+------+------+------+

Using the NULL DEFINED BY Clause

The following example demonstrates how to use the NULL DEFINED BY clause with the data.csv file.

cat data.csv
DTB,'',25
SPD,,40
The stockN table initially contains the following rows:

SELECT * FROM stockN;
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
+------+-------------+-------+

The following query inserts the unenclosed empty field as a NULL value and the enclosed empty field as an empty string:

LOAD DATA INFILE '/data.csv'
INTO TABLE stockN
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "'"
NULL DEFINED BY '';
SELECT * FROM stockN;
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
| DTB  |             |    25 |
| SPD  | NULL        |    40 |
+------+-------------+-------+

If you add the OPTIONALLY ENCLOSED option to the NULL DEFINED BY clause in the query above and run the following query instead, both empty fields are inserted as NULL values:

LOAD DATA INFILE '/data.csv'
INTO TABLE stockN
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "'"
NULL DEFINED BY '' OPTIONALLY ENCLOSED;
SELECT * FROM stockN;
+------+-------------+-------+
| ID   | City        | Count |
+------+-------------+-------+
| XCN  | new york    |    45 |
| ZDF  | washington  |    20 |
| XCN  | chicago     |    32 |
| DTB  | NULL        |    25 |
| SPD  | NULL        |    40 |
+------+-------------+-------+

Using the IGNORE LINES Clause

In the following example, the IGNORE LINES clause is used to skip the header line that contains column names in the source file:

LOAD DATA INFILE '/tmp/data.txt' INTO TABLE City IGNORE 1 LINES;
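As a fuller sketch, assume a hypothetical /tmp/data.txt whose first line is a comma-separated header row, and a City table with matching columns. IGNORE 1 LINES skips the header so that only the data rows are loaded:

cat /tmp/data.txt
ID,Name
1,Boston
2,Chicago

LOAD DATA INFILE '/tmp/data.txt'
INTO TABLE City
FIELDS TERMINATED BY ','
IGNORE 1 LINES;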

Using the ESCAPED BY Clause

The following example demonstrates how to load data into the loadEsc table using the ESCAPED BY clause from the file contacts.csv, whose contents are shown below.

GALE\, ADAM, Brooklyn
FLETCHER\, RON, New York
WAKEFIELD\, CLARA, DC
DESC loadEsc;
+-------+-------------+------+------+---------+-------+
| Field | Type        | Null | Key  | Default | Extra |
+-------+-------------+------+------+---------+-------+
| Name  | varchar(40) | YES  |      | NULL    |       |
| City  | varchar(40) | YES  |      | NULL    |       |
+-------+-------------+------+------+---------+-------+

Execute the following query:

LOAD DATA INFILE '/contacts.csv'
INTO TABLE loadEsc COLUMNS TERMINATED BY ',' ESCAPED BY '\\' ;
SELECT * FROM loadEsc;
+-------------------+-----------+
| Name              | City      |
+-------------------+-----------+
| GALE, ADAM        |  Brooklyn |
| FLETCHER, RON     |  New York |
| WAKEFIELD, CLARA  |  DC       |
+-------------------+-----------+

In this query, the \ character escapes the comma (,) between the first two fields of the contacts.csv file. (The \ (backslash) is the default escape character in a SQL query; hence, the \\ (double backslash) is used to escape the backslash itself inside the query.)

Warning

If you accidentally escape the TERMINATED BY character in a file, the SQL query may return an error. For example, if you escape both commas in the first row of the contacts.csv file mentioned above, as:

GALE\, ADAM\, Brooklyn
FLETCHER\, RON, New York
WAKEFIELD\, CLARA, DC

and then execute the following query

LOAD DATA INFILE '/contacts.csv'
INTO TABLE loadEsc COLUMNS TERMINATED BY ',' ESCAPED BY '\\' ;

it returns the following error: ERROR 1261 (01000): Row 1 does not contain data for all columns. This is because the \ (backslash) escapes both commas, so LOAD DATA perceives the first row as a single column.

Using the STARTING BY Clause

The following example demonstrates how to skip the prefix ### in the stockUpd.txt data file using the STARTING BY clause.

cat stockUpd.txt
###1,"xcg",10
3,"dfg"
new product###4,"rfk",5
LOAD DATA INFILE 'stockUpd.txt'
INTO TABLE stock
FIELDS TERMINATED BY ','
LINES STARTING BY '###';
SELECT * FROM stock;
+----+------+----------+
| ID | Code | Quantity |
+----+------+----------+
|  1 |  xcg |       10 |
|  4 |  rfk |        5 |
+----+------+----------+

In this example, the STARTING BY clause skips the prefix ### in the first and third lines, along with anything before it. It skips the second line entirely because it does not contain ###.

Filtering out Rows from the Source File

You can also filter out unwanted rows using the WHERE clause. In this example, only rows where bar is equal to 5 will be loaded. All other rows will be discarded:

LOAD DATA INFILE 'foo.oddformat'
INTO TABLE foo (bar, baz)
WHERE bar = 5;

Filtering out and Transforming Rows From the Source File

Complex transformations can be performed in both the SET and WHERE clauses. For example, suppose you have an input file with an EventDate field and an EventId field:

10-1-2016,1
4-15-2016,2
1-10-2017,3
4-10-2017,4

Suppose you want to load only the rows with a date within three months of a certain date, for instance 10/15/2016. This can be accomplished as follows:

CREATE TABLE foo (EventDate date, EventId int);
LOAD DATA INFILE 'date_event.csv'
INTO TABLE foo
FIELDS TERMINATED BY ','
(@EventDate, EventId)
SET EventDate = STR_TO_DATE(@EventDate, '%m-%d-%Y')
WHERE ABS(MONTHS_BETWEEN(EventDate, date('2016-10-15'))) < 3;
SELECT * FROM foo;
+------------+---------+
| EventDate  | EventId |
+------------+---------+
| 2016-10-01 |       1 |
| 2017-01-10 |       3 |
+------------+---------+

While both column names and variables can be referenced in the WHERE clause, column names can only be assigned to in the SET clause. The scope of these clauses is restricted to the current row, so SELECT statements cannot be evaluated within them.
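For example, a user-defined variable can be tested in the WHERE clause before the SET conversion is applied. The following sketch reuses date_event.csv from above; the LIKE pattern on the raw field is illustrative only:

LOAD DATA INFILE 'date_event.csv'
INTO TABLE foo
FIELDS TERMINATED BY ','
(@EventDate, EventId)
SET EventDate = STR_TO_DATE(@EventDate, '%m-%d-%Y')
WHERE @EventDate LIKE '%-2016' AND EventId > 0;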

Using REPLACE

This example uses the cust table, which is defined as a columnstore table as follows:

CREATE TABLE cust(name VARCHAR(32), id INT(11), orders INT(11), SORT KEY(id), UNIQUE KEY(id) USING HASH, SHARD KEY(id));

Assume the directory /order_files has one file orders.csv, which contains the following data:

Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9

Create a LOAD DATA statement with a REPLACE clause:

LOAD DATA INFILE '/order_files/orders.csv' REPLACE INTO TABLE cust FIELDS TERMINATED BY ',';

As LOAD DATA ingests the data from orders.csv into the cust table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301 and 3412. The second and third records containing those duplicate keys (which have already been imported into cust) are replaced with the fifth and sixth records.

SELECT * FROM cust ORDER BY name;
+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Chris  | 7214 |      6 |
| George | 3412 |      9 |
| Rachel | 9125 |      2 |
| Susan  | 8301 |      7 |
+--------+------+--------+

Note

If you want to see more examples of loading data with vectors, refer to How to Bulk Load Vectors.

Loading a Fixed Length File

This example demonstrates how to load the file fixed_length.csv, whose contents are shown below.

APE602020-06-01
TR 252019-08-07
HSW8 2019-10-11
YTR122020-09-02

LOAD DATA inserts each extracted row from fixed_length.csv into the table foo. Define the table as follows:

CREATE TABLE foo(a CHAR(3), b INT, c DATETIME);

Run the LOAD DATA statement:

LOAD DATA INFILE '/fixed_length.csv'
INTO TABLE foo (@current_row)
SET a = TRIM(SUBSTR(@current_row,1,3)),
b = TRIM(SUBSTR(@current_row,4,2)),
c = TRIM(SUBSTR(@current_row,6,10));

SUBSTR() extracts a substring from a string, and TRIM() removes the padding (spaces, in this case) from the beginning and end of a string. For example, after the LOAD DATA statement extracts the line HSW8 2019-10-11 from fixed_length.csv, it does the following to set b:

  • It extracts, from HSW8 2019-10-11, the substring starting at position 4 with a length of 2. The resulting substring is "8 " (the digit 8 followed by a space).

  • It removes the trailing whitespace from "8 " to yield 8.

Retrieve the data from foo:

SELECT * FROM foo ORDER BY a;
+------+------+---------------------+
| a    | b    | c                   |
+------+------+---------------------+
| APE  |   60 | 2020-06-01 00:00:00 |
| HSW  |    8 | 2019-10-11 00:00:00 |
| TR   |   25 | 2019-08-07 00:00:00 |
| YTR  |   12 | 2020-09-02 00:00:00 |
+------+------+---------------------+

Loading Data using Hex Field Terminator Syntax

Loading data into a table can be performed using a hexadecimal field terminator.

Syntax

CREATE TABLE test(a INT, b INT);
LOAD DATA INFILE "/tmp/hex_test.csv"
INTO TABLE test (a, b) FIELDS TERMINATED BY 0x2c;
SELECT * FROM test;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+

The hex_test.csv file was created via the CLI using the following commands:

echo 1, 2 > hex_test.csv

hexdump -C hex_test.csv
00000000  31 2c 20 32 0a                            |1, 2.|
00000005
