Load Avro Files with LOAD DATA
Data from Avro files can be loaded using the LOAD DATA command.
Avro LOAD DATA
Syntax for LOAD DATA Local Infile
LOAD DATA [LOCAL] INFILE 'file_name'
    WHERE/SET/SKIP ERRORS
    [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
    INTO TABLE tbl_name
    FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
    subvalue_mapping
    [SET col_name = expr,...]
    [WHERE expr,...]
    [MAX_ERRORS number]
    [ERRORS HANDLE string]
    [SCHEMA 'avro_schema']

subvalue_mapping:
    ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
    {% | [%::]ident [::ident ...]}
See the associated GitHub repo.
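For illustration, a statement following this syntax might look like the sketch below. The file path (/tmp/users.avro), table name (users), and Avro field names (id, name) are hypothetical; the sketch assumes an Avro Object Container File whose records contain those two fields.

LOAD DATA LOCAL INFILE '/tmp/users.avro'   -- hypothetical Object Container File
    INTO TABLE users                       -- hypothetical destination table
    FORMAT AVRO
    ( id <- %::id,                         -- map the record's "id" field to column id
      name <- %::name );                   -- map the record's "name" field to column name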
Syntax for LOAD DATA AWS S3 Source
Avro-formatted data stored in an AWS S3 bucket can use a LOAD DATA query without a pipeline.
LOAD DATA S3 '<bucket name>'
    CONFIG '{"region" : "<region_name>"}'
    CREDENTIALS '{"aws_access_key_id" : "<key_id>",
                  "aws_secret_access_key": "<access_key>"}'
    INTO TABLE <table_name>
    (`<col_a>` <- %,
     `<col_b>` <- % DEFAULT NULL)
    FORMAT AVRO;
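As a concrete sketch (the bucket, object path, region, table, and field names are hypothetical, and the credentials are placeholders):

LOAD DATA S3 'my-bucket/avro/events.avro'        -- hypothetical bucket and object path
    CONFIG '{"region" : "us-east-1"}'
    CREDENTIALS '{"aws_access_key_id" : "<key_id>",
                  "aws_secret_access_key": "<access_key>"}'
    INTO TABLE events
    (`id` <- %::id,
     `payload` <- %::payload DEFAULT NULL)
    FORMAT AVRO;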
This data can also be loaded from S3 with a connection link.
LOAD DATA LINK <link_name> '<bucket name>'
    INTO TABLE <table_name>
    (`<col_a>` <- %,
     `<col_b>` <- % DEFAULT NULL)
    FORMAT AVRO;
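The connection link itself is created separately. A minimal sketch, assuming a hypothetical S3 link named my_s3_link and the same hypothetical bucket and table as above (the exact CREATE LINK form may differ; see the connection link documentation):

CREATE LINK my_s3_link AS S3                     -- hypothetical link name
    CONFIG '{"region" : "us-east-1"}'
    CREDENTIALS '{"aws_access_key_id" : "<key_id>",
                  "aws_secret_access_key": "<access_key>"}';

LOAD DATA LINK my_s3_link 'my-bucket/avro/events.avro'
    INTO TABLE events
    (`id` <- %::id,
     `payload` <- %::payload DEFAULT NULL)
    FORMAT AVRO;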
Semantics
Error Logging and Error Handling are discussed in the LOAD DATA page.
LOAD DATA for Avro does not support file name globbing (for example: LOAD DATA INFILE '/data/nfs/gp1/*.'). LOAD DATA for Avro only supports loading a single file per statement.
Extract specified subvalues from each Avro value in file_name. Assign them to specified columns of a new row in tbl_name, or to variables used for a column assignment in a SET clause. Resulting rows may be filtered out by a WHERE clause, as shown in the sketch below. To specify the compression type of an input file, use the COMPRESSION clause.
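A minimal sketch combining a subvalue mapping with SET and WHERE (the file, table, and Avro field names are hypothetical):

LOAD DATA LOCAL INFILE '/tmp/orders.avro'       -- hypothetical file
    INTO TABLE orders                           -- hypothetical table
    FORMAT AVRO
    ( order_id <- %::id,                        -- assign a subvalue directly to a column
      @status  <- %::status )                   -- capture a subvalue in a variable
    SET is_open = (@status = 'OPEN')            -- derive a column from the variable
    WHERE order_id IS NOT NULL;                 -- discard rows without an id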
Avro LOAD DATA expects Avro data in one of two sub-formats, depending on the SCHEMA clause.
If no SCHEMA clause is provided, file_name must name an Avro Object Container File as described in version 1 of the Avro specification, with the following restrictions:
- The compression codec of the file must be null.
- Array and map values must not have more than 16384 elements.
- The type name of a record must not be used in a symbolic reference to a previously defined name in any of its fields. It may still be used in a symbolic reference outside the record definition, however. For example, self-referential schemas like the following are rejected by LOAD DATA:

  {
    "type": "record",
    "name": "PseudoLinkedList",
    "fields" : [
      {"name": "value", "type": "long"},
      {"name": "next", "type": ["null", "PseudoLinkedList"]}
    ]
  }
If a SCHEMA clause is provided, the file must be a raw stream consisting of only the concatenated binary encodings of instances of avro_schema. avro_schema must be a SQL string containing a JSON Avro schema.
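A minimal sketch of loading a raw stream with an inline schema (the file, table, and schema are hypothetical; the schema string must match the binary encodings in the file):

LOAD DATA LOCAL INFILE '/tmp/points.raw'        -- hypothetical raw stream file
    INTO TABLE points                           -- hypothetical table
    FORMAT AVRO
    ( x <- %::x,
      y <- %::y )
    SCHEMA '{
      "type": "record",
      "name": "Point",
      "fields": [
        {"name": "x", "type": "long"},
        {"name": "y", "type": "long"}
      ]
    }';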
Warning
It’s an error to provide a SCHEMA clause when loading an Object Container File, because an Object Container File already contains metadata alongside the encoded values.
All optional Avro schema attributes except the namespace attribute are ignored. logicalType attributes are ignored.
If file_name ends in .gz or .lz4, it will be decompressed.
Writing to multiple databases in a transaction is not supported.
The SCHEMA REGISTRY {"IP" | "Hostname"} option allows LOAD DATA to pull the schema from a schema registry.
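A minimal sketch using a schema registry (the registry endpoint, file, table, and field names are hypothetical placeholders):

LOAD DATA LOCAL INFILE '/tmp/events.avro'       -- hypothetical file
    INTO TABLE events                           -- hypothetical table
    FORMAT AVRO
    SCHEMA REGISTRY "my-registry-host:8081"     -- hypothetical schema registry endpoint
    ( id   <- %::id,
      body <- %::body );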