Load Avro Files with LOAD DATA

Data from Avro files can be loaded using the LOAD DATA command. For more information on the LOAD DATA command, refer to LOAD DATA. To see a few examples of using LOAD DATA with Avro files, refer to Load Avro Files Examples.

Avro LOAD DATA

Syntax for LOAD DATA Local Infile

LOAD DATA [LOCAL] INFILE 'file_name'
[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
INTO TABLE tbl_name
FORMAT AVRO [SCHEMA REGISTRY {"IP" | "Hostname"}]
subvalue_mapping
[SET col_name = expr,...]
[WHERE expr,...]
[MAX_ERRORS number]
[ERRORS HANDLE string]
[SCHEMA 'avro_schema']

subvalue_mapping:
( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
{% | [%::]ident [::ident ...]}
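As an illustration of the grammar above, a minimal statement might look like the following sketch. The table `orders`, its columns, and the file path are hypothetical, not taken from this page:

```sql
-- Hypothetical example: load two subvalues from each Avro record in an
-- Object Container File into a table orders(id BIGINT, amount DOUBLE).
LOAD DATA LOCAL INFILE '/data/orders.avro'
INTO TABLE orders
FORMAT AVRO
( id <- %::id,
  @amt <- %::amount )
SET amount = @amt
WHERE id IS NOT NULL;
```

Here `%::id` extracts the `id` field of each record directly into a column, while `%::amount` is routed through the variable `@amt` so a SET clause can transform it before assignment.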


Syntax for LOAD DATA AWS S3 Source

Avro-formatted data stored in an AWS S3 bucket can be loaded with a LOAD DATA query without a pipeline. This streamlines the process of loading cloud-stored data into tables.

LOAD DATA S3 '<bucket name>'
CONFIG '{"region": "<region_name>"}'
CREDENTIALS '{"aws_access_key_id": "<key_id>",
              "aws_secret_access_key": "<access_key>"}'
INTO TABLE <table_name>
(`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL)
FORMAT AVRO;

This data can also be loaded from S3 with a connection link. Refer to CREATE LINK for more information on connection links.

LOAD DATA LINK <link_name> '<bucket name>'
INTO TABLE <table_name>
(`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL)
FORMAT AVRO;

Semantics

Error Logging and Error Handling are discussed in the LOAD DATA page.

LOAD DATA for Avro does not support file name globbing (for example: LOAD DATA INFILE '/data/nfs/gp1/*.avro'). LOAD DATA for Avro only supports loading a single file per statement.

LOAD DATA extracts the specified subvalues from each Avro value in file_name and assigns them to the specified columns of a new row in tbl_name, or to variables used for column assignments in a SET clause. Rows that do not match the WHERE clause are discarded.

To specify the compression type of an input file, use the COMPRESSION clause. See LOAD DATA for more information.

Avro LOAD DATA expects Avro data in one of two sub-formats, depending on the SCHEMA clause.

If no SCHEMA clause is provided, file_name must name an Avro Object Container File as described in version 1.8.2 of the Avro specification. In addition, the following restrictions hold:

  • The compression codec of the file must be null.

  • Array and map values must not have more than 16384 elements.

  • The type name of a record must not be used in a symbolic reference to a previously defined name in any of its fields. It may still be used in a symbolic reference outside the record definition, however.

    For example, self-referential schemas like the following are rejected by LOAD DATA:

    {
      "type": "record",
      "name": "PseudoLinkedList",
      "fields": [
        {"name": "value", "type": "long"},
        {"name": "next", "type": ["null", "PseudoLinkedList"]}
      ]
    }

If a SCHEMA clause is provided, the file must be a raw stream consisting of only the concatenated binary encodings of instances of avro_schema. avro_schema must be a SQL string containing a JSON Avro schema. The restrictions on Object Container Files also apply to raw stream files.
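To sketch the raw-stream case, a statement might pair the subvalue mapping with an inline JSON schema. The table name, file path, and schema below are illustrative, not taken from this page:

```sql
-- Hypothetical example: the file is a bare concatenation of Avro-encoded
-- records, so the writer schema must be supplied via the SCHEMA clause.
LOAD DATA INFILE '/data/stream.avro'
INTO TABLE events
FORMAT AVRO
( id <- %::id )
SCHEMA '{"type": "record", "name": "Event",
         "fields": [{"name": "id", "type": "long"}]}';
```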

Warning

It is an error to provide a SCHEMA clause when loading an Object Container File, because that file format already embeds its schema as metadata alongside the encoded values.

All optional Avro schema attributes except the namespace attribute are ignored. Notably, logicalType attributes are ignored.

If file_name ends in .gz or .lz4, it will be decompressed.

Writing to multiple databases in a transaction is not supported.

The SCHEMA REGISTRY {"IP" | "Hostname"} option allows LOAD DATA to pull the schema from a schema registry. For more information, see the Avro Schema Evolution With Pipelines topic.
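A registry-backed load might look like the following sketch; the registry hostname, port, table, and file path are hypothetical:

```sql
-- Hypothetical example: fetch the writer schema from a schema registry
-- instead of embedding it in the statement with a SCHEMA clause.
LOAD DATA INFILE '/data/records.avro'
INTO TABLE events
FORMAT AVRO SCHEMA REGISTRY "registry.example.com:8081"
( id <- %::id );
```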

Last modified: April 8, 2025
