Load Avro Files Examples

On this page

Note

For more information on the LOAD DATA command, refer to LOAD DATA.

Examples

Example 1

Consider an Avro Object Container Fileexample.avro with the following schema:

{
"type": "record",
"name": "data",
"fields": [{ "name": "id", "type": "long"},
{ "name": "payload", "type": [ "null",
"string" ]}]
}

example.avro contains three Avro values whose JSON encodings are:

{"id":1,"payload":{"string":"first"}}
{"id":1,"payload":{"string":"second"}}
{"id":1,"payload":null}

example.avro can be loaded as follows:

CREATE TABLE t(payload TEXT, input_record JSON);
LOAD DATA LOCAL INFILE "example.avro"
INTO TABLE t
FORMAT AVRO
( payload <- %::payload::string,
input_record <- % );
SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

LOAD DATA was able to parse example.avro because Avro Object Container Files have a header which contains their schema.

Example 2

Consider a file named example.raw_avro, with the same values as example.avro from Example 1 but in the raw stream format. That is, example.raw_avro consists of the binary encoded values and nothing else. We add a SCHEMA clause to tell LOAD DATA to expect a raw stream with the provided schema:

CREATE TABLE t(payload TEXT, input_record JSON);
LOAD DATA LOCAL INFILE "example.raw_avro"
INTO TABLE t
FORMAT AVRO
( payload <- %::payload::string,
input_record <- % )
schema
'{
"type": "record",
"name": "data",
"fields": [{ "name": "id", "type": "long"},
{ "name": "payload", "type": [ "null", "string" ]}]
}';
SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

Example 3

Consider an Object Container Fileexample3.avro with a more complicated payload than Example 1. We illustrate extracting values from nested unions and records, and also indirectly extracting elements of nested maps and arrays.

{ "type": "record",
"namespace": "ns",
"name": "data",
"fields": [
{ "name": "id", "type": "long" },
{ "name": "payload", "type":
[ "null",
{ "type": "record",
"name": "payload_record",
"namespace": "ns",
"fields": [
{ "name": "f_bytes", "type": "bytes"},
{ "name": "f_string", "type": "string"},
{ "name": "f_map", "type":
{ "type": "map",
"values": { "type": "array",
"items": "int" }}}
]
}
]
}
]
}

The raw JSON encoding of the contents of this file can be seen in column c_whole_raw after the following LOAD DATA:

CREATE TABLE t (
c_id bigint,
c_bytes longblob,
c_string longblob,
c_array_second int,
c_whole_raw longblob,
c_whole_json json
);
LOAD DATA INFILE "example3.avro"
INTO TABLE t
FORMAT AVRO
( c_id <- %::id,
c_bytes <- %::payload::`ns.payload_record`::f_bytes,
c_string <- %::payload::`ns.payload_record`::f_string,
@v_map <- %::payload::`ns.payload_record`::f_map,
c_whole_raw <- %,
c_whole_json <- %)
SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);
SELECT * FROM t;
*** 1. row ***
          c_id: 1
       c_bytes: NULL
      c_string: NULL
c_array_second: NULL
   c_whole_raw: {"id":1,"payload":null}
  c_whole_json: {"id":1,"payload":null}
*** 2. row ***
          c_id: 2
       c_bytes: "A
      c_string: "A
c_array_second: 2
   c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\u0022\u0041","f_string":"\"A","f_map":{"a":[1,2]}}}}
  c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\"A","f_map":{"a":[1,2]},"f_string":"\"A"}}}

There are several things to note:

  • We attempted to extract subvalues of the payload_record branch of the union-type payload field. Since that wasn’t the selected member of the union in record 1, LOAD DATA assigned NULL to c_bytes and @v_map.

  • We assigned the JSON encoding of f_map to @v_map and then performed JSON map and array lookups in the SET clause to ultimately extract 2.

  • f_string and f_bytes had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON type

    • The JSON encoding of the Avro string value f_string, as seen in c_whole_raw, encodes special characters like " as the escape sequence \".

    • The JSON encoding of the Avro bytes value f_bytes, as seen in c_whole_raw, encodes every byte with a JSON escape.

    • When converting the JSON encoding of record 2 to the SQL JSON type while assigning to c_whole_json, LOAD DATA normalized both representations of the byte sequence "A to \"A.

Last modified: April 8, 2025

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK