Load Avro Files Examples
On this page
Note
For more information on the LOAD DATA command, refer to LOAD DATA.
Examples
Example 1
Consider an Avro Object Container Fileexample.
with the following schema:
{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null","string" ]}]}
example.
contains three Avro values whose JSON encodings are:
{"id":1,"payload":{"string":"first"}}{"id":1,"payload":{"string":"second"}}{"id":1,"payload":null}
example.
can be loaded as follows:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % );SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+
LOAD DATA
was able to parse example.
because Avro Object Container Files have a header which contains their schema.
Example 2
Consider a file named example.
, with the same values as example.
from Example 1 but in the raw stream format
.example.
consists of the binary encoded values and nothing else.SCHEMA
clause to tell LOAD DATA
to expect a raw stream
with the provided schema:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.raw_avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % )schema'{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null", "string" ]}]}';SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+
Example 3
Consider an Object Container Fileexample3.
with a more complicated payload than Example 1.
{ "type": "record","namespace": "ns","name": "data","fields": [{ "name": "id", "type": "long" },{ "name": "payload", "type":[ "null",{ "type": "record","name": "payload_record","namespace": "ns","fields": [{ "name": "f_bytes", "type": "bytes"},{ "name": "f_string", "type": "string"},{ "name": "f_map", "type":{ "type": "map","values": { "type": "array","items": "int" }}}]}]}]}
The raw JSON encoding of the contents of this file can be seen in column c_
after the following LOAD DATA
:
CREATE TABLE t (c_id bigint,c_bytes longblob,c_string longblob,c_array_second int,c_whole_raw longblob,c_whole_json json);LOAD DATA INFILE "example3.avro"INTO TABLE tFORMAT AVRO( c_id <- %::id,c_bytes <- %::payload::`ns.payload_record`::f_bytes,c_string <- %::payload::`ns.payload_record`::f_string,@v_map <- %::payload::`ns.payload_record`::f_map,c_whole_raw <- %,c_whole_json <- %)SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);SELECT * FROM t;
*** 1. row ***
c_id: 1
c_bytes: NULL
c_string: NULL
c_array_second: NULL
c_whole_raw: {"id":1,"payload":null}
c_whole_json: {"id":1,"payload":null}
*** 2. row ***
c_id: 2
c_bytes: "A
c_string: "A
c_array_second: 2
c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\u0022\u0041","f_string":"\"A","f_map":{"a":[1,2]}}}}
c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\"A","f_map":{"a":[1,2]},"f_string":"\"A"}}}
There are several things to note:
-
We attempted to extract subvalues of the
payload_
branch of the union-typerecord payload
field.Since that wasn’t the selected member of the union in record 1, LOAD DATA
assignedNULL
toc_
andbytes @v_
.map -
We assigned the JSON encoding of
f_
tomap @v_
and then performed JSON map and array lookups in themap SET
clause to ultimately extract2
. -
f_
andstring f_
had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON typebytes -
The JSON encoding of the Avro
string
valuef_
, as seen instring c_
, encodes special characters likewhole_ raw "
as the escape sequence\"
. -
The JSON encoding of the Avro
bytes
valuef_
, as seen inbytes c_
, encodes every byte with a JSON escape.whole_ raw -
When converting the JSON encoding of record 2 to the SQL JSON type while assigning to
c_
,whole_ json LOAD DATA
normalized both representations of the byte sequence"A
to\"A
.
-
Last modified: April 8, 2025