SELECT INTO . . . FORMAT PARQUET
On this page
SELECT INTO .
allows SingleStore data to be exported into Apache Parquet (Parquet) format for the given SQL type.
Some SQL types may not fit into a particular Parquet type.
Syntax
FORMAT PARQUET
Remarks
-
The
FORMAT PARQUET
clause will go at the end of theSELECT
statement. -
Time and date types are stored in the engine timezone without adjusting to UTC.
Below are some examples of the differences between SQL and Parquet time and date formats. If any value of SQL TIME(6) is outside the range of the Parquet TIME, the following error message may occur: can not save SQL TIME(6) … because parquet's TIME is within a single day (hour should be between 0 and 23); consider casting this column to BIGINT
-
If any value of TIMESTAMP is zero, which is outside the range of the Parquet TIMESTAMP, the following error message may occur:
can not be saved as TIMESTAMP with millisecond precision because zero value is outside the scope of the parquet type; consider casting this column to INT64
Examples
The examples below have been shortened to illustrate the use of the Parquet format.
Example 1
Creates a table.
CREATE TABLE table1 (f_CHAR_2_5 CHAR(2), f_LONGTEXT4 LONGTEXT NOT NULL,f_DATETIME_6_3 DATETIME(6), f_NUMERIC_27_13__UNSIGNED2 NUMERIC(27,13) UNSIGNED, f_TINYINT1 TINYINT);
Uses the Parquet format.
SELECT * FROM table1 into fs ‘/tmp/parquet_files’ FORMAT PARQUET;
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe1 AS LOAD DATA fs ‘/tmp/parquet_files’INTO TABLE table2 FORMAT PARQUET (f_CHAR_2_5 <- %::f_CHAR_2_5, f_LONGTEXT4 <- %::f_LONGTEXT4,@f_DATETIME_6_3 <- %::f_DATETIME_6_3, f_NUMERIC_27_13__UNSIGNED2 <- %::f_NUMERIC_27_13__UNSIGNED2,f_TINYINT1 <- %::f_TINYINT1 ) SET f_DATETIME_6_3=DATE_ADD('1970-01-01', INTERVAL @f_DATETIME_6_3 MICROSECOND);
Example 2
Creates a table.
CREATE TABLE table1 (f_REAL3 REAL NOT NULL, f_TIME2 TIME, f_ENUM__small___medium___large__1 ENUM('small','medium','large'));
Uses the Parquet format.
SELECT * FROM table1 into fs '/tmp/parquet_files' FORMAT PARQUET;
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe2 AS LOAD DATA fs '/tmp/parquet_files' INTO TABLE table2 FORMAT PARQUET(f_REAL3 <- %::f_REAL3, @f_TIME2 <- %::f_TIME2, f_ENUM__small___medium___large__1 <- %::f_ENUM__small___medium___large__1 )SET f_TIME2=sec_to_time(@f_TIME2/1000);
Example 3
Creates a table.
CREATE ROWSTORE TABLE table1 (f_TEXT5 TEXT NOT NULL,f_GEOGRAPHY4 GEOGRAPHY NOT NULL, f_VARCHAR_200_3 VARCHAR(200),f_DECIMAL_27_13_2 DECIMAL(27,13), f_INT_UNSIGNED1 INT UNSIGNED);
Uses the Parquet format.
SELECT * FROM table1 INTO OUTFILE '/tmp/parquet_files3' FORMAT PARQUET;
Create Link.
CREATE LINK p_link_20_56_14_557666 AS fs CREDENTIALS '{}' CONFIG '{}';
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe3 AS LOAD DATA LINK p_link_20_56_14_557666'/tmp/parquet_files3' INTO TABLE table2 FORMAT PARQUET (f_TEXT5 <- %::f_TEXT5,f_GEOGRAPHY4 <- %::f_GEOGRAPHY4, f_VARCHAR_200_3 <- %::f_VARCHAR_200_3,f_DECIMAL_27_13_2 <- %::f_DECIMAL_27_13_2, f_INT_UNSIGNED1 <- %::f_INT_UNSIGNED1);
Last modified: February 7, 2024