SELECT INTO ... FORMAT PARQUET

SELECT INTO ... FORMAT PARQUET allows SingleStore data to be exported into Apache Parquet (Parquet) format for the given SQL type. See Apache Parquet documentation for detailed information.

Some SQL types may not fit into a particular Parquet type. This is dependent on the actual stored values for that type. For data that does not fit, the user will receive suggestions for the appropriate type for storing data in Parquet format.

Syntax

FORMAT PARQUET

Remarks

  • The FORMAT PARQUET clause will go at the end of the SELECT statement.

  • Time and date types are stored in the engine timezone without adjusting to UTC. Below are some examples of the differences between SQL and Parquet time and date formats. If any value of SQL TIME(6) is outside the range of the Parquet TIME, the following error message may occur:

    can not save SQL TIME(6) … because parquet's TIME is within a single day
    (hour should be between 0 and 23); consider casting this column to BIGINT
  • If any value of TIMESTAMP is zero, which is outside the range of the Parquet TIMESTAMP, the following error message may occur:

    can not be saved as TIMESTAMP with millisecond precision because 
    zero value is outside the scope of the parquet type; consider casting 
    this column to INT64

Examples

The examples below have been shortened to illustrate the use of the Parquet format. The data ingestion, creation of table2, and start pipeline are not shown.

Example 1

Creates a table.

CREATE TABLE table1 (f_CHAR_2_5 CHAR(2), f_LONGTEXT4 LONGTEXT NOT NULL,
f_DATETIME_6_3 DATETIME(6), f_NUMERIC_27_13__UNSIGNED2 NUMERIC(27,13) UNSIGNED, f_TINYINT1 TINYINT);

Uses the Parquet format.

SELECT * FROM table1 into fs ‘/tmp/parquet_files’ FORMAT PARQUET;

Creates the pipeline using Parquet syntax.

CREATE PIPELINE pipe1 AS LOAD DATA fs ‘/tmp/parquet_files’
INTO TABLE table2 FORMAT PARQUET (f_CHAR_2_5 <- %::f_CHAR_2_5, f_LONGTEXT4 <- %::f_LONGTEXT4,
@f_DATETIME_6_3 <- %::f_DATETIME_6_3, f_NUMERIC_27_13__UNSIGNED2 <- %::f_NUMERIC_27_13__UNSIGNED2,
f_TINYINT1 <- %::f_TINYINT1 )  SET f_DATETIME_6_3=DATE_ADD('1970-01-01', INTERVAL @f_DATETIME_6_3 MICROSECOND);

Example 2

Creates a table.

CREATE TABLE table1 (f_REAL3 REAL NOT NULL, f_TIME2 TIME, f_ENUM__small___medium___large__1 ENUM('small','medium','large'));

Uses the Parquet format.

SELECT * FROM table1 into fs '/tmp/parquet_files' FORMAT PARQUET;

Creates the pipeline using Parquet syntax.

CREATE PIPELINE pipe2 AS LOAD DATA fs '/tmp/parquet_files' INTO TABLE table2 FORMAT PARQUET
(f_REAL3 <- %::f_REAL3, @f_TIME2 <- %::f_TIME2, f_ENUM__small___medium___large__1 <- %::f_ENUM__small___medium___large__1 )
SET f_TIME2=sec_to_time(@f_TIME2/1000);

Example 3

Creates a table.

CREATE ROWSTORE TABLE table1 (f_TEXT5 TEXT NOT NULL,
f_GEOGRAPHY4 GEOGRAPHY NOT NULL, f_VARCHAR_200_3 VARCHAR(200),
f_DECIMAL_27_13_2 DECIMAL(27,13), f_INT_UNSIGNED1 INT UNSIGNED);

Uses the Parquet format.

SELECT * FROM table1 INTO OUTFILE '/tmp/parquet_files3' FORMAT PARQUET;

Create Link.

CREATE LINK p_link_20_56_14_557666 AS fs CREDENTIALS '{}' CONFIG '{}';

Creates the pipeline using Parquet syntax.

CREATE PIPELINE pipe3 AS LOAD DATA LINK p_link_20_56_14_557666
'/tmp/parquet_files3' INTO TABLE table2 FORMAT PARQUET (f_TEXT5 <- %::f_TEXT5,
f_GEOGRAPHY4 <- %::f_GEOGRAPHY4, f_VARCHAR_200_3 <- %::f_VARCHAR_200_3,
f_DECIMAL_27_13_2 <- %::f_DECIMAL_27_13_2, f_INT_UNSIGNED1 <- %::f_INT_UNSIGNED1);

Last modified: February 7, 2024

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK