PARQUET_FILE_SIZE Query Option
Specifies the maximum size of each Parquet data file produced by Impala INSERT statements.
Syntax:
Specify the size in bytes, or with a trailing m or g character to indicate megabytes or gigabytes. For example:
-- 128 megabytes. set PARQUET_FILE_SIZE=134217728 INSERT OVERWRITE parquet_table SELECT * FROM text_table; -- 512 megabytes. set PARQUET_FILE_SIZE=512m; INSERT OVERWRITE parquet_table SELECT * FROM text_table; -- 1 gigabyte. set PARQUET_FILE_SIZE=1g; INSERT OVERWRITE parquet_table SELECT * FROM text_table;
Usage notes:
With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB in Impala 2.0 and later) could be much larger than needed for each data file. For INSERT operations into such tables, you can increase parallelism by specifying a smaller PARQUET_FILE_SIZE value, resulting in more HDFS blocks that can be processed by different nodes.
Type: numeric, with optional unit specifier
Default: 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
Related information:
For information about the Parquet file format, and how the number and size of data files affects query performance, see Using the Parquet File Format with Impala Tables.