Using SequenceFile Data Files
Impala supports creating and querying SequenceFile tables.
Creating SequenceFile Tables and Loading Data
To create a table in the SequenceFile format, use the
AS SEQUENCEFILE clause in the
Because Impala can query SequenceFile tables but cannot currently write to, after creating SequenceFile tables, use the Hive shell to load the data.
After loading data into a table through Hive or other mechanism outside
of Impala, issue a
table_name statement the next time you
connect to the Impala node, before querying the table, to make Impala
recognize the new data.
Enabling Compression for SequenceFile Tables
You may want to enable compression on existing tables. Enabling compression provides performance gains in most cases and is supported for SequenceFile tables.
The compression type is specified in the
For example, to enable Snappy compression, you would specify the following additional settings when loading data through the Hive shell.
Query Performance for SequenceFile Tables
In general, expect query performance with SequenceFile tables to be faster than with tables using text data, but slower than with Parquet tables.