ORC vs Parquet formats

It is important to understand the differences between the Optimized Row Columnar (ORC) and Parquet file formats for storing data in SQL engines. Query performance improves when you use the format best suited to your engine and workload.

ORC and Parquet capabilities comparison

The following table compares SQL engine support for ORC and Parquet.

Table 1.
Capability                       Data Warehouse  ORC  Parquet  SQL Engine
Read non-transactional data      Apache Hive                   Hive
Read non-transactional data      Apache Impala                 Impala
Read/Write Full ACID tables      Apache Hive                   Hive
Read Full ACID tables            Apache Impala                 Impala
Read Insert-only managed tables  Apache Impala                 Impala
Column index                     Apache Hive                   Hive
Column index                     Apache Impala                 Impala
CBO uses column metadata         Apache Hive                   Hive
Recommended format               Apache Hive                   Hive
Recommended format               Apache Impala                 Impala
Vectorized reader                Apache Hive                   Hive
Read complex types               Apache Impala                 Impala
Read/write complex types         Apache Hive                   Hive
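
The "Recommended format" rows suggest choosing the storage format per engine when a table is created. The following is a minimal sketch of that idea in Python, not an official tool. It assumes the commonly cited pairing of ORC with Hive and Parquet with Impala, which the table as shown here does not spell out. The helper name create_table_ddl, the RECOMMENDED_FORMAT mapping, and the sample table and column names are hypothetical and exist only for illustration.

```python
# Assumed pairing (not taken from the table above): ORC for Hive, Parquet for Impala.
RECOMMENDED_FORMAT = {"hive": "ORC", "impala": "PARQUET"}


def create_table_ddl(engine: str, table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement that uses the assumed format for the engine.

    `columns` maps column names to SQL type names. Identifiers are inserted
    as given, so this helper is only suitable for trusted, hand-written input.
    """
    fmt = RECOMMENDED_FORMAT[engine.lower()]  # KeyError for an unknown engine
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} ({cols}) STORED AS {fmt}"


if __name__ == "__main__":
    # Hypothetical table definition used only to show the generated statements.
    schema = {"id": "INT", "amount": "DOUBLE"}
    print(create_table_ddl("hive", "sales", schema))
    print(create_table_ddl("impala", "sales", schema))
```

The generated statements rely only on the STORED AS clause, which both engines accept in CREATE TABLE. Whether a given engine can also write to, or apply ACID semantics to, the resulting table depends on the capabilities listed in the table above.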