ORC vs Parquet formats
The differences between Optimized Row Columnar (ORC) file format for storing data in SQL engines are important to understand. Query performance improves when you use the appropriate format for your application.
ORC and Parquet capabilities comparison
The following table compares SQL engine support for ORC and Parquet.
Capability | Data Warehouse | ORC | Parquet | SQL Engine |
---|---|---|---|---|
Read non-transactional data | Apache Hive | ✓ | ✓ | Hive |
Read non-transactional data | Apache Impala | ✓ | ✓ | Impala |
Read/Write Full ACID tables | Apache Hive | ✓ | Hive | |
Read Full ACID tables | Apache Impala | ✓ | Impala & HMS | |
Read Insert-only managed tables | Apache Impala | ✓ | ✓ | Impala & HMS |
Column index | Apache Hive | ✓ | ✓ | Hive & HMS |
Column index | Apache Impala | ✓ | Impala & HMS | |
CBO uses column metadata | Apache Hive | ✓ | Hive & HMS | |
Recommended format | Apache Hive | ✓ | Hive & HMS | |
Recommended format | Apache Impala | ✓ | Impala & HMS | |
Vectorized reader | Apache Hive | ✓ | ✓ | Hive & HMS |
Read complex types | Apache Impala | ✓ | ✓ | Impala & HMS |
Read/write complex types | Apache Hive | ✓ | ✓ | Hive & HMS |