ORC vs Parquet in CDP
It is important to understand the differences between the Optimized Row Columnar (ORC) file format, used for storing Hive data, and the Parquet file format, used for storing Impala data. Query performance improves when you use the appropriate format for your application.
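As a minimal sketch of what choosing the appropriate format can look like in practice, the following Python helper builds a `CREATE TABLE` statement with a `STORED AS` clause that matches the engine. The table names, column names, and the helper itself are hypothetical and used only for illustration; the format mapping simply restates the pairing described above.

```python
# Storage format paired with each engine, as described in the text above.
RECOMMENDED_FORMAT = {"hive": "ORC", "impala": "PARQUET"}


def create_table_ddl(engine, table, columns):
    """Build a CREATE TABLE statement using the format paired with the engine.

    `columns` is a list of (name, type) pairs. All names are hypothetical.
    """
    fmt = RECOMMENDED_FORMAT[engine.lower()]
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    return f"CREATE TABLE {table} ({cols}) STORED AS {fmt}"


columns = [("id", "INT"), ("amount", "DOUBLE")]
ddl_hive = create_table_ddl("hive", "sales_orc", columns)
ddl_impala = create_table_ddl("impala", "sales_parquet", columns)
print(ddl_hive)
print(ddl_impala)
```

The generated strings would still need to be submitted through whichever client you use for Hive or Impala; that step is omitted here.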
ORC and Parquet capabilities comparison
The following table compares Hive and Impala support for ORC and Parquet in CDP Public Cloud and CDP Private Cloud Base. The Runtime Services column shows the supported services:
- Hive-on-Tez
- HiveLLAP, supported on CDP Public Cloud only
- Hive metastore (HMS)
- Impala
- Spark
- JDBC
Capability | Data Warehouse | ORC | Parquet | Runtime Services |
---|---|---|---|---|
Read non-transactional data | Apache Hive | ✓ | ✓ | (Hive-on-Tez \| HiveLLAP) & HMS |
Read non-transactional data | Apache Impala | ✓ | ✓ | Impala & HMS |
Full ACID transactions | Apache Hive | ✓ | | (Hive-on-Tez \| HiveLLAP) & HMS |
Read Insert-only transactions | Apache Impala | ✓ | ✓ | Impala & HMS |
Hive Warehouse Connector reads | Apache Hive | ✓ | ✓ | ((Hive-on-Tez & JDBC) \| HiveLLAP) & Spark & HMS |
Hive Warehouse Connector writes | Apache Hive | ✓ | | ((Hive-on-Tez & JDBC) \| HiveLLAP) & Spark & HMS |
Column index | Apache Hive | ✓ | ✓ | (Hive-on-Tez \| HiveLLAP) & HMS |
Column index | Apache Impala | | ✓ | Impala & HMS |
CBO uses column metadata | Apache Hive | ✓ | | (Hive-on-Tez \| HiveLLAP) & HMS |
Recommended format | Apache Hive | ✓ | | (Hive-on-Tez \| HiveLLAP) & HMS |
Recommended format | Apache Impala | | ✓ | Impala & HMS |
Vectorized reader | Apache Hive | ✓ | ✓ | (Hive-on-Tez \| HiveLLAP) & HMS |
Read complex types | Apache Impala | ✓ | ✓ | Impala & HMS |
Read/write complex types | Apache Hive | ✓ | ✓ | (Hive-on-Tez \| HiveLLAP) & HMS |
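
One way to use the table above is as a lookup when planning a workload. The sketch below hand-transcribes a few of its rows into a Python dictionary and checks whether a format is listed for a given engine and capability. The key names (`full_acid`, `hwc_writes`, and so on) and the `supports` helper are assumptions made for this example, not part of any CDP API.

```python
# Subset of the capability table, transcribed by hand:
# (engine, capability) -> set of formats marked as supported.
SUPPORT = {
    ("hive", "full_acid"): {"ORC"},
    ("impala", "read_insert_only"): {"ORC", "PARQUET"},
    ("hive", "hwc_reads"): {"ORC", "PARQUET"},
    ("hive", "hwc_writes"): {"ORC"},
    ("hive", "cbo_column_metadata"): {"ORC"},
}


def supports(engine, capability, fmt):
    """Return True if the transcribed rows list `fmt` for this engine and capability."""
    return fmt.upper() in SUPPORT.get((engine.lower(), capability), set())


acid_on_orc = supports("hive", "full_acid", "orc")
acid_on_parquet = supports("hive", "full_acid", "parquet")
print(acid_on_orc, acid_on_parquet)
```

Combinations that are not in the dictionary return `False`, so the helper only confirms what was transcribed. It does not prove that an unlisted combination is unsupported.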