ORC vs Parquet in CDP

Understanding the differences between the Optimized Row Columnar (ORC) file format for storing Hive data and the Parquet format for storing Impala data is important, because query performance improves when you use the appropriate format for your application.

ORC and Parquet capabilities comparison

The following table compares Hive and Impala support for ORC and Parquet in CDP Public Cloud and CDP Private Cloud Base. The Runtime Services column shows the supported services:

Hive-on-Tez
HiveLLAP, supported on CDP Public Cloud only
Hive metastore (HMS)
Impala
Spark
JDBC

Table 1. ORC and Parquet capabilities comparison
Capability	Data Warehouse	ORC	Parquet	Runtime Services
Read non-transactional data	Apache Hive	✓	✓	(Hive-on-Tez | HiveLLAP) & HMS
Read non-transactional data	Apache Impala	✓	✓	Impala & HMS
Full ACID transactions	Apache Hive	✓		(Hive-on-Tez | HiveLLAP) & HMS
Read Insert-only transactions	Apache Impala	✓	✓	Impala & HMS
Hive Warehouse Connector reads	Apache Hive	✓	✓	((Hive-on-Tez & JDBC) | HiveLLAP) & Spark & HMS
Hive Warehouse Connector writes	Apache Hive	✓		((Hive-on-Tez & JDBC) | HiveLLAP) & Spark & HMS
Column index	Apache Hive	✓	✓	(Hive-on-Tez | HiveLLAP) & HMS
Column index	Apache Impala		✓	Impala & HMS
CBO uses column metadata	Apache Hive	✓		(Hive-on-Tez | HiveLLAP) & HMS
Recommended format	Apache Hive	✓		(Hive-on-Tez | HiveLLAP) & HMS
Recommended format	Apache Impala		✓	Impala & HMS
Vectorized reader	Apache Hive	✓	✓	(Hive-on-Tez | HiveLLAP) & HMS
Read complex types	Apache Impala	✓	✓	Impala & HMS
Read/write complex types	Apache Hive	✓	✓	(Hive-on-Tez | HiveLLAP) & HMS
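
The "Recommended format" rows above can be applied when creating tables. The following Python snippet is a minimal sketch, not a Cloudera-provided tool: it maps each engine to the format the table recommends and builds a basic CREATE TABLE ... STORED AS statement. The helper name create_table_ddl, the table names, and the column definitions are hypothetical and chosen only for illustration. The generated statements are plain strings that you would still need to run through your own Hive or Impala client.

```python
# Recommended formats, taken from the "Recommended format" rows of the table:
# ORC for Apache Hive, Parquet for Apache Impala.
RECOMMENDED_FORMAT = {"hive": "ORC", "impala": "PARQUET"}


def create_table_ddl(engine, table, columns):
    """Build a CREATE TABLE statement using the engine's recommended format.

    engine  -- "hive" or "impala" (case-insensitive)
    table   -- table name (hypothetical in this sketch)
    columns -- list of (column_name, column_type) pairs
    """
    fmt = RECOMMENDED_FORMAT[engine.lower()]
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({cols}) STORED AS {fmt}"


# Hypothetical schema used only to show the generated statements.
columns = [("id", "INT"), ("name", "STRING")]
hive_ddl = create_table_ddl("hive", "sales_orc", columns)
impala_ddl = create_table_ddl("impala", "sales_parquet", columns)

print(hive_ddl)
print(impala_ddl)
```

Keeping the engine-to-format mapping in one place makes it easy to review against the table if the recommendation changes in a later release.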