ORC Support Disabled for Full-Transactional Tables

In CDP 7.1.0 and earlier versions, ORC table support is disabled for Impala queries. However, you have an option to switch to the CDH behavior by using the command line argument ENABLE_ORC_SCANNER.

New Default Behavior

In CDP 7.1.0 and earlier versions, if you use Impala to query ORC tables you will see it fail. To mitigate this situation, you must add explicit STORED AS clause to code creating Hive tables and use a format Impala can read. Another option is to set global configuration in Cloudera Manager to change hive_default_fileformat_managed.

Steps to switch to the CDH behavior:

Set the query option ENABLE_ORC_SCANNER to TRUE to re-enable ORC table support.

This option does not work on a full transactional ORC table, and the queries return an error.

ORC vs Parquet in CDP

The differences between Optimized Row Columnar (ORC) file format for storing Hive data and Parquet for storing Impala data are important to understand. Query performance improves when you use the appropriate format for your application. The following table compares Hive and Impala support for ORC and Parquet in CDP Public Cloud and CDP Private Cloud Base. ORC vs Parquet