ORC Support Disabled for Full-Transactional Tables
In CDP 7.1.0 and earlier versions, ORC table support is disabled for Impala queries. However, you have the option to switch to the CDH behavior by using the command line argument ENABLE_ORC_SCANNER.
New Default Behavior
In CDP 7.1.0 and earlier versions, Impala queries against ORC tables fail. To mitigate this, add an explicit STORED AS clause to the code that creates Hive tables, and specify a format that Impala can read. Alternatively, change the hive_default_fileformat_managed global configuration in Cloudera Manager.
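The following is a minimal sketch of the explicit STORED AS approach, run from Hive. The table name and columns are hypothetical and chosen only for illustration. The commented-out SET statement assumes that the Cloudera Manager setting corresponds to the Hive property hive.default.fileformat.managed and that your deployment allows it to be changed at the session level; verify both before relying on it.

```sql
-- Hypothetical table and column names, for illustration only.
-- The explicit STORED AS clause overrides the default managed-table
-- file format, so the table is written in a format Impala can read.
CREATE TABLE sales_parquet (
  id     INT,
  amount DECIMAL(10,2),
  region STRING
)
STORED AS PARQUET;

-- Assumption: a session-level alternative to the global setting.
-- SET hive.default.fileformat.managed=parquet;
```

Tables created without a STORED AS clause continue to use whatever default file format is configured.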
Steps to switch to the CDH behavior:
Set the query option ENABLE_ORC_SCANNER to TRUE to re-enable ORC table support.
This option does not work on full transactional ORC tables; queries against such tables return an error.
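As a sketch, the query option can be set for the current session in impala-shell or another Impala client before querying an ORC table. The table name orc_events is hypothetical and is assumed to be a non-transactional ORC table.

```sql
-- Re-enable the ORC scanner for the current Impala session.
SET ENABLE_ORC_SCANNER=TRUE;

-- orc_events is a hypothetical, non-transactional ORC table.
-- The same query against a full transactional ORC table is expected
-- to return an error, as described above.
SELECT COUNT(*) FROM orc_events;
```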
ORC vs Parquet in CDP
It is important to understand the differences between the Optimized Row Columnar (ORC) file format for storing Hive data and the Parquet file format for storing Impala data. Query performance improves when you use the appropriate format for your application. The following table compares Hive and Impala support for ORC and Parquet in CDP Public Cloud and CDP Private Cloud Base.
ORC vs Parquet