Impala changes from CDH to Cloudera

Cloudera documentation provides details about changes in Impala when migrating from CDH 5.13-5.16 or CDH 6.1 or later to Cloudera.

  • Impala changes in Cloudera:This document lists down the syntax, service, property and configuration changes that affect Impala after upgrade from CDH 5.13-5.16 or CDH 6.1 or later to Cloudera.
  • Change location of Datafiles: This document explains the impact on Hive warehouse directory location for Impala managed and external tables.
  • Set Storage Engine ACLs : This document describes the steps to set ACLs for Impala to allow Impala to write to the Hive Warehouse Directory.
  • Automatic Invalidation/Refresh of Metadata: In Cloudera, HMS and file system metadata is automatically refreshed. This document provides the details of change in behavior for invalidation and refresh of metadata between CDH and Cloudera.
  • Metadata Improvements: In Cloudera, all catalog metadata improvements are enabled by default. This document lists down the metadata properties in Cloudera that can be customized for better performance and scalability.
  • Default Managed Tables: In Cloudera, managed tables are transactional tables with the insert_only property by default. This document describes modifying file systems on a managed table in Cloudera and the methods to switch to the old CDH behavior.
  • Automatic Refresh of Tables on Impala Clusters: In Cloudera tables or partitions are refreshed automatically on other impala clusters, this document describes the property that enables this behavior.
  • Interoperability between Hive and Impala:This document describes the changes made in Cloudera for the optimal interoperability between Hive and Impala.
  • ORC Support Disabled for Full-Transactional Tables: In Cloudera 7.1.0 and earlier versions, ORC table support is disabled for Impala queries. However, you have an option to switch to the CDH behavior by using the command line argument ENABLE_ORC_SCANNER.
  • Metadata Improvements: In Cloudera, all catalog metadata improvements are enabled by default. This document lists down the metadata properties in Cloudera that can be customized for better performance and scalability.
  • Default Managed Tables: In Cloudera, managed tables are transactional tables with the insert_only property by default. This document describes modifying file systems on a managed table in Cloudera and the methods to switch to the old CDH behavior.
  • Automatic Refresh of Tables on Impala Clusters: In Cloudera tables or partitions are refreshed automatically on other impala clusters, this document describes the property that enables this behavior.
  • Interoperability between Hive and Impala:This document describes the changes made in Cloudera for the optimal interoperability between Hive and Impala.
  • ORC Support Disabled for Full-Transactional Tables: In Cloudera 7.1.0 and earlier versions, ORC table support is disabled for Impala queries. However, you have an option to switch to the CDH behavior by using the command line argument ENABLE_ORC_SCANNER.
  • Authorization Provider for Impala:In Cloudera, Ranger is the authorization provider instead of Sentry. This document describes changes about Ranger enforcing a policy which may be different from using Sentry.
  • Data Governance Support by Atlas:As part of upgrading a CDH cluster to Cloudera, Atlas replaces Cloudera Navigator Data Management for your cluster. This document describes the difference between two environments.