Impala Configuration Changes

The upgrade process to CDP Private Cloud Base changes the default values of some Impala configuration properties and adds new properties.

Impala Configuration Property Values

The following list describes the Impala configuration property values that change or are added when you upgrade from CDH or HDP to CDP. These changes help CDP Hive and Impala interoperate. Review each entry, because a changed default might affect your workloads.

default_file_format
Before upgrade: text

After upgrade: parquet

In CDP, the default Impala table file format changed from text to parquet. If you need a table in a format other than parquet, add an explicit STORED AS clause to the CREATE TABLE statement, or set the query option default_file_format to text to revert to the CDH behavior.
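
The two choices above can be sketched as the statements they lead to. This is a minimal Python sketch that only assembles SQL text; the helper names, the table name sales_raw, and its columns are hypothetical and are not part of any Impala client library.

```python
# Hypothetical helpers that only build SQL text; nothing is sent to Impala.

def create_table_sql(table, columns, file_format=None):
    """Build a CREATE TABLE statement; add STORED AS only when a format is given."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    stmt = f"CREATE TABLE {table} ({cols})"
    if file_format is not None:
        stmt += f" STORED AS {file_format}"
    return stmt + ";"


def set_option_sql(option, value):
    """Build a SET statement for a session-level query option."""
    return f"SET {option}={value};"


COLUMNS = [("id", "INT"), ("note", "STRING")]  # hypothetical schema

# Choice 1: name the format explicitly, so the parquet default never applies.
explicit_stmt = create_table_sql("sales_raw", COLUMNS, "TEXTFILE")

# Choice 2: restore the CDH default for the session, then omit STORED AS.
revert_stmt = set_option_sql("default_file_format", "text")
implicit_stmt = create_table_sql("sales_raw", COLUMNS)

for stmt in (explicit_stmt, revert_stmt, implicit_stmt):
    print(stmt)
```

The explicit clause is probably the safer of the two for scripts that run on both CDH and CDP, because it does not depend on a session setting.
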
default_transactional_type
Before upgrade: N/A

After upgrade: insert_only

In CDP, the default table type for managed tables is insert_only. To revert to the CDH behavior, set default_transactional_type to none. Impala currently cannot alter these transactional tables with an ALTER statement, and it does not currently support compaction on transactional tables; use Hive to compact the tables as needed. Other operations such as SELECT, INSERT, INSERT OVERWRITE, and TRUNCATE are supported. For the latest information, see SQL transactions in Impala.
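
The support matrix in the paragraph above can be written down as a small lookup. This is an illustrative Python sketch only: the function name and the wording of the returned guidance are assumptions, and the sets contain nothing beyond the operations named in the text.

```python
# Where each operation on an insert_only table can run, per the text above.
IMPALA_SUPPORTED = {"select", "insert", "insert overwrite", "truncate"}
IMPALA_UNSUPPORTED = {
    "alter": "not currently possible from Impala",
    "compact": "run the compaction from Hive",
}


def where_to_run(operation):
    """Return guidance for a single operation name (case-insensitive)."""
    op = " ".join(operation.lower().split())  # normalize case and spacing
    if op in IMPALA_SUPPORTED:
        return "supported in Impala"
    if op in IMPALA_UNSUPPORTED:
        return IMPALA_UNSUPPORTED[op]
    return "not covered by this section"


for op in ("SELECT", "Insert  Overwrite", "compact", "alter", "merge"):
    print(f"{op!r}: {where_to_run(op)}")
```
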
hms_event_polling_interval_s
Before upgrade: 0

After upgrade: 2

When raw data is ingested into tables, new HMS metadata and filesystem metadata are generated. In CDH, you had to issue an INVALIDATE METADATA or REFRESH command manually to pick up this new information. In CDP, the hms_event_polling_interval_s property is set to 2 seconds by default, so Impala automatically refreshes tables as changes are detected in HMS. If specific tables that event polling does not support need to be refreshed, issue a table-level INVALIDATE METADATA or REFRESH command.
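
For the tables that still need a manual command, the table-level statements might be generated as follows. This is a minimal Python sketch that only builds SQL text; the helper name, the full_reload parameter, and the table names are hypothetical.

```python
def metadata_sync_sql(table, full_reload=False):
    """Build a table-level statement to resync Impala metadata manually.

    REFRESH reloads file and partition metadata for a table Impala already
    knows about; INVALIDATE METADATA discards the cached metadata so that it
    is reloaded the next time the table is used.
    """
    if full_reload:
        return f"INVALIDATE METADATA {table};"
    return f"REFRESH {table};"


# Hypothetical tables assumed not to be covered by event polling.
UNCOVERED_TABLES = ["staging.clicks_ext", "staging.legacy_logs"]

statements = [metadata_sync_sql(t) for t in UNCOVERED_TABLES]
statements.append(metadata_sync_sql("staging.legacy_logs", full_reload=True))

for stmt in statements:
    print(stmt)
```

Always passing a table name keeps the command scoped to one table rather than the whole catalog.
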
disconnected_session_timeout
Before upgrade: N/A

After upgrade: 900

In CDP, Impala can keep a session running after the client connection is disconnected. Impala clients and drivers may support reconnecting to the same session even when the network connection is interrupted. By default, a disconnected session is kept for 15 minutes (900 seconds) so that the client can reconnect before the session is terminated. Set the disconnected_session_timeout flag to a lower value to clean up disconnected sessions more quickly.
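
The flag takes a value in seconds, so a shorter timeout has to be converted from minutes. The Python sketch below shows the arithmetic and renders the value in the usual --name=value startup flag form; the helper name is hypothetical, and how the flag is actually applied (for example, through your cluster manager) is outside this sketch.

```python
DEFAULT_TIMEOUT_S = 900  # CDP default from the text above


def timeout_flag(minutes):
    """Render the startup flag for a timeout given in minutes.

    This sketch only accepts positive values; that limit is a choice made
    here, not a documented constraint of the flag.
    """
    if minutes <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    seconds = int(minutes * 60)
    return f"--disconnected_session_timeout={seconds}"


print(DEFAULT_TIMEOUT_S // 60)  # minutes covered by the default
print(timeout_flag(5))          # a shorter, 5-minute timeout
```
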
enable_orc_scanner
Before upgrade: True (preview)

After upgrade: True

When you use Impala to query ORC tables, set the command-line argument enable_orc_scanner to true to re-enable ORC table support.
enable_insert_events
Before upgrade: N/A

After upgrade: True

When Impala inserts into a table, it refreshes the underlying table or partition. When enable_insert_events is set to True, Impala generates INSERT event types which, when received by other Impala clusters, automatically refresh the affected tables or partitions. Event processing must be on for this property to work.
disable_hdfs_num_rows_estimate
Before upgrade: N/A

After upgrade: False

In CDP Impala, if no statistics are available for a table, Impala tries to estimate the table's cardinality (its number of rows) from the size of the table's data. This estimation is on by default and is used only when statistics are not present. To disable it, set the query option disable_hdfs_num_rows_estimate to true.
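
One way to picture this fallback is a row count guessed from data size. The Python sketch below is an illustration under stated assumptions: it assumes the estimate is the data size divided by an assumed average row width, which is a simplification and not Impala's actual planner logic. All function and parameter names other than disable_hdfs_num_rows_estimate are hypothetical.

```python
def estimate_row_count(total_file_bytes, avg_row_bytes):
    """Rough cardinality guess: data size divided by an assumed row width."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return total_file_bytes // avg_row_bytes


def planner_cardinality(stats_row_count, total_file_bytes, avg_row_bytes,
                        disable_hdfs_num_rows_estimate=False):
    """Pick the row count a planner would work with, or None if unknown."""
    if stats_row_count is not None:
        return stats_row_count   # real statistics take precedence
    if disable_hdfs_num_rows_estimate:
        return None              # estimation switched off, cardinality unknown
    return estimate_row_count(total_file_bytes, avg_row_bytes)


# No statistics, 1,000,000 bytes of data, assumed 100 bytes per row.
print(planner_cardinality(None, 1_000_000, 100))
print(planner_cardinality(None, 1_000_000, 100,
                          disable_hdfs_num_rows_estimate=True))
```

Computing real statistics removes the need for any estimate, which is why the first branch returns the stored row count unchanged.
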
use_local_catalog
Before upgrade: False

After upgrade: True

In CDP, the on-demand use_local_catalog mode is set to True by default on all Impala coordinators, so that the coordinators pull metadata from catalogd as needed and cache it locally. This reduces the memory footprint on coordinators and automates cache eviction.
catalog_topic_mode
Before upgrade: full

After upgrade: minimal

In CDP, catalog_topic_mode is set to minimal by default to enable on-demand metadata for all coordinators.
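
The use_local_catalog and catalog_topic_mode settings work as a pair, so it can help to see them side by side. The Python sketch below renders the before and after values from this list as --name=value flags. The grouping by daemon is an assumption of the sketch (use_local_catalog on the coordinators, as stated above, and catalog_topic_mode on catalogd), and the helper name is hypothetical.

```python
# Values taken from the "After upgrade" entries above (CDP defaults).
ON_DEMAND_FLAGS = {
    "impalad (coordinators)": {"use_local_catalog": "true"},
    "catalogd": {"catalog_topic_mode": "minimal"},  # daemon placement assumed
}

# Values taken from the "Before upgrade" entries above (CDH defaults).
LEGACY_FLAGS = {
    "impalad (coordinators)": {"use_local_catalog": "false"},
    "catalogd": {"catalog_topic_mode": "full"},  # daemon placement assumed
}


def render_flags(flags_by_daemon):
    """Return one 'daemon: --name=value' line per flag, in insertion order."""
    lines = []
    for daemon, flags in flags_by_daemon.items():
        for name, value in flags.items():
            lines.append(f"{daemon}: --{name}={value}")
    return lines


for line in render_flags(ON_DEMAND_FLAGS):
    print(line)
```
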