Impala Configuration Changes
The upgrade process to Cloudera Base on premises changes the default values of some Impala configuration properties and adds new properties.
Impala Configuration Property Values
The following list describes Impala configuration property value changes or additions that occur after upgrading from CDH or HDP to CDP. These changes in properties ensure that CDP Hive and Impala interoperate to the best of their abilities. The CDP default might have changed in a way that impacts your work.
- default_file_format
- Before upgrade: text
- After upgrade: parquet
In CDP, the default Impala table file format changed from text to parquet. If a table must use a file format other than parquet, add an explicit stored as clause to the create table statement, or set the query option default_file_format to text to revert to the CDH behavior.
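The two options above might look like the following in Impala SQL. This is a minimal sketch; the table names sales_text and sales_default and their columns are hypothetical.

```sql
-- Option 1: name the format explicitly so the table does not depend on the default.
CREATE TABLE sales_text (id INT, amount DECIMAL(10,2))
STORED AS TEXTFILE;

-- Option 2: restore the CDH default for the current session only.
SET DEFAULT_FILE_FORMAT=TEXT;
-- With the option set, a table created without a STORED AS clause uses text.
CREATE TABLE sales_default (id INT, amount DECIMAL(10,2));
```

A SET statement applies only to the session that issues it, so the explicit stored as clause is the more portable choice for scripts that run in different sessions.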
- default_transactional_type
- Before upgrade: N/A
- After upgrade: insert_only
In CDP, the default table type for managed tables is insert_only. To revert to the CDH behavior, set default_transactional_type to none. These transactional tables currently cannot be altered in Impala with an alter statement, and Impala does not currently support compaction on transactional tables; use Hive to compact the tables as needed. Other operations such as select, insert, insert overwrite, and truncate are supported. For the latest information, see SQL transactions in Impala.
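As a sketch of reverting the table type for one session, the statements below set the query option before creating a table. The table name events_plain and its columns are hypothetical, and the exact table parameters shown by DESCRIBE FORMATTED may vary by release.

```sql
-- Revert to the CDH behavior for the current session.
SET DEFAULT_TRANSACTIONAL_TYPE=NONE;

-- Tables created from here on in this session are not insert-only transactional.
CREATE TABLE events_plain (id INT, payload STRING);

-- Inspect the table parameters to confirm how the table was created.
DESCRIBE FORMATTED events_plain;
```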
- hms_event_polling_interval_s
- Before upgrade: 0
- After upgrade: 2
When raw data is ingested into tables, new HMS metadata and filesystem metadata are generated. In CDH, you must manually issue an Invalidate or Refresh command to pick up this new information. In CDP, the hms_event_polling_interval_s property is set to 2 seconds by default, so tables are refreshed automatically as changes are detected in HMS. If specific tables that are not supported by event polling need to be refreshed, issue a table-level Invalidate or Refresh command.
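For tables that event polling does not cover, the table-level commands mentioned above can be issued as in this sketch. The database and table name sales_db.raw_events is hypothetical.

```sql
-- Reload file and partition metadata for one table after new data files arrive.
REFRESH sales_db.raw_events;

-- Discard the cached metadata for one table; it is reloaded on the next access.
INVALIDATE METADATA sales_db.raw_events;
```

REFRESH is usually the lighter choice when only data files changed. INVALIDATE METADATA is the heavier operation because the table's metadata is reloaded from scratch.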
- disconnected_session_timeout
- Before upgrade: N/A
- After upgrade: 900
In CDP, Impala can keep a session running after the client's connection to Impala is closed. Impala clients and drivers may support reconnecting to the same session even when the network connection is interrupted. By default, a disconnected session is kept for 15 minutes (900 seconds) so that the client can reconnect before the session is terminated. You can set the disconnected_session_timeout flag to a lower value so that disconnected sessions are cleaned up more quickly.
- enable_orc_scanner
- Before upgrade: True (preview)
- After upgrade: True
While using Impala to query ORC tables, set the command line argument enable_orc_scanner=true to re-enable ORC table support.
- enable_insert_events
- Before upgrade: N/A
- After upgrade: True
When Impala inserts into a table, it refreshes the underlying table or partition. When enable_insert_events is set to True, Impala also generates INSERT events which, when received by other Impala clusters, automatically refresh the affected tables or partitions. Event processing must be on for this property to work.
- disable_hdfs_num_rows_estimate
- Before upgrade: N/A
- After upgrade: False
In CDP Impala, if no statistics are available for a table, Impala tries to estimate the table's cardinality (number of rows) from the size of the table's data. This estimate is on by default and is used only when statistics are not present. You can set the query option disable_hdfs_num_rows_estimate=true to disable this optimization.
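A short sketch of turning the estimate off for one session follows. The table name sales_db.raw_events is hypothetical, and the COMPUTE STATS step is an addition not described above: it gathers real statistics, which makes the estimate unnecessary for that table.

```sql
-- Disable the row-count estimate for the current session.
SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=TRUE;

-- Alternative: collect actual statistics so the planner has no need to estimate.
COMPUTE STATS sales_db.raw_events;

-- Review the row counts recorded for the table.
SHOW TABLE STATS sales_db.raw_events;
```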
- use_local_catalog
- Before upgrade: False
- After upgrade: True
In CDP, the on-demand use_local_catalog mode is set to True by default on all Impala coordinators, so that coordinators pull metadata from catalogd as needed and cache it locally. This reduces the memory footprint on coordinators and automates cache eviction.
- catalog_topic_mode
- Before upgrade: full
- After upgrade: minimal
In CDP, catalog_topic_mode is set to minimal by default to enable on-demand metadata for all coordinators.
