Impala Configuration Changes
The upgrade process to CDP Private Cloud Base changes the default values of some Impala configuration properties and adds new properties.
Impala Configuration Property Values
The following list describes Impala configuration property value changes or additions that occur after upgrading from CDH or HDP to CDP. These changes in properties ensure that CDP Hive and Impala interoperate to the best of their abilities. The CDP default might have changed in a way that impacts your work.
- default_file_format
  - Before upgrade: text
  - After upgrade: parquet

  In CDP, the default Impala table file format changed from text to parquet. If a table must use a file format other than parquet, add a STORED AS clause to the CREATE TABLE statement explicitly, or set the query option default_file_format to text to revert to the CDH behavior.
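As a rough sketch of the two options described for default_file_format, the statements below either name the file format on the table itself or change the query option for the current session. The table and column names (sales_text, sales_default, id, note) are hypothetical, and the option value is assumed to be spelled TEXT; check the query option reference for your release.

```sql
-- Option 1: state the format explicitly so the table does not depend on the default.
CREATE TABLE sales_text (id INT, note STRING) STORED AS TEXTFILE;

-- Option 2: restore the CDH default for the current session only,
-- then create tables without a STORED AS clause.
SET DEFAULT_FILE_FORMAT=TEXT;
CREATE TABLE sales_default (id INT, note STRING);
```

Naming the format per table keeps existing scripts independent of whichever default the cluster uses.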
- default_transactional_type
  - Before upgrade: N/A
  - After upgrade: insert_only

  In CDP, the default table type for managed tables is insert_only. If you must revert to the CDH behavior, set default_transactional_type to none.

  Impala currently cannot alter these transactional tables with an ALTER statement. Impala also does not currently support compaction on transactional tables; use Hive to compact the tables as needed. Other operations, such as SELECT, INSERT, INSERT OVERWRITE, and TRUNCATE, are supported. For the latest information, see SQL transactions in Impala.
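A minimal sketch of reverting the managed-table default for one session, assuming the option value is spelled NONE. The table name events_plain and its columns are hypothetical.

```sql
-- Revert to the CDH behavior for the current session.
SET DEFAULT_TRANSACTIONAL_TYPE=NONE;

-- Hypothetical managed table created while the option is set to NONE.
CREATE TABLE events_plain (id INT, payload STRING);

-- Inspect the table parameters to check whether transactional properties were applied.
DESCRIBE FORMATTED events_plain;
```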
- hms_event_polling_interval_s
  - Before upgrade: 0
  - After upgrade: 2

  When raw data is ingested into tables, new HMS metadata and filesystem metadata are generated. In CDH, you must manually issue an INVALIDATE METADATA or REFRESH command to pick up this new information. In CDP, the hms_event_polling_interval_s property is set to 2 seconds by default, so tables are refreshed automatically as changes are detected in HMS. If specific tables that are not supported by event polling need to be refreshed, issue a table-level INVALIDATE METADATA or REFRESH command.
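For tables that event polling does not cover, the table-level commands look roughly like the following. The database and table name sales_db.raw_events is hypothetical.

```sql
-- Reload file and partition metadata for one table after new data files arrive.
REFRESH sales_db.raw_events;

-- Discard the cached metadata for one table so that it is reloaded on next use.
INVALIDATE METADATA sales_db.raw_events;
```

REFRESH is usually the lighter of the two when only data files changed. INVALIDATE METADATA is the heavier fallback.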
- disconnected_session_timeout
  - Before upgrade: N/A
  - After upgrade: 900

  In CDP, Impala can keep a session running after the client connection is closed. Impala clients and drivers may support reconnecting to the same session even when the network connection is interrupted. By default, a disconnected session is kept for 15 minutes (900 seconds) so that the client can reconnect. You can set the disconnected_session_timeout flag to a lower value so that disconnected sessions are cleaned up more quickly.
- enable_orc_scanner
  - Before upgrade: True (preview)
  - After upgrade: True

  While using Impala to query ORC tables, set the command line argument enable_orc_scanner = true to re-enable ORC table support.
- enable_insert_events
  - Before upgrade: N/A
  - After upgrade: True

  When Impala inserts into a table, it refreshes the underlying table or partition. When enable_insert_events is set to True, Impala also generates INSERT events; other Impala clusters that receive these events automatically refresh the affected tables or partitions. Event processing must be on for this property to work.
- disable_hdfs_num_rows_estimate
  - Before upgrade: N/A
  - After upgrade: False

  In CDP Impala, if no statistics are available for a table, Impala tries to estimate the cardinality by estimating the number of rows from the size of the table. This estimation is on by default and is used when statistics are not present. To disable this optimization, set the query option disable_hdfs_num_rows_estimate = true.
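A short sketch of turning the estimate off for one session, with a check for whether statistics exist. The table name sales_db.raw_events is hypothetical.

```sql
-- Check whether statistics exist for the table.
SHOW TABLE STATS sales_db.raw_events;

-- Disable the row-count estimate for the current session.
SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=TRUE;
```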
- use_local_catalog
  - Before upgrade: False
  - After upgrade: True

  In CDP, the on-demand use_local_catalog mode is set to True by default on all Impala coordinators, so that the coordinators pull metadata from catalogd as needed and cache it locally. This reduces the memory footprint on coordinators and automates cache eviction.
- catalog_topic_mode
  - Before upgrade: full
  - After upgrade: minimal

  In CDP, catalog_topic_mode is set to minimal by default to enable on-demand metadata for all coordinators.