Recommended configurations
The following configuration settings are recommended for the best performance with Impala.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
- Set the --use_local_tz_for_unix_timestamp_conversions startup flag and the --convert_legacy_hive_parquet_utc_timestamps startup flag both to true. Setting these startup flags to true ensures that the timestamps between Hive and Impala match. See TIMESTAMP Data Type for more details.
- Always set the --idle_session_timeout and the --idle_query_timeout timeouts for the Impala daemon (impalad). Ensure that the setting for --idle_session_timeout is less than the timeout set for your load balancer. See Setting the Idle Query and Idle Session Timeouts for impalad for details.
- Set the --fe_service_threads startup option for the Impala daemon (impalad) to 256. This option specifies the maximum number of concurrent client connections allowed. See Startup Options for impalad Daemon for details.
- Increase the --num_metadata_loading_threads startup option to 64 to improve metadata loading performance. See Configuring Impala Startup Options through Cloudera Manager for more information.
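
The recommendations above can be collected into a single list of startup flags. The following Python sketch is only an illustration: the function name recommended_flags and its parameters are hypothetical, the timeout values are placeholders that this page does not specify, and the timeouts are assumed to be expressed in seconds. It also checks the one constraint stated above, that the idle session timeout must be less than the load balancer timeout.

```python
def recommended_flags(idle_session_timeout_s, idle_query_timeout_s,
                      load_balancer_timeout_s):
    """Return the recommended startup flags as a list of strings.

    Hypothetical helper: the timeout arguments are placeholders chosen by
    the caller and are assumed to be in seconds.
    """
    if idle_session_timeout_s <= 0 or idle_query_timeout_s <= 0:
        raise ValueError("both idle timeouts should be set to a positive value")
    # The session timeout must be lower than the load balancer timeout.
    if idle_session_timeout_s >= load_balancer_timeout_s:
        raise ValueError(
            "--idle_session_timeout must be less than the load balancer timeout"
        )
    return [
        # Keep Hive and Impala timestamps consistent.
        "--use_local_tz_for_unix_timestamp_conversions=true",
        "--convert_legacy_hive_parquet_utc_timestamps=true",
        # Idle timeouts, values supplied by the caller.
        f"--idle_session_timeout={idle_session_timeout_s}",
        f"--idle_query_timeout={idle_query_timeout_s}",
        # Maximum number of concurrent client connections.
        "--fe_service_threads=256",
        # More threads for metadata loading.
        "--num_metadata_loading_threads=64",
    ]


if __name__ == "__main__":
    # Example values only: 600 s session timeout, 300 s query timeout,
    # and a load balancer that times out after 900 s.
    for flag in recommended_flags(600, 300, 900):
        print(flag)
```

The resulting flags can then be entered wherever your deployment defines Impala startup options, for example through Cloudera Manager as described in the topic referenced above.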