Recommended configurations

Following are the recommended configuration setting for the best performance with Impala.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

  • Set the --use_local_tz_for_unix_timestamp_conversions startup flag and the --convert_legacy_hive_parquet_utc_timestamps startup flag both to true. Setting these startup flags to true ensures that the timestamps between Hive and Impala match. See TIMESTAMP Data Type for more details.
  • Always set the --idle_session_timeout and the --idle_query_timeout timeouts for the Impala daemon (impalad). Ensure that the setting for idle_session_timout is less than the setting for the timeout set for your load balancer. See Setting the Idle Query and Idle Session Timeouts for impalad for details.
  • Set the --fe_service_threads startup option for the Impala daemon (impalad) to 256. This option specifies the maximum number of concurrent client connections allowed. See Startup Options for impalad Daemon for details.
  • Increase the --num_metadata_loading_threads startup option to 64 to improve metadata loading performance. See Configuring Impala Startup Options through Cloudera Manager for more information.