Using Impala with Hue
Following are some recommended configurations that give you the best performance when you use Hue with Impala.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
- Always connect to the Impala load balancer, for example HaProxy, and not to a single coordinator. This helps to avoid the hotspotting issue on a single coordinator. Hotspotting occurs when too many tasks touch the same data, potentially overloading the node or coordinator.
- Set the
querycache_rows
configuration property to a lower value than its default setting of50000
according to your requirements. This configuration property sets the number of initial rows of a result set that Impala caches in order to support re-fetching them. - Set the
close_queries
configuration property to true. This property causes Hue to try to close an Impala query when the user leaves the editor page so it frees all Impala query resources. However, it makes query results for that user inaccessible. - Set
query_timeout_s
andsession_timeout_s
configuration properties. You can use these properties to set timeouts for query execution and for sessions in Hue, which causes queries and sessions to be cancelled when the timeout period expires. They are set in seconds.
To set these configuration properties in Cloudera Manager:
- On the Cloudera Manager home page, select .
-
Search for the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini, and then append the following configuration information into the text box after any other configurations listed there:
[impala] querycache_rows=<number_of_rows> close_queries=true query_timeout_s=<number_of_seconds> session_timeout_s=<number_of_seconds>
- Click Save Changes and restart the service.