Using Impala with Hue

The following recommended configurations help you get the best performance when you use Hue with Impala.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

  • Always connect to the Impala load balancer, for example HAProxy, and not to a single coordinator. This helps to avoid hotspotting on a single coordinator. Hotspotting occurs when too many tasks touch the same data, potentially overloading the node or coordinator.
  • Set the querycache_rows configuration property to a lower value than its default setting of 50000 according to your requirements. This configuration property sets the number of initial rows of a result set that Impala caches in order to support re-fetching them.
  • Set the close_queries configuration property to true. This property causes Hue to try to close an Impala query when the user leaves the editor page, which frees all Impala resources held by that query. However, the results of a closed query are no longer accessible to that user.
  • Set the query_timeout_s and session_timeout_s configuration properties. These properties set timeouts for query execution and for sessions in Hue; a query or session is cancelled when its timeout period expires. Both values are specified in seconds.

To set these configuration properties in Cloudera Manager:

  1. On the Cloudera Manager home page, select Hue > Configuration.
  2. Search for the Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini, and then append the following configuration to the text box, after any other configurations listed there:

    [impala]
    querycache_rows=<number_of_rows>
    close_queries=true
    query_timeout_s=<number_of_seconds>
    session_timeout_s=<number_of_seconds>
  3. Click Save Changes and restart the Hue service.
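
As an illustration only, a filled-in snippet might look like the following. The values shown are assumptions chosen for the example, not recommendations; choose values that suit your own workload.

    [impala]
    # Illustrative value, lower than the default of 50000
    querycache_rows=10000
    close_queries=true
    # Illustrative timeouts in seconds (10 minutes and 1 hour)
    query_timeout_s=600
    session_timeout_s=3600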