Enabling Unified Analytics for Impala in CDW Private Cloud

You can enable Unified Analytics while creating an Impala Virtual Warehouse. Doing this provisions a Virtual Warehouse that can automatically redirect queries to an appropriate SQL engine (either Hive or Impala) depending on the nature of the query.

Why should you enable Unified Analytics

When you create a Hive Virtual Warehouse, Unified Analytics is enabled by default. When you create an Impala Virtual Warehouse, you need to perform an extra step to enable Unified Analytics. Unified Analytics integrates the Hive and Impala SQL engines and Hue and impala-shell query editors. By choosing to enable Unified Analytics while creating an Impala Virtual Warehouse, you provision a warehouse with these capabilities.

To enable Unified Analytics on an Impala Virtual Warehouse, turn on the Enable Unified Analytics option.

Additional settings related to Unified Analytics for Imala

Because Hive-related services and components are added by CDW when you create an Impala Virtual Warehouse with Unified Analytics, ensure that you review and configure the following additional options based on your needs:

Unified Analytics Authentication Mode

After you turn on Enable Unified Analytics option, select the authentication mode from the Unified Analytics Authentication Mode drop-down menu. The default authentication mode for the Hive components in the Unified Analytics mode is LDAP. The authentication mode that you set here applies only to the Unified Analytics' components (mainly Hive). The Impala components continue to support both LDAP and Kerberos authentication modes. If you connect to the Impala Virtual Warehouse remotely, then both LDAP and Kerberos authentication modes can be used. But if you connect to the Impala Virtual Warehouse in the Unified Analytics mode, then only the selected authentication mode is used.

ETL Isolation

The ETL Isolation option is similar to the Query Isolation option that you can use for scan-heavy, data-intensive queries.

Max Concurrent Isolated Queries: Sets the maximum number of isolated queries that can run concurrently in their own dedicated executor nodes or the maximum number of queries that can spawn dedicated executor groups at one time. Select this number based on the scan size of the data for your average scan-heavy, data-intensive query. For example, if Max Concurrent Isolated Queries is set to 3 and a dedicated executor group is spawned for each data-intensive query, only 3 dedicated executor groups can be running at one time. If another data-intensive query is received, it must wait in a queue to run.

Max Nodes Per Isolated Query: Sets how many executor nodes that can be created for each isolated data-intensive query.