Configuring Impala coordinator high availability

A single Impala coordinator might not handle the number of concurrent queries you want to run or provide the memory your queries require. You can configure multiple active coordinators to resolve or mitigate these problems. You can change the number of active coordinators later.

You can configure up to five active-active Impala coordinators to run in an Impala Virtual Warehouse. When you create an Impala Virtual Warehouse, CDW provides you an option to configure Impala coordinator and Database Catalog high availability, described in the next topic. You can choose one of the following options:

Disables Impala coordinator and Database Catalog high availability
Runs multiple coordinators (one active, one passive) and Database Catalogs (one active, one passive)
Runs multiple coordinators (both active) and Database Catalogs (one active, one passive)

By using two coordinators in an active-passive mode, one coordinator is active at a time. If one coordinator goes down, the passive coordinator becomes active.

If you select the Impala coordinators to be in an active-active mode, the client software uses a cookie to keep a virtual connection to a particular coordinator. When a coordinator disappears for some reason, perhaps due to a coordinator shutting down, then the client software may print the error "Invalid session id" before it automatically reconnects to a new coordinator.

Using active-active coordinators, you can have up to five coordinators running concurrently in active-active mode with a cookie-based load-balancing.

An Impala Web UI is available for each coordinator which you can use for troubleshooting purposes.

Clients who connect to your Impala Virtual Warehouse using multiple coordinators must use the latest Impala shell. The following procedure covers these tasks.

  1. Follow instructions for "Adding a new Virtual Warehouse".
  2. Select the number of executors you need from the Size dropdown menu.
    A number of additional options are displayed, including High availability (HA).
  3. Select the Enabled (Active-Active) option from the High availability (HA) dropdown menu.
  4. Select the number of coordinators you need from the Number of Active Coordinators dropdown menu ranging from 2 to 5.
    You can edit an existing Impala Virtual Warehouse to change the number of active coordinators.
  5. Change values for other settings as needed, click Create, and wait for the Impala Virtual Warehouse to be in the running state.
    Click to learn more about the setting.
  6. Go to Cloudera Data Warehouse > Overview > Impala Virtual Warehouse > > Edit > WEB UI, and then click each Impala Coordinator Web UI n link to get information about the coordinator.
  7. Go to Cloudera Data Warehouse > Overview > Impala Virtual Warehouse > and select the Copy Impala shell Download command option.
    The following command is copied to your clipboard:
    pip install impala-shell==4.1.0
  8. Provide the command to clients who want to connect to the Impala Virtual Warehouse with multiple coordinators using the Impala shell.
  9. Instruct the client user to update impyla to version compatible with CDW, as listed in Data Warehouse Release Notes in section, “Runtime component versions for Cloudera Data Warehouse Private Cloud”.
    For example, installing/updating impyla 0.18a2, is required to connect to your Virtual Warehouse active-active coordinators in CDW 2021.0.3-b27 or later.
  10. Inform the client that to connect over ODBC to an HA-configured Impala Virtual Warehouse that uses active-active coordinators, you must append to the HTTPAuthCookies connector configuration option of the Cloudera ODBC driver.
    Table 1. HTTPAuthCookies
    Key Name Value Required
    HTTPAuthCookies impala.auth,JESSSIONID,KNOXSESSIONID, Yes