Using Impala to query Kudu tables

If you want to use Impala to query Kudu tables, you have to create a mapping between the Impala and Kudu tables.

Neither Kudu nor Impala needs special configuration before you can use the Impala Shell or the Impala API to insert, update, delete, or query Kudu data. The only requirement is the table mapping described above. For an existing Kudu table, the Kudu web UI provides the Impala query that creates this mapping.
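As a minimal sketch, the mapping for an existing Kudu table is an Impala CREATE EXTERNAL TABLE ... STORED AS KUDU statement whose kudu.table_name table property names the underlying Kudu table. The helper function kudu_mapping_sql and the table names my_mapping_table and my_kudu_table are hypothetical. The helper prints the statement and adds the kudu.master_addresses property only when master addresses are passed, which covers the case where Impala was not started with --kudu_master_hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an Impala statement that maps an existing Kudu table.
#   $1 = name of the new (external) Impala table
#   $2 = name of the existing Kudu table
#   $3 = optional comma-separated Kudu master addresses (host:port,...)
kudu_mapping_sql() {
  local impala_table="$1" kudu_table="$2" masters="${3:-}"
  local props="'kudu.table_name' = '${kudu_table}'"
  if [ -n "$masters" ]; then
    # Only needed when Impala does not already know the Kudu masters
    # (that is, --kudu_master_hosts is not set).
    props="${props}, 'kudu.master_addresses' = '${masters}'"
  fi
  printf "CREATE EXTERNAL TABLE %s STORED AS KUDU TBLPROPERTIES (%s);\n" \
    "$impala_table" "$props"
}
```

The statement could then be run non-interactively, for example with impala-shell -i impala_host:21000 -q "$(kudu_mapping_sql my_mapping_table my_kudu_table)", where impala_host is a placeholder for your Impala daemon. Because the table is external, Impala reads the column definitions from the Kudu table rather than requiring them in the statement.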

  • Make sure you are using the impala-shell binary provided by the default CDP Impala installation. The following example shows how to verify this using the alternatives command on a RHEL 6 host. If you need to change the link with alternatives --set, do not copy and paste the paths from this example, because the file names on your host are likely to differ.
    $ sudo alternatives --display impala-shell
    impala-shell - status is auto.
     link currently points to /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell
    /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell - priority 10
    Current `best' version is /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell
  • Although it is not required, we recommend configuring Impala with the locations of the Kudu masters using the --kudu_master_hosts=<master1>[:port],<master2>[:port],<master3>[:port] flag. If this flag is not set, you must specify the kudu.master_addresses property in a TBLPROPERTIES clause each time you create a table. If you are using Cloudera Manager, no such configuration is needed: the Impala service automatically recognizes the Kudu master hosts. However, if your Impala queries do not work as expected, use the following steps to make sure that the Impala service is configured to depend on Kudu:
    1. Go to the Impala service.
    2. Click the Configuration tab and search for kudu.
    3. Make sure that the Kudu Service property is set to the right Kudu service.
    4. Click Save Changes.

    Before you carry out any of the operations listed within this section, make sure that this configuration has been set.

  • Start Impala Shell using the impala-shell command. By default, impala-shell attempts to connect to the Impala daemon on localhost on port 21000. To connect to a different host, use the -i <host:port> option.

    To connect to a specific Impala database automatically, use the -d <database> option. For instance, if all your Kudu tables are in the Impala database impala_kudu, use -d impala_kudu.

  • To quit the Impala Shell, use the following command: quit;
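The connection options described above can be combined in a small wrapper, shown here as a minimal sketch. The function name impala_kudu_shell, the IMPALA_SHELL override variable, the host impala_host:21000, and the table my_mapping_table are hypothetical; -i, -d, and -q (run a single query and exit) are impala-shell options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: open impala-shell against a given daemon and database.
#   $1 = Impala daemon as host:port (impala-shell defaults to localhost:21000)
#   $2 = database to use on connect, for example impala_kudu
#   Any remaining arguments are passed through to impala-shell unchanged.
# IMPALA_SHELL (hypothetical) selects the binary; it defaults to impala-shell on PATH.
impala_kudu_shell() {
  local target="$1" database="$2"
  shift 2
  "${IMPALA_SHELL:-impala-shell}" -i "$target" -d "$database" "$@"
}

# Example (placeholder host and table):
#   impala_kudu_shell impala_host:21000 impala_kudu
#   impala_kudu_shell impala_host:21000 impala_kudu -q "SELECT COUNT(*) FROM my_mapping_table;"
```

Called without -q, the wrapper opens an interactive session in impala_kudu that you end with quit;. Passing -q instead runs one statement and exits, which is more convenient in scripts.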