Using Impala to query Kudu tables
If you want to use Impala to query Kudu tables, you have to create a mapping between the Impala and Kudu tables.
Neither Kudu nor Impala needs special configuration for you to use the Impala Shell or the Impala API to insert, update, delete, or query Kudu data. However, you do need to create a mapping between the Impala and Kudu tables. For an existing Kudu table, the Kudu web UI provides the Impala query that creates this mapping.
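As a rough illustration of what such a mapping looks like, the sketch below prints an Impala CREATE EXTERNAL TABLE statement for an existing Kudu table. The helper name kudu_mapping_sql and the table names my_mapping_table and my_kudu_table are placeholders invented for this example. The query shown in the Kudu web UI for your table remains the authoritative version.

```shell
# Print an Impala statement that maps an Impala table onto an existing Kudu table.
# kudu_mapping_sql is a hypothetical helper, not part of Impala or Kudu.
#   $1: name the table should have in Impala (placeholder)
#   $2: name of the table as it exists in Kudu (placeholder)
kudu_mapping_sql() {
  printf "CREATE EXTERNAL TABLE %s\nSTORED AS KUDU\nTBLPROPERTIES ('kudu.table_name' = '%s');\n" "$1" "$2"
}

# Show the statement for two example names.
kudu_mapping_sql my_mapping_table my_kudu_table
```

The printed statement can be pasted into an Impala Shell session, or passed to impala-shell with the -q option.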
- Make sure you are using the impala-shell binary provided by the default CDP Impala installation. The following example shows how you can verify this using the alternatives command on a RHEL 6 host. Do not copy and paste the alternatives --set command directly, because the file names are likely to differ.
  $ sudo alternatives --display impala-shell
  impala-shell - status is auto.
   link currently points to /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell
  /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell - priority 10
  Current `best' version is /opt/cloudera/parcels/<current_CDP_parcel>/bin/impala-shell
- Although not necessary, it is recommended that you configure Impala with the locations of the Kudu Masters using the --kudu_master_hosts=<master1>[:port] flag. If this flag is not set, you will need to manually provide this configuration each time you create a table by specifying the kudu.master_addresses property inside a TBLPROPERTIES clause.
  If you are using Cloudera Manager, no such configuration is needed. The Impala service will automatically recognize the Kudu Master hosts. However, if your Impala queries don't work as expected, use the following steps to make sure that the Impala service is set to be dependent on Kudu:
  - Go to the Impala service.
  - Click the Configuration tab and search for kudu.
  - Make sure that the Kudu Service property is set to the right Kudu service.
  - Click Save Changes.
Before you carry out any of the operations listed within this section, make sure that this configuration has been set.
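When the --kudu_master_hosts flag is not set, each table definition has to carry the master addresses itself. The following sketch prints a mapping statement that includes the kudu.master_addresses property in its TBLPROPERTIES clause. The helper name kudu_mapping_sql_with_masters, the table names, and the host names are assumptions made for illustration; 7051 is the default Kudu Master RPC port, so adjust it if your cluster uses a different one.

```shell
# Print a mapping statement that names the Kudu Masters explicitly.
# kudu_mapping_sql_with_masters is a hypothetical helper, not part of Impala or Kudu.
#   $1: Impala table name (placeholder)
#   $2: Kudu table name (placeholder)
#   $3: comma-separated list of Kudu Master addresses as host:port
kudu_mapping_sql_with_masters() {
  printf "CREATE EXTERNAL TABLE %s\nSTORED AS KUDU\nTBLPROPERTIES (\n  'kudu.table_name' = '%s',\n  'kudu.master_addresses' = '%s'\n);\n" "$1" "$2" "$3"
}

# Example host names are invented; substitute your own Kudu Master hosts.
kudu_mapping_sql_with_masters my_mapping_table my_kudu_table \
  "kudu-master-1.example.com:7051,kudu-master-2.example.com:7051"
```

With Cloudera Manager, or with the flag set on the Impala daemons, the kudu.master_addresses line can be left out.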
- Start Impala Shell using the impala-shell command. By default, impala-shell attempts to connect to the Impala daemon on localhost on port 21000. To connect to a different host, use the -i <host:port> option.
  To automatically connect to a specific Impala database, use the -d <database> option. For instance, if all your Kudu tables are in Impala in the database impala_kudu, use -d impala_kudu to use this database.
- To quit the Impala Shell, use the following command:
  quit;
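The connection options described above can be combined on one command line. The sketch below only assembles and prints that command line, so it can be checked before it is run. The helper name impala_shell_cmd and the host name impalad-1.example.com are hypothetical; the -i and -d options are the ones described in this section.

```shell
# Assemble an impala-shell command line from a daemon address and an optional database.
# impala_shell_cmd is a hypothetical helper; it prints the command instead of running it.
#   $1: Impala daemon as host:port
#   $2: database to connect to (optional)
impala_shell_cmd() {
  if [ -n "${2:-}" ]; then
    printf 'impala-shell -i %s -d %s\n' "$1" "$2"
  else
    printf 'impala-shell -i %s\n' "$1"
  fi
}

# Prints: impala-shell -i impalad-1.example.com:21000 -d impala_kudu
impala_shell_cmd impalad-1.example.com:21000 impala_kudu
```

Running the printed command opens an interactive session against that daemon and database, which you leave again with quit;.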