The Impala Service
You can install Cloudera Impala through the Cloudera Manager installation wizard, using either parcels or packages, and have the service created and started as part of the first run installation wizard. See Installing Impala.
If you elect not to include the Impala service using the installation wizard, you can use the Add Service wizard to perform the installation. The wizard will automatically configure and start the dependent services and the Impala service. See Adding a Service for instructions.
Configuring the Impala Service
There are several types of configuration settings you may need to apply, depending on your situation.
Running Impala with CDH 4.1
If you are running CDH 4.1, and the Bypass Hive Metastore Server option is enabled, do the following:
- Go to the Impala service.
- Click the Configuration tab.
- Select .
- Add the following to the Impala Advanced Configuration Snippet for hive-site.xml property, replacing hive_metastore_server_host with the name of your Hive Metastore Server host:
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive_metastore_server_host:9083</value>
  </property>
- Click Save Changes.
- Restart the Impala service.
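If you manage several clusters, it can help to generate the snippet rather than edit it by hand. The following is a minimal sketch, assuming Python 3 is available; the function name hive_site_snippet and the example host name are hypothetical and not part of Cloudera Manager. It only produces text that you paste into the configuration snippet property.

```python
import xml.etree.ElementTree as ET


def hive_site_snippet(metastore_host, port=9083):
    """Return the two <property> blocks for the hive-site.xml snippet as text."""
    settings = [
        ("hive.metastore.local", "false"),
        ("hive.metastore.uris", f"thrift://{metastore_host}:{port}"),
    ]
    blocks = []
    for name, value in settings:
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        # Serialize each property on its own line.
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical host name; replace with your Hive Metastore Server host.
    print(hive_site_snippet("metastore01.example.com"))
```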
Enabling the Sentry Service for Impala
- Enable the Sentry service for Hive. For details on how to do this, see Enabling the Sentry Service for Hive.
- Go to the Impala service.
- Click the Configuration tab.
- In the Service-Wide category, set the Sentry Service property to Sentry.
- Restart Impala.
Enabling Sentry Authorization using Policy Files for Impala
- Enable Sentry's policy file based authorization for Hive. For details on how to do this, see Enabling Sentry Authorization using Policy Files.
- Go to the Impala service.
- Click the Configuration tab.
- Under the Service-Wide category, go to the Policy File Based Sentry section.
- Check Enable Sentry Authorization Using Policy Files, then click Save Changes.
- Restart the Impala service.
Configuring Table Statistics
Configuring table statistics is highly recommended when using Impala. It allows Impala to make optimizations that can result in significant (over 10x) performance improvement for some joins. If table statistics are not available, Impala will still function, but at lower performance.
The Impala implementation to compute table statistics is available in CDH 5.0.0 or higher and in Impala version 1.2.2 or higher. The Impala implementation of COMPUTE STATS requires no setup steps and is preferred over the Hive implementation. See Table Statistics. If you are running an older version of Impala, follow the procedure in Hive Table Statistics.
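As an illustration of the Impala implementation, the sketch below builds one impala-shell command per table, each running a COMPUTE STATS statement. It assumes impala-shell is installed on the machine where you run the commands; the function name, the table names, and the default host are hypothetical. The sketch only prints the commands and does not execute them.

```python
import shlex


def compute_stats_commands(tables, impalad="localhost"):
    """Build one impala-shell argument list per table (nothing is executed)."""
    commands = []
    for table in tables:
        statement = f"COMPUTE STATS {table}"
        # -i selects the Impala Daemon to connect to, -q passes the query text.
        commands.append(["impala-shell", "-i", impalad, "-q", statement])
    return commands


if __name__ == "__main__":
    # Hypothetical table names; replace with your own.
    for argv in compute_stats_commands(["sales.orders", "sales.customers"]):
        print(shlex.join(argv))
```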
Impala Llama ApplicationMaster
The Impala Llama ApplicationMaster (Llama) role reserves and releases YARN-managed resources for Impala, thus reducing resource management overhead when performing Impala queries. For further information, see Managing Resources.
Adding the Llama Role
- Manually enable cgroup-based resource management:
  - In the top navigation bar, click Hosts.
  - Click the Configuration tab.
  - Expand Resource Management.
  - Check the Enable Cgroup-based Resource Management checkbox.
  - Click Save Changes.
- Optionally configure one or more dynamic resource pools for YARN. If you do not configure pools, queries use the default pool or a pool named for the users who submit the queries.
- Configure YARN resource management properties:
  - Go to the YARN service.
  - Click the Configuration tab.
  - Select .
  - Check the Use CGroups for Resource Management and Always use Linux Container Executor properties.
  - Click Save Changes.
  - Select .
  - Set the Container Memory Minimum and Container Virtual CPU Cores Minimum properties to 0.
  - Click Save Changes.
  - Select .
  - Record the value of the Container Memory property.
- Configure Impala resource management properties:
  - Go to the Impala service.
  - Click the Configuration tab.
  - Select .
  - Set it to the YARN service.
  - Select .
  - Set the Impala Daemon Memory Limit property to the value of the YARN Container Memory property that you recorded in the previous step.
  - Click Save Changes.
- Add and configure the Llama role:
  - Click the Instances tab.
  - Click the Add Role Instances button.
  - Select a host in the column under Impala Llama ApplicationMaster, then click OK.
  - Click Continue.
  - Click the Configuration tab.
  - Click Impala Llama ApplicationMaster Default Group.
  - In the Core Queues property, enter the dynamic resource pools you configured for YARN earlier, if any.
  - Click Save Changes.
- Restart services and redeploy client configurations:
  - Click in the top right.
  - Click Restart Cluster.
  - Click Restart Now.
  - Click Finish.
Configuring Llama for High Availability
Llama High Availability (HA) uses an Active/Standby architecture, in which the active Llama is automatically elected using the ZooKeeper-based ActiveStandbyElector. The active Llama accepts RPC/Thrift connections and communicates with YARN. The standby Llama monitors the leader information in ZooKeeper, but doesn't accept RPC/Thrift connections.
Only one of the Llamas should be active to ensure the resources are not partitioned. Llama uses ZooKeeper Access Control Lists (ACLs) to claim exclusive ownership of the cluster when transitioning to active, and monitors this ownership periodically. If another Llama takes over, the first one realizes it within this period.
To claim resources from YARN, Llama spawns YARN applications and runs unmanaged ApplicationMasters. When a Llama goes down, the resources allocated to all the YARN applications spawned by it are not reclaimed until YARN times out those applications (the default timeout is 10 minutes). On Llama failure, these resources are reclaimed by a Llama that kills any YARN applications spawned by this pair of Llamas.
- Go to the Impala service.
- Add a Llama role instance.
- Click the Configuration tab.
- Expand the category.
- In the Impala Llama ApplicationMaster Advanced Configuration Snippet (Safety Valve) for llama-site.xml property, configure the following properties:
  | Property | Description | Default | Recommended |
  |---|---|---|---|
  | llama.am.cluster.id | Cluster ID of the Llama pair, used to differentiate between different Llamas | llama | [cluster-specific] |
  | llama.am.ha.enabled* | Whether to enable Llama HA | false | true |
  | llama.am.ha.zk-quorum* | ZooKeeper quorum to use for leader election and fencing | | [cluster-specific] |
  | llama.am.ha.zk-base | Base znode for leader election and fencing data | /llama | [cluster-specific] |
  | llama.am.ha.zk-timeout-ms | The session timeout, in milliseconds, for connections to ZooKeeper quorum | 10000 | 10000 |
  | llama.am.ha.zk-acl | ACLs to control access to ZooKeeper | world:anyone:rwcda | [cluster-specific] |
  | llama.am.ha.zk-auth | Authorization information to go with the ACLs | | [cluster-acl-specific] |
  *Required configurations
  You must enter property values in XML format. For example:
  <property>
    <name>llama.am.cluster.id</name>
    <value>llama</value>
  </property>
- Expand the category.
- Specify command-line flags as one key-value pair per line in the Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) property. The supported flags are:
  - -llama_addresses: Comma-separated list of hostname:port items, specifying all the members of the Llama availability group. Defaults to "127.0.0.1:15000".
  - -llama_max_request_attempts: Maximum number of times a request to reserve, expand, or release resources is retried until the request is cancelled. Attempts are only counted after Impala is registered with Llama. That is, a request survives at most llama_max_request_attempts-1 re-registrations. Defaults to 5.
  - -llama_registration_timeout_secs: Maximum number of seconds that Impala will attempt to register or re-register with Llama. If registration is unsuccessful, Impala cancels the action with an error, which could result in an impalad startup failure or a cancelled query. A setting of -1 means try indefinitely. Defaults to 30.
  - -llama_registration_wait_secs: Number of seconds to wait between attempts during Llama registration. Defaults to 3.
  For example:
  -llama_addresses=host1:15000,host2:15000
  -llama_max_request_attempts=10
- Click Save Changes.
- Restart services and redeploy client configurations:
  - Click in the top right.
  - Click Restart Cluster.
  - Click Restart Now.
  - Click Finish.
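The two safety valve entries in this procedure, the llama-site.xml properties and the Impala Daemon command-line flags, can be rendered from a small script so that both stay consistent with each other. The following is a minimal sketch, assuming Python 3; the function names and the host names are hypothetical, and the ZooKeeper address is a placeholder. It checks only that the two properties marked as required are present.

```python
import xml.etree.ElementTree as ET

# Properties marked as required for Llama HA.
REQUIRED = ("llama.am.ha.enabled", "llama.am.ha.zk-quorum")


def llama_site_snippet(props):
    """Render a dict of Llama properties as <property> XML blocks."""
    missing = [name for name in REQUIRED if not props.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    blocks = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


def impalad_flag_lines(llama_hosts, port=15000, max_request_attempts=None):
    """Render the impalad flags, one key-value pair per line."""
    addresses = ",".join(f"{host}:{port}" for host in llama_hosts)
    lines = [f"-llama_addresses={addresses}"]
    if max_request_attempts is not None:
        lines.append(f"-llama_max_request_attempts={max_request_attempts}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; replace with your cluster-specific settings.
    print(llama_site_snippet({
        "llama.am.ha.enabled": "true",
        "llama.am.ha.zk-quorum": "zk1.example.com:2181",
    }))
    print(impalad_flag_lines(["host1", "host2"], max_request_attempts=10))
```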
Impala Web Servers
Enabling and Disabling Access to Impala Web Servers
By default, access to the Impala Daemon and StateStore web servers is enabled.
- Impala StateStore
  - Go to the Impala service.
  - Click the Configuration tab.
  - Select Impala StateStore Default Group.
  - Check or uncheck Enable StateStore Web Server.
  - Click Save Changes.
  - Restart the Impala service.
- Impala Daemon
  - Go to the Impala service.
  - Click the Configuration tab.
  - Select .
  - Check or uncheck Enable Impala Daemon Web Server.
  - Click Save Changes.
  - Restart the Impala service.
Opening Impala Web Server UIs
- Impala StateStore
  - Go to the Impala service.
  - Select .
- Impala Daemon
  - Go to the Impala service.
  - Click the Instances tab.
  - Click an Impala Daemon instance.
  - Click Impala Daemon Web UI.
- Impala Catalog Server
  - Go to the Impala service.
  - Select .
- Impala Llama ApplicationMaster
  - Go to the Impala service.
  - Click the Instances tab.
  - Click an Impala Llama ApplicationMaster instance.
  - Click Llama Web UI.
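If you prefer to open a web UI directly rather than through the links above, the address can be assembled from the host name and the web server port. The sketch below is illustrative only: the port numbers are assumed defaults that this page does not state, so confirm them against the web server port properties of your own deployment. The function name and host name are hypothetical.

```python
# Assumed default web server ports; verify against your configuration.
DEFAULT_WEB_PORTS = {
    "impalad": 25000,
    "statestored": 25010,
    "catalogd": 25020,
}


def web_ui_url(host, role, use_https=False, port=None):
    """Build the web UI address for one Impala role on one host."""
    scheme = "https" if use_https else "http"
    if port is None:
        port = DEFAULT_WEB_PORTS[role]
    return f"{scheme}://{host}:{port}/"


if __name__ == "__main__":
    # Hypothetical host name.
    print(web_ui_url("impalad01.example.com", "impalad"))
```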
Configuring Secure Access for Impala Web Servers
Cloudera Manager supports two methods of authentication for secure access to the Impala Catalog Server, Daemon, and StateStore web servers: password-based authentication and SSL certificate authentication. Both of these can be configured through properties of the Impala Catalog Server, Daemon, and StateStore. Authentication for the three types of daemons can be configured independently.
Configuring Password Authentication
- Go to the Impala service.
- Click the Configuration tab.
- Search for "password" using the Search box within the Configuration page. This should display the password-related properties (Username and Password properties) for the Impala Catalog Server, Daemon, and StateStore. If there are multiple role groups configured for Impala Daemon instances, the search should display all of them.
- Enter a username and password into these fields.
- Click Save Changes.
- Restart the Impala service.
Now when you access the Web UI for the Impala Catalog Server, Daemon, and StateStore, you are asked to log in before access is granted.
Configuring SSL Certificate Authentication
- Create or obtain an SSL certificate.
- Place the certificate, in .pem format, on the hosts where the Impala Catalog Server and StateStore are running, and on each host where an Impala Daemon is running. It can be placed in any location (path) you choose. If all the Impala Daemons are members of the same role group, then the .pem file must have the same path on every host.
- Go to the Impala service page.
- Click the Configuration tab.
- Search for "certificate" using the Search box within the Configuration page. This should display the certificate file location properties for the Impala Catalog Server, Daemon, and StateStore. If there are multiple role groups configured for Impala Daemon instances, the search should display all of them.
- In the property fields, enter the full path name to the certificate file.
- Click Save Changes.
- Restart the Impala service.
When you access the Web UI for the Impala Catalog Server, Daemon, and StateStore, https will be used.
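Because all Impala Daemons in one role group share a single certificate path property, a quick consistency check before you save the configuration can catch a host where the file was placed elsewhere. The following is a minimal sketch, assuming you have collected the intended path for each host yourself; the function name, host names, and paths are hypothetical. It does not read the files or validate the certificate contents.

```python
import posixpath


def check_cert_paths(paths_by_host):
    """Return a list of problems with the certificate paths of one role group."""
    problems = []
    for host, path in sorted(paths_by_host.items()):
        # The property expects the full path name to the certificate file.
        if not posixpath.isabs(path):
            problems.append(f"{host}: {path!r} is not a full path")
    # Hosts in the same role group must all use the same path.
    if len(set(paths_by_host.values())) > 1:
        problems.append("hosts in the same role group use different paths")
    return problems


if __name__ == "__main__":
    # Hypothetical hosts and paths.
    for problem in check_cert_paths({
        "impalad01.example.com": "/etc/impala/web.pem",
        "impalad02.example.com": "certs/web.pem",
    }):
        print(problem)
```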