Hive Metastore High Availability
You can enable Hive metastore high availability (HA) so that your cluster keeps working when a metastore instance becomes unavailable. The metastore instances are independent of one another; they do not use a quorum.
Prerequisites
- Cloudera recommends running each metastore instance on a separate cluster host to maximize availability.
- Hive metastore HA requires a database that is also highly available, such as MySQL with replication. Refer to the documentation for your database of choice to configure it correctly.
Enabling Hive Metastore High Availability Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the Hive service.
- Click the Configuration tab.
- Click the Advanced category.
- If you use a secure cluster, enable the Hive token store by setting the value of the hive.cluster.delegation.token.store.class property to org.apache.hadoop.hive.thrift.DBTokenStore in the Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml property.
<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
- Click Save Changes to commit the changes.
- Click the Instances tab.
- Click Add Role Instances.
- Click the text field under Hive Metastore Server.
- Select a host on which to run the additional metastore and click OK.
- Click Continue and click Finish.
- Select the checkbox next to the new Hive Metastore Server role.
- Select Actions for Selected > Start, and click Start to confirm.
- Click the stale configuration icon to display the stale configurations page.
- Click Restart Cluster and click Restart Now.
- Click Finish after the cluster finishes restarting.
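After the restart, it can be useful to confirm that the token store setting actually reached the deployed hive-site.xml. The following is a minimal sketch, not part of the Cloudera Manager procedure: it parses Hadoop-style configuration XML and checks the property named above. The function names and the SAMPLE string are hypothetical and exist only for illustration; in practice you would read the contents of your own hive-site.xml file.

```python
import xml.etree.ElementTree as ET

TOKEN_STORE_KEY = "hive.cluster.delegation.token.store.class"
DB_TOKEN_STORE = "org.apache.hadoop.hive.thrift.DBTokenStore"


def read_properties(hive_site_xml: str) -> dict:
    """Parse Hadoop-style <property> elements into a name -> value dict."""
    root = ET.fromstring(hive_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def uses_db_token_store(hive_site_xml: str) -> bool:
    """Return True if the delegation token store is set to DBTokenStore."""
    return read_properties(hive_site_xml).get(TOKEN_STORE_KEY) == DB_TOKEN_STORE


# Hypothetical sample input; replace with the text of your hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(uses_db_token_store(SAMPLE))
```

The check is read-only, so it can be run on any host that has a copy of the configuration file.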
Enabling Hive Metastore High Availability Using the Command Line
To configure the Hive metastore for high availability, you configure each metastore to store its state in a replicated database, then provide the metastore clients with a list of URIs where metastores are available. The client starts with the first URI in the list. If it does not get a response, it randomly picks another URI in the list and attempts to connect. This continues until the client receives a response.
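The connection order described above can be sketched as follows. This is an illustration of the described behavior, not the Hive client's actual implementation. The function name pick_metastore and the responds callback are hypothetical. As a simplification, the sketch tries each remaining URI once in random order and then gives up, whereas the description above says the client keeps trying until it receives a response.

```python
import random
from typing import Callable, Optional, Sequence


def pick_metastore(
    uris: Sequence[str],
    responds: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first URI that responds, or None if none do.

    The first URI in the list is always tried first; the remaining
    URIs are then tried in random order.
    """
    if rng is None:
        rng = random.Random()
    if not uris:
        return None
    if responds(uris[0]):
        return uris[0]
    remaining = list(uris[1:])
    rng.shuffle(remaining)  # random order for the fallback candidates
    for uri in remaining:
        if responds(uri):
            return uri
    return None


if __name__ == "__main__":
    candidates = [
        "thrift://metastore1.example.com",
        "thrift://metastore2.example.com",
        "thrift://metastore3.example.com",
    ]
    # Simulate an outage of the first metastore only.
    print(pick_metastore(candidates, lambda u: "metastore1" not in u))
```

Because the first URI is always preferred, the order of the list determines which metastore carries the load while all instances are healthy.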
- Configure Hive on each of the cluster hosts where you want to run a metastore, following the instructions at Configuring the Hive Metastore.
- On the server where the master metastore instance runs, edit the /etc/hive/conf.server/hive-site.xml file, setting the hive.metastore.uris property's value to a list of URIs where a Hive metastore is available for failover.
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore1.example.com,thrift://metastore2.example.com,thrift://metastore3.example.com</value>
  <description>URI for client to contact metastore server</description>
</property>
- If you use a secure cluster, enable the Hive token store by setting the value of the hive.cluster.delegation.token.store.class property to org.apache.hadoop.hive.thrift.DBTokenStore.
<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
- Save your changes and restart each Hive instance.
- Connect to each metastore and update it to use a nameservice instead of a NameNode. This is a requirement for high availability.
  - From the command line, as the Hive user, retrieve the list of URIs representing the filesystem roots:
  hive --service metatool -listFSRoot
  - Run the following command with the -dryRun option to verify that the nameservice is available and configured correctly. The dry run does not change your configuration.
  hive --service metatool -updateLocation nameservice-uri namenode-uri -dryRun
  - Run the same command again without the -dryRun option to direct the metastore to use the nameservice instead of a NameNode.
  hive --service metatool -updateLocation nameservice-uri namenode-uri
- Test your configuration by stopping your main metastore instance, and then attempting to connect to one of the other metastores from a client. The following is an example of doing this on a RHEL or Fedora system. The example first stops the local metastore, then connects to the metastore on the host metastore2.example.com and runs the SHOW TABLES command.
$ sudo service hive-metastore stop
$ /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://metastore2.example.com:10000 username password org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000> SHOW TABLES;
show tables;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.238 seconds)
0: jdbc:hive2://localhost:10000>
- Restart the local metastore when you have finished testing.
$ sudo service hive-metastore start
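The metatool steps in this procedure (a dry run first, then the real update) lend themselves to scripting, so that the real update never runs if the dry run fails. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the URIs in the example are placeholders rather than values from your cluster, and the script is assumed to run as the Hive user on a host where the hive command is on the PATH. The command line itself is the one shown in the steps above.

```python
import subprocess
from typing import Callable, List


def metatool_update_commands(nameservice_uri: str, namenode_uri: str) -> List[List[str]]:
    """Build the dry-run command and the real command, in that order."""
    base = [
        "hive", "--service", "metatool",
        "-updateLocation", nameservice_uri, namenode_uri,
    ]
    return [base + ["-dryRun"], base]


def run_update(
    nameservice_uri: str,
    namenode_uri: str,
    runner: Callable = subprocess.run,
) -> None:
    """Run the dry run, then the real update.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, so a failed dry run stops the real update.
    """
    dry_run, real = metatool_update_commands(nameservice_uri, namenode_uri)
    runner(dry_run, check=True)
    runner(real, check=True)


if __name__ == "__main__":
    # Placeholder URIs (assumptions); print the commands instead of running them.
    for cmd in metatool_update_commands(
        "hdfs://nameservice1", "hdfs://namenode1.example.com:8020"
    ):
        print(" ".join(cmd))
```

The runner parameter exists only so the sequence can be exercised without a Hive installation; calling run_update with the default runner executes the commands for real.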