Configuring HiveServer2
You must make the following configuration changes before using HiveServer2. Failure to do so may result in unpredictable behavior.
HiveServer2 Memory Requirements
- For clusters of 100 nodes or larger, 24 GB of heap is required.
- For clusters of 50 nodes to 99 nodes, 12 GB of heap is required.
- For a multi-node cluster with fewer than 50 nodes, 2 GB of heap is required.
- For a single-node cluster, 256 MB of heap is required.
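The sizing tiers above can be expressed as a small lookup, which may be handy in provisioning scripts or sanity checks. This is only a sketch: the class and method names (HiveServer2HeapSketch, recommendedHeapMb) are hypothetical and not part of Hive, and it assumes 1 GB = 1024 MB.

```java
public class HiveServer2HeapSketch {

    /**
     * Returns the HiveServer2 heap size, in MB, for a cluster of the given
     * size, following the tiers listed in this section.
     * Hypothetical helper; not a Hive API.
     */
    static int recommendedHeapMb(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        if (nodeCount >= 100) {
            return 24 * 1024; // 100 nodes or larger: 24 GB
        }
        if (nodeCount >= 50) {
            return 12 * 1024; // 50 to 99 nodes: 12 GB
        }
        if (nodeCount >= 2) {
            return 2 * 1024;  // multi-node, fewer than 50 nodes: 2 GB
        }
        return 256;           // single-node cluster: 256 MB
    }
}
```

For example, recommendedHeapMb(75) falls in the 50 to 99 node tier and returns 12288.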
Table Lock Manager (Required)
You must properly configure and enable Hive's Table Lock Manager. This requires installing ZooKeeper and setting up a ZooKeeper ensemble; see ZooKeeper Installation.
Enable the lock manager by setting properties in /etc/hive/conf/hive-site.xml as follows (substitute your actual ZooKeeper node names for those in the example):
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
(The above settings are also needed if you are still using HiveServer1. HiveServer1 is deprecated; migrate to HiveServer2 as soon as possible.)
hive.zookeeper.client.port
If ZooKeeper is not using the default client port (2181), set hive.zookeeper.client.port in /etc/hive/conf/hive-site.xml to the value ZooKeeper is using. To find that value, check the clientPort setting in /etc/zookeeper/conf/zoo.cfg. For example, if clientPort is set to 2222, set hive.zookeeper.client.port to 2222 as well:
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2222</value>
  <description>The port at which the clients will connect.</description>
</property>
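The check described above can be automated. The following sketch parses the contents of zoo.cfg with java.util.Properties (the file uses a compatible key=value format) and reports whether hive.zookeeper.client.port needs to be set. The class and method names are hypothetical, the key is assumed to be spelled clientPort as in a standard zoo.cfg, and falling back to 2181 when the key is absent is an assumption based on the default stated in this section.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ZooKeeperPortCheck {

    // Default ZooKeeper client port, as stated in this section.
    static final int DEFAULT_CLIENT_PORT = 2181;

    /**
     * Extracts clientPort from the text of a zoo.cfg file.
     * Assumption: if the key is missing, the default (2181) is returned.
     */
    static int clientPort(String zooCfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("clientPort");
        return value == null ? DEFAULT_CLIENT_PORT : Integer.parseInt(value.trim());
    }

    /** True when hive.zookeeper.client.port must be set in hive-site.xml. */
    static boolean needsHiveOverride(int clientPort) {
        return clientPort != DEFAULT_CLIENT_PORT;
    }
}
```

To run this against a real installation, read /etc/zookeeper/conf/zoo.cfg into a string (for example with java.nio.file.Files.readString on Java 11 or later) and pass it to clientPort.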
JDBC driver
The connection URL format and the driver class are different for HiveServer2 and HiveServer1:
HiveServer version | Connection URL | Driver Class
---|---|---
HiveServer2 | jdbc:hive2://<host>:<port> | org.apache.hive.jdbc.HiveDriver
HiveServer1 | jdbc:hive://<host>:<port> | org.apache.hadoop.hive.jdbc.HiveDriver
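As an illustration of the HiveServer2 row in the table, here is a minimal JDBC client sketch. It assumes the Hive JDBC driver JAR and its dependencies are on the classpath; the class name, method names, and the SHOW TABLES query are chosen for illustration only, and the host, port, user, and password are placeholders supplied by the caller.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveServer2JdbcSketch {

    // Driver class for HiveServer2, as listed in the table above.
    static final String DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver";

    /** Builds a HiveServer2 connection URL of the form jdbc:hive2://<host>:<port>. */
    static String connectionUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    /** Connects to HiveServer2 and prints the table names in the default database. */
    static void listTables(String host, int port, String user, String password)
            throws ClassNotFoundException, SQLException {
        // Explicitly load the driver; fails if the Hive JDBC JAR is missing.
        Class.forName(DRIVER_CLASS);
        try (Connection conn =
                     DriverManager.getConnection(connectionUrl(host, port), user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

A HiveServer1 client would follow the same pattern with the jdbc:hive:// URL prefix and the org.apache.hadoop.hive.jdbc.HiveDriver class.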
Authentication
By default, HiveServer2 allows any client to connect, but it can be configured to authenticate all connections. HiveServer2 supports Kerberos and LDAP authentication; select the mode with the hive.server2.authentication property in hive-site.xml. You can also configure Pluggable Authentication, which lets you use a custom authentication provider for HiveServer2, and HiveServer2 Impersonation, which lets users execute queries and access HDFS files as the connected user rather than as the superuser who started the HiveServer2 daemon. For more information, see Hive Security Configuration.
Running HiveServer2 and HiveServer1 Concurrently
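HiveServer2 and HiveServer1 can be run concurrently on the same system, sharing the same data sets. This allows you to run HiveServer1 to support, for example, Perl or Python scripts that use the native HiveServer1 Thrift bindings. Both servers bind to port 10000 by default, so at least one of them must be configured to use a different port. You can set the HiveServer2 port with the hive.server2.thrift.port property in hive-site.xml, for example: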
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>TCP port number to listen on, default 10000</description>
</property>
You can also specify the port (and the host IP address in the case of HiveServer2) by setting these environment variables:
HiveServer version | Port | Host Address
---|---|---
HiveServer2 | HIVE_SERVER2_THRIFT_PORT | HIVE_SERVER2_THRIFT_BIND_HOST
HiveServer1 | HIVE_PORT | < Host bindings cannot be specified >