HDP supports two Hive clients: the Hive CLI and Beeline. The primary difference between the two involves how the clients connect to Hive.
The Hive CLI, which connects directly to HDFS and the Hive Metastore, and can be used only on a host with access to those services.
Beeline, which connects to HiveServer2 and requires access to only one .jar file:
hive-jdbc-<version>-standalone.jar
.
Hortonworks recommends using HiveServer2 and a JDBC client (such as Beeline) as the primary way to access Hive. This approach uses SQL standard-based authorization or Ranger-based authorization. However, some users may wish to access Hive data from other applications, such as Pig. For these use cases, use the Hive CLI and storage-based authorization.
Beeline Operating Modes and HiveServer2 Transport Modes
Beeline supports the following modes of operation:
Table 1.2. Beeline Modes of Operation
Operating Mode | Description |
---|---|
Embedded | The Beeline client and the Hive installation both reside on the same host machine. No TCP connectivity is required. |
Remote | Use remote mode to support multiple, concurrent clients executing queries against the same remote Hive installation. Remote transport mode supports authentication with LDAP and Kerberos. It also supports encryption with SSL. TCP connectivity is required. |
Administrators may start HiveServer2 in one of the following transport modes:
Table 1.3. HiveServer2 Transport Modes
Transport Mode | Description |
---|---|
TCP | HiveServer2 uses TCP transport for sending and receiving Thrift RPC messages. |
HTTP | HiveServer2 uses HTTP transport for sending and receiving Thrift RPC messages. |
While running in TCP transport mode, HiveServer2 supports the following authentication schemes:
Table 1.4. Authentication Schemes with TCP Transport Mode
Authentication Scheme | Description |
---|---|
Kerberos | A network authentication protocol which operates that uses the concept of 'tickets' to allow nodes in a network to securely identify themselves. Administrators must specify hive.server2.authentication=kerberos in the hive-site.xml configuration file to use this authentication scheme. |
LDAP | The Lightweight Directory Access Protocol, an application-layer protocol that uses the concept of 'directory services' to share information across a network. Administrators must specify hive.server2.authentication=ldap in the hive-site.xml configuration file to use this type of authentication. |
PAM | Pluggable Authentication Modules, or PAM, allow administrators to integrate multiple authentication schemes into a single API. Administrators must specify hive.server2.authentication=pam in the hive-site.xml configuration file to use this authentication scheme. |
Custom | Authentication provided by a custom implementation of the org.apache.hive.service.auth.PasswdAuthenticationProvider interface. The implementing class must be available in the classpath for HiveServer2 and its name provided as the value of the hive.server2.custom.authentication.class property in the hive-site.xml configuration property file. |
None | The Beeline client performs no authentication with HiveServer2. Administrators must specify hive.server2.authentication=none in the hive-site.xml configuration file to use this authentication scheme. |
NoSASL | While running in TCP transport mode, HiveServer2 uses the Java Simple Authentication and Security Layer (SASL) protocol to establish a security layer between the client and server. However, HiveServer2 also supports connections in TCP transfer mode that do not use the SASL protocol Administrators must specify hive.server2.authentication=nosasl in the hive-site.xml configuration file to use this authentication scheme. |
The next section describes the connection strings used to connect to HiveServer2 for all possible combinations of these modes, as well as the connection string required to connect to HiveServer2 in a secure cluster.
Connecting to Hive with Beeline
The following examples demonstrate how to use Beeline to connect to Hive for all possible variations of these modes.
Note | |
---|---|
Documentation for |
Embedded Client
Use the following syntax to connect to Hive from Beeline in embedded mode:
!connect jdbc:hive2://
Remote Client with HiveServer2 TCP Transport Mode and SASL Authentication
Use the following syntax to connect to HiveServer2 in TCP mode from a remote Beeline client:
!connect jdbc:hive2://<host>:<port>/<db>
The default port for HiveServer2 in TCP mode is 10000
, and
db
is the name of the database to which you want to connect.
Remote Client with HiveServer2 TCP Transport Mode and NoSASL Authentication
Clients must explicitly specify the authentication mode in their connection string when HiveServer2 runs in NoSASL mode:
!connect jdbc:hive2://<host>:<port>/<db>;auth=noSasl hiveuser pass org.apache.hive.jdbc.HiveDriver
If users forget to include auth=noSasl
in the JDBC connection string,
the JDBC client API attempts to make an SASL connection to HiveServer2. This causes an open
connection that eventually results in the client crashing with an Out Of Memory error.
Remote Client with HiveServer2 HTTP Transport Mode
Use the following syntax to connect to HiveServer2 in HTTP mode from a remote Beeline client:
!connect jdbc:hive2://<host>:<port>/<db>;hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>
Note | |
---|---|
The value for < |
In an environment that is secured by Kerberos, use the following syntax to connect to HiveServer2 in HTTP mode from a remote Beeline client:
!connect jdbc:hive2://<host>:<port>/<db>;hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>;principal=hive/HOST@REALM
Remote Client with HiveServer2 in Secure Cluster
Use the following syntax to connect to HiveServer2 in a secure cluster from a remote Beeline client:
!connect jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>
Note | |
---|---|
The Beeline client must have a valid Kerberos ticket in the ticket cache before attempting to connect. |