Data Access
Also available as:
PDF
loading table of contents...

Beeline versus Hive CLI

HDP supports two Hive clients: the Hive CLI and Beeline. The primary difference between the two involves how the clients connect to Hive.

  • The Hive CLI, which connects directly to HDFS and the Hive Metastore, and can be used only on a host with access to those services.

  • Beeline, which connects to HiveServer2 and requires access to only one .jar file: hive-jdbc-<version>-standalone.jar.

Hortonworks recommends using HiveServer2 and a JDBC client (such as Beeline) as the primary way to access Hive. This approach uses SQL standard-based authorization or Ranger-based authorization. However, some users may wish to access Hive data from other applications, such as Pig. For these use cases, use the Hive CLI and storage-based authorization.

Beeline Operating Modes and HiveServer2 Transport Modes

Beeline supports the following modes of operation:

Table 2.8. Beeline Modes of Operation

Operating ModeDescription

Embedded

The Beeline client and the Hive installation both reside on the same host machine. No TCP connectivity is required.

Remote

Use remote mode to support multiple, concurrent clients executing queries against the same remote Hive installation. Remote transport mode supports authentication with LDAP and Kerberos. It also supports encryption with SSL. TCP connectivity is required.


Administrators may start HiveServer2 in one of the following transport modes:

Table 2.9. HiveServer2 Transport Modes

Transport ModeDescription

TCP

HiveServer2 uses TCP transport for sending and receiving Thrift RPC messages.

HTTP

HiveServer2 uses HTTP transport for sending and receiving Thrift RPC messages.


While running in TCP transport mode, HiveServer2 supports the following authentication schemes:

Table 2.10. Authentication Schemes with TCP Transport Mode

Authentication SchemeDescription

Kerberos

A network authentication protocol which operates that uses the concept of 'tickets' to allow nodes in a network to securely identify themselves. Administrators must specify hive.server2.authentication=kerberos, hive.server2.authentication.kerberos.principal = hive/_HOST@YOUR-REALM.COM, and hive.server2.authentication.kerberos.keytab = /etc/hive/conf/hive.keytab in the hive-site.xml configuration file to use this authentication scheme.

LDAP

The Lightweight Directory Access Protocol, an application-layer protocol that uses the concept of 'directory services' to share information across a network. Administrators must specify hive.server2.authentication=ldap in the hive-site.xml configuration file to use this type of authentication.

PAM

Pluggable Authentication Modules, or PAM, allow administrators to integrate multiple authentication schemes into a single API. Administrators must specify hive.server2.authentication=pam in the hive-site.xml configuration file to use this authentication scheme.

Custom

Authentication provided by a custom implementation of the org.apache.hive.service.auth.PasswdAuthenticationProvider interface. The implementing class must be available in the classpath for HiveServer2 and its name provided as the value of the hive.server2.custom.authentication.class property in the hive-site.xml configuration property file.

None

The Beeline client performs no authentication with HiveServer2. Administrators must specify hive.server2.authentication=none in the hive-site.xml configuration file to use this authentication scheme.


The next section describes the connection strings used to connect to HiveServer2 for all possible combinations of these modes, as well as the connection string required to connect to HiveServer2 in a secure cluster.

Connecting to Hive with Beeline

The following examples demonstrate how to use Beeline to connect to Hive for all possible variations of these modes.

Embedded Client

Use the following syntax to connect to Hive from Beeline in embedded mode:

!connect jdbc:hive2:// 

Remote Client with HiveServer2 TCP Transport Mode and SASL Authentication

While running in TCP transport mode, HiveServer2 uses the Java Simple Authentication and Security Layer (SASL) protocol to establish a security layer between the client and server. Use the following syntax to connect to HiveServer2 in TCP mode from a remote Beeline client:

!connect jdbc:hive2://<host>:<port>/<db> 

The default port for HiveServer2 in TCP mode is 10000, and db is the name of the database to which you want to connect.

Remote Client with HiveServer2 HTTP Transport Mode

Use the following syntax to connect to HiveServer2 in HTTP mode from a remote Beeline client:

!connect jdbc:hive2://<host>:<port>/<db>;hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint> 
[Note]Note

The value for <http_endpoint> can be found in the hive.server2.thrift.http.path property in the hive-site.xml file. If it is not listed there, use the default value cliservice.

In an environment that is secured by Kerberos, use the following syntax to connect to HiveServer2 in HTTP mode from a remote Beeline client:

!connect jdbc:hive2://<host>:<port>/<db>;hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>;principal=hive/HOST@REALM

Remote Client with HiveServer2 in Secure Cluster

Use the following syntax to connect to HiveServer2 in a secure cluster from a remote Beeline client:

!connect jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2> 
[Note]Note

The Beeline client must have a valid Kerberos ticket in the ticket cache before attempting to connect.