Use the Apache Thrift Proxy API
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings.
Compile the Thrift libraries and generate the HBase Thrift Python bindings
A Thrift binding is client code generated by the Apache Thrift Compiler for a target language (such as Python) that allows communication between the Thrift server and clients using that client code. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
After the Thrift server is configured and running, generate Thrift bindings for the language of
your choice, using an IDL file. An HBase IDL file named Hbase.thrift is
included as part of HBase. After generating the bindings, copy the Thrift libraries for your
language into the same directory as the generated bindings. In the following Python example,
these libraries provide the thrift.transport and thrift.protocol modules. These commands show
how you might generate the Thrift bindings for Python and copy the libraries on a Linux system.
mkdir HBaseThrift
cd HBaseThrift/
thrift -gen py /path/to/Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
mkdir thrift
cp -rp ~/Downloads/thrift/lib/py/src/* ./thrift/
The resulting directory structure looks similar to the following:
HBaseThrift/
|-- hbase
|   |-- constants.py
|   |-- Hbase.py
|   |-- Hbase-remote
|   |-- __init__.py
|   `-- ttypes.py
|-- __init__.py
`-- thrift
    |-- compat.py
    |-- ext
    |   |-- binary.cpp
    |   |-- binary.h
    |   |-- compact.cpp
    |   |-- compact.h
    |   |-- endian.h
    |   |-- module.cpp
    |   |-- protocol.h
    |   |-- protocol.tcc
    |   |-- types.cpp
    |   `-- types.h
    |-- __init__.py
    |-- protocol
    |   |-- __init__.py
    |   |-- TBase.py
    |   |-- TBinaryProtocol.py
    |   |-- TCompactProtocol.py
    |   |-- THeaderProtocol.py
    |   |-- TJSONProtocol.py
    |   |-- TMultiplexedProtocol.py
    |   |-- TProtocolDecorator.py
    |   `-- TProtocol.py
    |-- server
    |   |-- __init__.py
    |   |-- THttpServer.py
    |   |-- TNonblockingServer.py
    |   |-- TProcessPoolServer.py
    |   `-- TServer.py
    |-- Thrift.py
    |-- TMultiplexedProcessor.py
    |-- transport
    |   |-- __init__.py
    |   |-- sslcompat.py
    |   |-- THeaderTransport.py
    |   |-- THttpClient.py
    |   |-- TSocket.py
    |   |-- TSSLSocket.py
    |   |-- TTransport.py
    |   |-- TTwisted.py
    |   `-- TZlibTransport.py
    |-- TRecursive.py
    |-- TSCons.py
    |-- TSerialization.py
    `-- TTornado.py
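A quick way to confirm that the generation and copy steps succeeded is to check that the files the Python examples import are in place. The following is a minimal sketch, not part of HBase or Thrift: the helper name missing_binding_paths and the list of required files are assumptions, and the sketch assumes the generated package is named hbase, as the example imports suggest.

```python
from pathlib import Path

# Hypothetical list of files that the Python examples rely on; adjust to your layout.
REQUIRED_PATHS = (
    "hbase/Hbase.py",
    "hbase/ttypes.py",
    "thrift/transport/THttpClient.py",
    "thrift/protocol/TBinaryProtocol.py",
)


def missing_binding_paths(root):
    """Return the required paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in REQUIRED_PATHS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_binding_paths("HBaseThrift")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected binding and library files are present.")
```

Run the script from the parent directory of HBaseThrift. An empty result only shows that the files exist; it does not verify that the bindings match your HBase version.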
Introduction to the example code
Choose the transport and protocol classes that match the HBase Thrift server configuration.
Classes and functions
- Transport level: TBufferedTransport, TFramedTransport, TSaslClientTransport, and THttpClient.
- Protocol level: TBinaryProtocol and TCompactProtocol.
Configurations for HBase thrift
Property | Default value (secured) | Default value (unsecured) | Description |
---|---|---|---|
hbase.thrift.support.proxyuser | true | true | Use this to allow proxy users on the thrift gateway, which is mainly needed for doAs functionality. |
hbase.regionserver.thrift.framed | true | true | Use framed transport. When using the THsHaServer or TNonblockingServer, framed transport is always used irrespective of this configuration value. |
hbase.regionserver.thrift.compact | true | true | Use the TCompactProtocol instead of the default TBinaryProtocol. TCompactProtocol is a binary protocol that is more compact than the default and typically more efficient. |
hbase.regionserver.thrift.http | true | true | Use this to enable HTTP server usage on thrift, which is mainly needed for doAs functionality. |
hbase.thrift.security.qop | auth_conf | none | If this is set, HBase Thrift Server authenticates its clients. HBase Proxy User Hosts and Groups must be configured to allow specific users to access HBase through Thrift Server. |
hbase.thrift.ssl.enabled | true | false | Encrypt communication between clients and HBase Thrift Server over HTTP using Transport Layer Security (TLS) (formerly known as Secure Sockets Layer (SSL)). |
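To see which of these properties a cluster actually sets, you can read them from the server configuration. The sketch below assumes the properties are stored in a Hadoop-style hbase-site.xml file, where each property element has name and value children. The function name read_thrift_properties and the sample XML are illustrative only.

```python
import xml.etree.ElementTree as ET

# Property names taken from the configuration table above.
THRIFT_PROPERTIES = (
    "hbase.thrift.support.proxyuser",
    "hbase.regionserver.thrift.framed",
    "hbase.regionserver.thrift.compact",
    "hbase.regionserver.thrift.http",
    "hbase.thrift.security.qop",
    "hbase.thrift.ssl.enabled",
)


def read_thrift_properties(xml_text):
    """Return a dict of the Thrift-related properties found in the XML text."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in THRIFT_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Made-up sample standing in for the contents of hbase-site.xml.
SAMPLE = """<configuration>
  <property><name>hbase.regionserver.thrift.http</name><value>true</value></property>
  <property><name>hbase.thrift.ssl.enabled</name><value>false</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>zk1</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in read_thrift_properties(SAMPLE).items():
        print(name, "=", value)
```

A property that does not appear in the file is not returned by this sketch; in that case the server uses its default value.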
Example-1 THttpClient in Secure Cluster
This example assumes that the cluster is secured with the configuration properties listed in the Default value (secured) column of the HBase thrift configurations table.
Before proceeding, ensure that the following applications are installed on your system.
- Python 3.6.8 and python3-devel
- pip 21.3.1
- virtualenv 20.17.1
Perform the following steps:
- Install virtualenv using pip3.
pip3 install virtualenv
- Create a new virtual environment named py3env.
virtualenv py3env
- Activate the virtual environment.
source py3env/bin/activate
- Install the required Python packages at the specified versions. Run this command
inside the activated Python 3 virtual environment.
pip3 install kerberos==1.3.1 pure-sasl==0.6.2 setuptools==59.6.0 six==1.16.0 wheel==0.37.1
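This ensures that all the dependencies required by the example are installed. The following script obtains a Kerberos ticket if needed, connects to the HBase Thrift server over HTTPS, and lists the HBase tables.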
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos

# Return the hostname, TLS files, key password, keytab, and principal
def get_env_params():
    # Replace with your own parameters
    hostname = 'your_hbase_thrift_hostname'
    cert_file = 'your_cert_file'
    key_file = 'your_key_file'
    ca_file = 'your_ca_file'
    key_pw = 'your_key_pw'
    keytab_file = 'your_keytab'
    principal = 'your_principal'
    return hostname, cert_file, key_file, ca_file, keytab_file, principal, key_pw

# Check if a valid Kerberos ticket is already present in the cache.
# "klist -s" prints nothing and exits with status 0 only if the cache
# can be read and the credentials have not expired.
def check_kerberos_ticket():
    return call(['klist', '-s']) == 0

# Obtain a Kerberos ticket by running kinit from keytab
def kinit(keytab_file, principal):
    call(['kinit', '-kt', keytab_file, principal])

# Authenticate with Kerberos and build the HTTP headers for SPNEGO.
# The kerberos module expects the service in the form "type@fqdn".
def kerberos_auth(hostname):
    __, krb_context = kerberos.authGSSClientInit('HTTP@' + hostname)
    kerberos.authGSSClientStep(krb_context, '')
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers

# Initialize an SSL context with certificate verification enabled
def get_ssl_context(cert_file, key_file, key_pw, ca_file):
    ssl_context = ssl.create_default_context()
    ssl_context.load_cert_chain(certfile=cert_file, keyfile=key_file, password=key_pw)
    ssl_context.load_verify_locations(cafile=ca_file)
    return ssl_context

if __name__ == '__main__':
    hostname, cert_file, key_file, ca_file, keytab_file, principal, key_pw = get_env_params()
    # Run kinit if there is no valid Kerberos ticket in the cache
    if not check_kerberos_ticket():
        kinit(keytab_file, principal)
    # Create a THttpClient instance with the SSL context and custom headers
    ssl_context = get_ssl_context(cert_file, key_file, key_pw, ca_file)
    httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=ssl_context)
    httpClient.setCustomHeaders(headers=kerberos_auth(hostname))
    # Initialize TBinaryProtocol with THttpClient
    protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
    # Create HBase client
    client = Client(protocol)
    # Retrieve list of HBase tables
    tables = client.getTableNames()
    print(tables)
    # Close connection
    httpClient.close()
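If the table lookup raises an exception, the script never reaches httpClient.close(). One way to guard against that is contextlib.closing from the Python standard library, which calls close() when the block exits. This is a minimal sketch: FakeTransport is a hypothetical stand-in so that the snippet runs without a Thrift server, and list_tables is an illustrative helper name.

```python
from contextlib import closing


class FakeTransport:
    """Hypothetical stand-in for a Thrift transport; it only tracks close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def list_tables(transport, fetch):
    """Call fetch() and close the transport even if fetch() raises."""
    with closing(transport):
        return fetch()


if __name__ == "__main__":
    transport = FakeTransport()
    # The lambda stands in for client.getTableNames
    print(list_tables(transport, lambda: ["table1", "table2"]))
    print("closed:", transport.closed)
```

In the example script, the equivalent would be to wrap the call as `with closing(httpClient): tables = client.getTableNames()`.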
Example-2 THttpClient in Unsecure Cluster
This example assumes that the cluster is unsecured with the configuration properties listed in the Default value (unsecured) column of the HBase thrift configurations table.
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
# Replace with your own parameters
hostname = 'your_hbase_thrift_server_hostname'
# Initialize THttpClient
httpClient = THttpClient.THttpClient('http://' + hostname + ':9090/')
# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
# Create HBase client
client = Client(protocol)
# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)
# Close connection
httpClient.close()
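Under Python 3, the generated bindings may return each table name as a bytes object rather than a string. This is an assumption about the generated code, so check what your bindings return. If they do return bytes, a small helper such as the hypothetical decode_table_names below can convert the result for display.

```python
def decode_table_names(names, encoding="utf-8"):
    """Return table names as str, decoding any bytes values."""
    return [n.decode(encoding) if isinstance(n, bytes) else n for n in names]


if __name__ == "__main__":
    # Sample values standing in for the result of client.getTableNames()
    print(decode_table_names([b"customers", b"orders"]))
```

Values that are already strings pass through unchanged, so the helper is safe to apply in either case.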
Example-3 TSaslClientTransport in Secure Cluster without HTTP
If you do not use THttpClient and want to use TSaslClientTransport for legacy compatibility reasons, set the hbase.regionserver.thrift.http property to false. The other settings can remain the same as the configuration properties listed in the Default value (secured) column of the HBase thrift configurations table.
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.protocol import TCompactProtocol
from hbase import Hbase
# This example assumes that you have already run kinit for the hbase principal.
# Alternatively, use the kinit function from Example-1.
# Replace with your own parameters
thrift_host = 'your_hbase_thrift_server_hostname'
thrift_port = 9090
# Initialize TSocket and TTransport
socket = TSocket.TSocket(thrift_host, thrift_port)
transport=TTransport.TSaslClientTransport(socket,host=thrift_host,service='hbase',mechanism='GSSAPI')
# Initialize TCompactProtocol with TTransport
protocol = TCompactProtocol.TCompactProtocol(transport)
# Create HBase client
client = Hbase.Client(protocol)
# Open connection and retrieve list of HBase tables
transport.open()
tables = client.getTableNames()
print(tables)
# Close connection
transport.close()
Cloudera recommends that you use the HTTP options (Example-1 and Example-2). Consider Example-3 only for legacy compatibility, where older applications cannot be rewritten. Hue uses HTTP mode to interact with the HBase Thrift server, so if you disable HTTP mode, Hue might not work properly with HBase.
Known bug when using TSaslClientTransport with Kerberos-enabled CDP versions
The upstream JIRA HBASE-21652 introduced a bug related to Kerberos principal handling. The affected versions are CDP 7.1.6 and earlier. The versions containing the fix are 7.1.7, 7.2.11, and later.
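The version ranges above can be expressed as a small check. This is a sketch only: the function name hbase_21652_status is hypothetical, version strings are assumed to be dot-separated integers, and 7.2.x releases earlier than 7.2.11 are reported as "unspecified" because the text above does not say whether they are affected.

```python
def parse_version(text):
    """Convert a dotted version string such as '7.1.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def hbase_21652_status(cdp_version):
    """Classify a CDP version against the ranges stated in this section."""
    version = parse_version(cdp_version)
    # "7.1.6 and earlier" are affected; compare on the first three components.
    if version[:3] <= (7, 1, 6):
        return "affected"
    # 7.1.7 and later 7.1.x releases, and 7.2.11 and later, contain the fix.
    if version[:2] == (7, 1) or version[:3] >= (7, 2, 11):
        return "fixed"
    # Not covered by the stated ranges (for example, 7.2.0 through 7.2.10).
    return "unspecified"


if __name__ == "__main__":
    for v in ("7.1.6", "7.1.7", "7.2.10", "7.2.11"):
        print(v, hbase_21652_status(v))
```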