Use the Apache Thrift Proxy API
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings.
Compile Thrift libraries and generate HBase Thrift Python bindings
A Thrift binding is client code that the Apache Thrift compiler generates for a target language (such as Python); clients use that generated code to communicate with the Thrift server. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
After the Thrift server is configured and running, generate Thrift bindings for the language of your choice from an IDL file. An HBase IDL file named Hbase.thrift is included as part of HBase. After generating the bindings, copy the Thrift libraries for your language into the same directory as the generated bindings. In the following Python example, these libraries provide the thrift.transport and thrift.protocol modules. These commands show how you might generate the Thrift bindings for Python and copy the libraries on a Linux system.
mkdir HBaseThrift
cd HBaseThrift/
thrift -gen py /path/to/Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
mkdir thrift
cp -rp ~/Downloads/thrift/lib/py/src/* ./thrift/
The resulting directory structure is similar to the following:
HBaseThrift/
|-- hbase
|   |-- constants.py
|   |-- Hbase.py
|   |-- Hbase-remote
|   |-- __init__.py
|   `-- ttypes.py
|-- __init__.py
`-- thrift
    |-- compat.py
    |-- ext
    |   |-- binary.cpp
    |   |-- binary.h
    |   |-- compact.cpp
    |   |-- compact.h
    |   |-- endian.h
    |   |-- module.cpp
    |   |-- protocol.h
    |   |-- protocol.tcc
    |   |-- types.cpp
    |   `-- types.h
    |-- __init__.py
    |-- protocol
    |   |-- __init__.py
    |   |-- TBase.py
    |   |-- TBinaryProtocol.py
    |   |-- TCompactProtocol.py
    |   |-- THeaderProtocol.py
    |   |-- TJSONProtocol.py
    |   |-- TMultiplexedProtocol.py
    |   |-- TProtocolDecorator.py
    |   `-- TProtocol.py
    |-- server
    |   |-- __init__.py
    |   |-- THttpServer.py
    |   |-- TNonblockingServer.py
    |   |-- TProcessPoolServer.py
    |   `-- TServer.py
    |-- Thrift.py
    |-- TMultiplexedProcessor.py
    |-- transport
    |   |-- __init__.py
    |   |-- sslcompat.py
    |   |-- THeaderTransport.py
    |   |-- THttpClient.py
    |   |-- TSocket.py
    |   |-- TSSLSocket.py
    |   |-- TTransport.py
    |   |-- TTwisted.py
    |   `-- TZlibTransport.py
    |-- TRecursive.py
    |-- TSCons.py
    |-- TSerialization.py
    `-- TTornado.py
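Before running a client, it can help to confirm that the generated bindings and the copied Thrift library sit side by side in the working directory. The following is a minimal sketch using only the Python standard library; the helper name `missing_files` and the list of files it checks are illustrative assumptions based on the modules that the Python examples in this section import, not part of HBase or Thrift.

```python
from pathlib import Path

# Modules the Python examples in this section import (illustrative list)
EXPECTED_FILES = [
    "hbase/__init__.py",
    "hbase/Hbase.py",
    "hbase/ttypes.py",
    "thrift/__init__.py",
    "thrift/Thrift.py",
    "thrift/transport/TSocket.py",
    "thrift/transport/TTransport.py",
    "thrift/transport/THttpClient.py",
    "thrift/protocol/TBinaryProtocol.py",
]


def missing_files(root):
    """Return the expected files that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("HBaseThrift")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Bindings and Thrift library are in place")
```

Run it from the directory that contains HBaseThrift/; an empty result means that `from hbase import Hbase` and the `thrift.*` imports should resolve when HBaseThrift/ is the working directory.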
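The following Python code connects to the Thrift server, modifies a single row, and then modifies a batch of rows.
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase
# Replace with your own parameters
host = 'your_thrift_server_hostname'
port = 9090
# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()
# Modify a single row. Hbase.thrift declares table names, row keys, columns,
# and values as binary, so Python 3 clients pass bytes. The last argument
# is the map of mutation attributes (empty here).
mutations = [Hbase.Mutation(
    column=b'columnfamily:columndescriptor', value=b'columnvalue')]
client.mutateRow(b'tablename', b'rowkey', mutations, {})
# Modify a batch of rows: one row per line of a text file
# (replace these placeholder names with your own)
tablename = b'tablename'
username = 'username'
filename = 'myDataFile.txt'
messagecolumncf = b'columnfamily:message'
linenumbercolumncf = b'columnfamily:linenumber'
usernamecolumncf = b'columnfamily:username'
mutationsbatch = []
with open(filename) as myDataFile:
    for linenumber, line in enumerate(myDataFile, start=1):
        rowkey = username + "-" + filename + "-" + str(linenumber).zfill(6)
        mutations = [
            Hbase.Mutation(column=messagecolumncf, value=line.strip().encode()),
            Hbase.Mutation(column=linenumbercolumncf, value=str(linenumber).encode()),
            Hbase.Mutation(column=usernamecolumncf, value=username.encode())
        ]
        mutationsbatch.append(Hbase.BatchMutation(row=rowkey.encode(), mutations=mutations))
# Run the mutations for all the lines in myDataFile
client.mutateRows(tablename, mutationsbatch, {})
transport.close()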
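The row key in the batch example zero-pads the line number with `zfill(6)`. HBase orders row keys by their raw bytes, so without padding a key ending in `-10` would sort before one ending in `-2`. The sketch below isolates that key scheme and adds a hypothetical `chunked` helper for splitting a large `mutationsbatch` list across several `mutateRows` calls instead of one very large request; the helper names and the chunk size of 1000 are arbitrary assumptions, not HBase recommendations.

```python
def make_rowkey(username, filename, linenumber, width=6):
    """Build a row key whose numeric suffix sorts correctly as bytes."""
    return f"{username}-{filename}-{str(linenumber).zfill(width)}".encode()


def chunked(items, size=1000):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    keys = [make_rowkey("alice", "sonnets.txt", n) for n in (10, 2, 1)]
    # Byte-wise order matches numeric order because of the zero padding
    print(sorted(keys))
    # With a connected Thrift client, each chunk would be sent separately:
    # for batch in chunked(mutationsbatch):
    #     client.mutateRows(tablename, batch, {})
    print([len(c) for c in chunked(list(range(2500)))])  # [1000, 1000, 500]
```

Padding only keeps the order correct while line numbers stay within `width` digits; a file with a million or more lines would need a wider field.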
The Thrift Proxy API does not support writing to HBase clusters that are secured using Kerberos.
Code examples
Choose the client transport and protocol classes that match the configuration of your HBase Thrift server.
Classes and functions
- Transport level: TBufferedTransport, TFramedTransport, TSaslClientTransport, and THttpClient.
- Protocol level: TBinaryProtocol and TCompactProtocol.
Configurations for HBase Thrift
Property | Default value (secured) | Default value (unsecured) | Description |
---|---|---|---|
hbase.thrift.support.proxyuser | true | true | Use this to allow proxy users on the thrift gateway, which is mainly needed for doAs functionality. |
hbase.regionserver.thrift.framed | true | true | Use framed transport. When using the THsHaServer or TNonblockingServer, framed transport is always used irrespective of this configuration value. |
hbase.regionserver.thrift.compact | true | true | Use the TCompactProtocol instead of the default TBinaryProtocol. TCompactProtocol is a binary protocol that is more compact than the default and typically more efficient. |
hbase.regionserver.thrift.http | true | true | Use this to enable HTTP server usage on thrift, which is mainly needed for doAs functionality. |
hbase.thrift.security.qop | auth_conf | none | If this is set, HBase Thrift Server authenticates its clients. HBase Proxy User Hosts and Groups must be configured to allow specific users to access HBase through Thrift Server. |
hbase.thrift.ssl.enabled | true | false | Encrypt communication between clients and HBase Thrift Server over HTTP using Transport Layer Security (TLS) (formerly known as Secure Sockets Layer (SSL)). |
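These properties are HBase configuration keys, so on a cluster that is not managed through a management console they would typically be set in `hbase-site.xml` on the Thrift server host. As a sketch, the standard-library script below renders `<property>` entries for the values in the Default value (secured) column. The helper name `render_properties` is hypothetical, the output is meant to be merged into an existing `hbase-site.xml` rather than replace it, and the `auth_conf` spelling is copied from the table as-is, so check which QOP spelling your HBase version accepts.

```python
import xml.etree.ElementTree as ET

# Values from the "Default value (secured)" column of the table above
SECURED = {
    "hbase.thrift.support.proxyuser": "true",
    "hbase.regionserver.thrift.framed": "true",
    "hbase.regionserver.thrift.compact": "true",
    "hbase.regionserver.thrift.http": "true",
    # Spelling copied from the table; verify against your HBase version
    "hbase.thrift.security.qop": "auth_conf",
    "hbase.thrift.ssl.enabled": "true",
}


def render_properties(props):
    """Return a <configuration> XML string with one <property> per key."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing helper, Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SECURED))
```

For the unsecured column, or for Example-3 (which sets hbase.regionserver.thrift.http to false), pass a modified copy of the dictionary.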
Example-1 THttpClient in Secure Cluster
Assume that the cluster is secured with the configuration values listed in the Default value (secured) column of the HBase Thrift configurations table.
Before proceeding, ensure that the following packages are installed on your system.
- python3 and python3-devel
- gcc-c++
- cyrus-sasl-devel
Perform the following steps:
- Install these dependencies on a CentOS or Red Hat Enterprise Linux (RHEL) system using the following command.
yum install python3 python3-devel gcc-c++ cyrus-sasl-devel
- Install virtualenv using pip3.
pip3 install virtualenv
- Create a new virtual environment named py3env.
virtualenv py3env
- Activate the virtual environment.
source py3env/bin/activate
- From inside the py3env virtual environment, install the required Python packages at these specific versions.
pip3 install kerberos==1.3.1 pure-sasl==0.6.2 setuptools==59.6.0 six==1.16.0 wheel==0.37.1
This ensures that you have all the necessary dependencies and packages installed to proceed with your project.
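To confirm that the pinned versions actually ended up in the active virtual environment, a small check with `importlib.metadata` (Python 3.8+) can compare installed versions against the pins from the `pip3 install` command. This is a sketch; the function name `pin_mismatches` is hypothetical, and its `lookup` parameter exists only so the check can be exercised without the packages installed.

```python
from importlib import metadata

# Pins copied from the pip3 install command above
PINS = {
    "kerberos": "1.3.1",
    "pure-sasl": "0.6.2",
    "setuptools": "59.6.0",
    "six": "1.16.0",
    "wheel": "0.37.1",
}


def pin_mismatches(pins, lookup=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    problems = {}
    for package, wanted in pins.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[package] = installed
    return problems


if __name__ == "__main__":
    bad = pin_mismatches(PINS)
    print("All pins satisfied" if not bad else f"Mismatched or missing: {bad}")
```

Run it with the py3env interpreter (after `source py3env/bin/activate`) so that it inspects the virtual environment rather than the system Python.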
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos
# Replace with your own parameters
hostname = 'your_thrift_server_hostname'
key_file = 'your_key_file'
cert_file = 'your_cert_file'
ca_file = 'your_CA_file'
keytab = 'your_key_tab'
client_principal = 'your_client_principal'
cert_password = 'your_cert_password'
# Function to authenticate with Kerberos and build the SPNEGO headers
def kerberos_auth():
    # Discard any cached ticket and obtain a new one from the keytab
    call("kdestroy", shell=True)
    kinit_command = "kinit -kt {} {}".format(keytab, client_principal)
    call(kinit_command, shell=True)
    # Request a token for the HTTP service principal of the Thrift server host
    __, krb_context = kerberos.authGSSClientInit("HTTP@{}".format(hostname))
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers
# Initialize an SSL context with certificate verification enabled
context = ssl.create_default_context()
context.load_verify_locations(ca_file)
context.load_cert_chain(certfile=cert_file, keyfile=key_file, password=cert_password)
# Create a THttpClient instance with the SSL context and custom headers
httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=context)
httpClient.setCustomHeaders(headers=kerberos_auth())
# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
# Create HBase client
client = Client(protocol)
# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)
# Close connection
httpClient.close()
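Example-1 passes `kinit` a command string with `shell=True`, so a keytab path or principal that contains spaces or shell metacharacters would be split or interpreted by the shell. A sketch of the same kdestroy/kinit sequence with argument lists instead follows; the helper names are hypothetical, and `check=True` makes a failed `kinit` raise `subprocess.CalledProcessError` instead of being silently ignored.

```python
import subprocess


def kinit_argv(keytab, principal):
    """Build the kinit command as an argument list (no shell involved)."""
    return ["kinit", "-kt", keytab, principal]


def refresh_ticket(keytab, principal, run=subprocess.run):
    """Discard the current ticket cache, then obtain a new ticket from the keytab."""
    # kdestroy fails harmlessly when there is no cache, so its status is not checked
    run(["kdestroy"], check=False)
    run(kinit_argv(keytab, principal), check=True)
```

In Example-1, `refresh_ticket(keytab, client_principal)` could replace the two `call(...)` lines at the top of `kerberos_auth()`.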
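The following example also implements SPNEGO over TLS. It is meant to run on the HBase Thrift server host: it reads the CA file and keytab from fixed paths on that host and runs kinit only if no valid Kerberos ticket is cached.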
# This example code assumes that it runs on the HBase Thrift server host
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from ssl import create_default_context
import kerberos
import os
import socket
from subprocess import call
# Get the hostname, agent certificate directory, CA file, and keytab file
def get_env_params():
    hostname = socket.gethostname()
    agent_cert_dir = '/var/lib/cloudera-scm-agent/agent-cert/'
    ca_file = os.path.join(agent_cert_dir, 'cm-auto-global_cacerts.pem')
    keytab_file = '/cdep/keytabs/hbase.keytab'
    return hostname, agent_cert_dir, ca_file, keytab_file
# Check whether a valid Kerberos ticket is already present in the cache.
# "klist -s" prints nothing and exits with status 0 only if the cache
# holds unexpired tickets.
def check_kerberos_ticket():
    return call(['klist', '-s']) == 0
# Obtain a Kerberos ticket by running kinit from keytab
# (replace 'hbase' with the principal stored in your keytab)
def kinit(keytab_file):
    call(['kinit', '-kt', keytab_file, 'hbase'])
# Function to authenticate with Kerberos and get a SPNEGO token
def get_spnego_token(hostname):
    service_name = 'HTTP@{}'.format(hostname)
    result, context = kerberos.authGSSClientInit(service_name, gssflags=kerberos.GSS_C_MUTUAL_FLAG)
    kerberos.authGSSClientStep(context, "")
    spnego_token = kerberos.authGSSClientResponse(context)
    headers = {'Authorization': 'Negotiate {}'.format(spnego_token)}
    return headers
# Initialize an SSL context with certificate verification enabled
def get_ssl_context(ca_file):
    context = create_default_context()
    context.load_verify_locations(ca_file)
    return context
# Main function to create the HBase client and retrieve tables
if __name__ == '__main__':
    hostname, agent_cert_dir, ca_file, keytab_file = get_env_params()
    # Check if a valid Kerberos ticket is already present in the cache
    if not check_kerberos_ticket():
        # If a valid ticket is not present, obtain one by running kinit
        kinit(keytab_file)
    # Create a THttpClient instance with the SSL context and custom headers
    httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=get_ssl_context(ca_file))
    httpClient.setCustomHeaders(headers=get_spnego_token(hostname))
    # Initialize TBinaryProtocol with THttpClient
    protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
    # Create HBase client
    client = Client(protocol)
    # Retrieve list of HBase tables
    tables = client.getTableNames()
    print(tables)
    # Close connection
    httpClient.close()
Example-2 THttpClient in Unsecure Cluster
Assume that the cluster is unsecured, with the configuration values listed in the Default value (unsecured) column of the HBase Thrift configurations table.
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
# Replace with your own parameters
hostname = 'your_hbase_thrift_server_hostname'
# Initialize THttpClient
httpClient = THttpClient.THttpClient('http://' + hostname + ':9090/')
# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
# Create HBase client
client = Client(protocol)
# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)
# Close connection
httpClient.close()
Example-3 TSaslClientTransport in Secure Cluster without HTTP
If you do not use THttpClient and want to use TSaslClientTransport for legacy compatibility reasons, set the hbase.regionserver.thrift.http property to false. The other properties can keep the values listed in the Default value (secured) column of the HBase Thrift configurations table.
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.protocol import TCompactProtocol
from hbase import Hbase
'''
This example assumes that you have already run kinit for the hbase principal; alternatively, use the kinit step from Example-1.
'''
# Replace with your own parameters
thrift_host = 'your_hbase_thrift_server_hostname'
thrift_port = 9090