Accessing Apache HBase

Use the Apache Thrift Proxy API

The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings.

A Thrift binding is client code generated by the Apache Thrift Compiler for a target language (such as Python) that allows communication between the Thrift server and clients using that client code. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.

After the Thrift server is configured and running, generate Thrift bindings for the language of your choice from an IDL file. An HBase IDL file named Hbase.thrift is included as part of HBase. After generating the bindings, copy the Thrift libraries for your language into the same directory as the generated bindings. In the following Python example, these libraries provide the thrift.transport and thrift.protocol modules. The commands below show how you might generate the Thrift bindings for Python and copy the libraries on a Linux system.

After installing the Thrift compiler, verify that its version is newer than 0.9.0 by running the thrift -version command. Locate the Hbase.thrift file on an HBase node, or copy it to the system where the Thrift compiler is installed. Then perform the following steps:
mkdir HBaseThrift
cd HBaseThrift/
thrift -gen py /path/to/Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
mkdir thrift
cp -rp ~/Downloads/thrift/lib/py/src/* ./thrift/
As a result, the HBase Thrift Python bindings appear as follows:
HBaseThrift/
|-- hbase
|   |-- constants.py
|   |-- Hbase.py
|   |-- Hbase-remote
|   |-- __init__.py
|   `-- ttypes.py
|-- __init__.py
`-- thrift
    |-- compat.py
    |-- ext
    |   |-- binary.cpp
    |   |-- binary.h
    |   |-- compact.cpp
    |   |-- compact.h
    |   |-- endian.h
    |   |-- module.cpp
    |   |-- protocol.h
    |   |-- protocol.tcc
    |   |-- types.cpp
    |   `-- types.h
    |-- __init__.py
    |-- protocol
    |   |-- __init__.py
    |   |-- TBase.py
    |   |-- TBinaryProtocol.py
    |   |-- TCompactProtocol.py
    |   |-- THeaderProtocol.py
    |   |-- TJSONProtocol.py
    |   |-- TMultiplexedProtocol.py
    |   |-- TProtocolDecorator.py
    |   `-- TProtocol.py
    |-- server
    |   |-- __init__.py
    |   |-- THttpServer.py
    |   |-- TNonblockingServer.py
    |   |-- TProcessPoolServer.py
    |   `-- TServer.py
    |-- Thrift.py
    |-- TMultiplexedProcessor.py
    |-- transport
    |   |-- __init__.py
    |   |-- sslcompat.py
    |   |-- THeaderTransport.py
    |   |-- THttpClient.py
    |   |-- TSocket.py
    |   |-- TSSLSocket.py
    |   |-- TTransport.py
    |   |-- TTwisted.py
    |   `-- TZlibTransport.py
    |-- TRecursive.py
    |-- TSCons.py
    |-- TSerialization.py
    `-- TTornado.py
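Before running an application against the bindings, it can help to confirm that the files the Python imports rely on are actually in place. The following is a minimal sketch, assuming the layout in the listing above and the package names hbase and thrift used by the Python imports in this section. The REQUIRED_FILES list and the missing_binding_files helper are illustrative names, not part of HBase or Thrift.

```python
from pathlib import Path

# Files that the example imports depend on, relative to the bindings directory.
# This list is an assumption based on the directory listing above.
REQUIRED_FILES = [
    "hbase/__init__.py",
    "hbase/Hbase.py",
    "hbase/ttypes.py",
    "thrift/__init__.py",
    "thrift/transport/TSocket.py",
    "thrift/transport/TTransport.py",
    "thrift/protocol/TBinaryProtocol.py",
]


def missing_binding_files(bindings_dir):
    """Return the required files that are absent from bindings_dir."""
    root = Path(bindings_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "HBaseThrift" is the directory created by the commands above.
    missing = missing_binding_files("HBaseThrift")
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Layout looks complete")
```

An empty result only shows that the files exist; it does not prove that the bindings match the Thrift server version running on the cluster.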
The following example shows a simple Python application using the Thrift Proxy API.
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()

# Modify a single row
mutations = [Hbase.Mutation(
  column='columnfamily:columndescriptor', value='columnvalue')]
client.mutateRow('tablename', 'rowkey', mutations)

# Modify a batch of rows
# Create a list of mutations per work of Shakespeare
# (myDataFile, username, filename, tablename, the *columncf names, and
# encode() are assumed to be defined elsewhere in the application;
# line numbers are assumed to start at 1)
mutationsbatch = []

for linenumber, line in enumerate(myDataFile, start=1):
    rowkey = username + "-" + filename + "-" + str(linenumber).zfill(6)

    mutations = [
        Hbase.Mutation(column=messagecolumncf, value=line.strip()),
        Hbase.Mutation(column=linenumbercolumncf, value=encode(linenumber)),
        Hbase.Mutation(column=usernamecolumncf, value=username)
    ]

    mutationsbatch.append(Hbase.BatchMutation(row=rowkey, mutations=mutations))

# Run the mutations for all the lines in myDataFile
client.mutateRows(tablename, mutationsbatch)

transport.close()
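
The batch example builds each row key by zero-padding the line number to six digits. HBase orders rows lexicographically by key, so the padding keeps lines in numeric order. The sketch below illustrates this with plain strings. The make_rowkey helper and the sample user and file names are hypothetical, and encode() is only one possible stand-in for the helper that the example calls but does not define.

```python
import struct


def make_rowkey(username, filename, linenumber):
    """Build a row key the same way as the batch example: user-file-000042."""
    return username + "-" + filename + "-" + str(linenumber).zfill(6)


def encode(linenumber):
    """Hypothetical stand-in for the example's encode() helper:
    pack the integer as an 8-byte big-endian value."""
    return struct.pack(">q", linenumber)


# Padded keys sort in numeric order.
padded = sorted(make_rowkey("alice", "hamlet", n) for n in (2, 10, 1))
print(padded)

# Unpadded keys do not: "10" sorts before "2".
unpadded = sorted("alice-hamlet-" + str(n) for n in (2, 10, 1))
print(unpadded)
```

Six digits cover files of up to 999,999 lines; a longer file would need a wider pad to keep the ordering.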

The Thrift Proxy API does not support writing to HBase clusters that are secured using Kerberos.

Choose the classes and functions that match the configuration of your HBase Thrift server.

Classes and functions

  • Transport level: TBufferedTransport, TFramedTransport, TSaslClientTransport, and THttpClient.
  • Protocol level: TBinaryProtocol and TCompactProtocol.

Configurations for HBase thrift

HBase thrift configurations
Property | Default value (secured) | Default value (unsecured) | Description
hbase.thrift.support.proxyuser | true | true | Use this to allow proxy users on the thrift gateway, which is mainly needed for doAs functionality.
hbase.regionserver.thrift.framed | true | true | Use framed transport. When using the THsHaServer or TNonblockingServer, framed transport is always used irrespective of this configuration value.
hbase.regionserver.thrift.compact | true | true | Use the TCompactProtocol instead of the default TBinaryProtocol. TCompactProtocol is a binary protocol that is more compact than the default and typically more efficient.
hbase.regionserver.thrift.http | true | true | Use this to enable HTTP server usage on thrift, which is mainly needed for doAs functionality.
hbase.thrift.security.qop | auth_conf | none | If this is set, HBase Thrift Server authenticates its clients. HBase Proxy User Hosts and Groups must be configured to allow specific users to access HBase through Thrift Server.
hbase.thrift.ssl.enabled | true | false | Encrypt communication between clients and HBase Thrift Server over HTTP using Transport Layer Security (TLS) (formerly known as Secure Socket Layer (SSL)).
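
To see which of these properties a cluster actually sets, you can read them from the server configuration. The following is a minimal sketch, assuming the properties live in a file that uses the standard Hadoop configuration XML layout (for example hbase-site.xml); on a managed cluster they may be stored elsewhere. The read_thrift_settings helper and the SAMPLE text are illustrative only.

```python
import xml.etree.ElementTree as ET

# The property names listed in the configuration table.
THRIFT_PROPERTIES = [
    "hbase.thrift.support.proxyuser",
    "hbase.regionserver.thrift.framed",
    "hbase.regionserver.thrift.compact",
    "hbase.regionserver.thrift.http",
    "hbase.thrift.security.qop",
    "hbase.thrift.ssl.enabled",
]


def read_thrift_settings(xml_text):
    """Return {property: value}, with None for properties that are not set."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in THRIFT_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    return {name: found.get(name) for name in THRIFT_PROPERTIES}


# Hypothetical configuration fragment used only for this demonstration.
SAMPLE = """<configuration>
  <property><name>hbase.regionserver.thrift.http</name><value>true</value></property>
  <property><name>hbase.thrift.ssl.enabled</name><value>false</value></property>
</configuration>"""

settings = read_thrift_settings(SAMPLE)
for name, value in settings.items():
    print(name, "=", value)
```

A value of None means the file does not set the property, so the server falls back to its default.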

The following example assumes a secured cluster that uses the values in the Default value (secured) column of the HBase thrift configurations table.

Before proceeding, ensure that the following packages are installed on your system.

  • python3 and python3-devel
  • gcc-c++
  • cyrus-sasl-devel

Perform the following steps:

  1. Install these dependencies on a CentOS or Red Hat Enterprise Linux (RHEL) system using the following command.
    yum install python3 python3-devel gcc-c++ cyrus-sasl-devel
  2. Install virtualenv using pip3.
    pip3 install virtualenv
  3. Create a new virtual environment named py3env.
    virtualenv py3env
  4. Activate the virtual environment.
    source py3env/bin/activate
  5. Install the required Python packages at the specific versions shown. Run this command inside the activated python3 virtual environment.
    pip3 install kerberos==1.3.1 pure-sasl==0.6.2 setuptools==59.6.0 six==1.16.0 wheel==0.37.1

This ensures that you have all the necessary dependencies and packages installed to proceed with your project.
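
One way to confirm that the virtual environment holds the pinned versions is to compare them against the installed package metadata. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata) and that it runs inside the activated environment. The PINNED dictionary mirrors the pip3 command above; the helper names are illustrative.

```python
from importlib import metadata

# Versions taken from the pip3 install command above.
PINNED = {
    "kerberos": "1.3.1",
    "pure-sasl": "0.6.2",
    "setuptools": "59.6.0",
    "six": "1.16.0",
    "wheel": "0.37.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in version_mismatches(PINNED).items():
        print("{}: expected {}, found {}".format(name, expected, found))
```

The lookup parameter exists so that the comparison can be exercised without touching the real environment.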

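The following example connects to the secured HBase Thrift server over HTTPS and authenticates with Kerberos (SPNEGO).

from thrift.transport import THttpClient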
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from subprocess import call
import ssl
import kerberos

# Replace with your own parameters
hostname = 'your_thrift_server_hostname'
key_file = 'your_key_file'
cert_file = 'your_cert_file'
ca_file = 'your_CA_file'
keytab = 'your_key_tab'
client_principal = 'your_client_principal'
cert_password = 'your_cert_password'

# Function to authenticate with Kerberos
def kerberos_auth():
    call("kdestroy", shell=True)
    kinit_command = "kinit -kt {} {}".format(keytab, client_principal)
    call(kinit_command, shell=True)
    __, krb_context = kerberos.authGSSClientInit("HTTP")
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers

# Initialize an SSL context with certificate verification enabled
context = ssl.create_default_context()
context.load_verify_locations(ca_file)
context.load_cert_chain(certfile=cert_file, keyfile=key_file, password=cert_password)

# Create a THttpClient instance with the SSL context and custom headers
httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=context)
httpClient.setCustomHeaders(headers=kerberos_auth())

# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

# Create HBase client
client = Client(protocol)
# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)

The following example shows another way to implement SPNEGO authentication over SSL.

# This example assumes that it runs on the HBase Thrift server host
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
from ssl import create_default_context
import kerberos
from subprocess import call

# Get the environment parameters
def get_env_params():
    # Replace with your own parameters
    hostname = 'your_hbase_thrift_hostname'
    cert_file = 'your_cert_file'
    key_file = 'your_key_file'
    ca_file = 'your_ca_file'
    key_pw = 'your_key_pw'
    keytab_file = 'your_keytab'
    principal = 'your_principal'
    return hostname, cert_file, key_file, ca_file, keytab_file, principal, key_pw

# Check if a valid Kerberos ticket is already present in the cache.
# "klist -s" prints nothing and exits with status 0 only when the
# credentials cache can be read and has not expired.
def check_kerberos_ticket():
    return call(['klist', '-s']) == 0

# Obtain a Kerberos ticket by running kinit from the keytab
def kinit(keytab_file, principal):
    call(['kinit', '-kt', keytab_file, principal])

# Authenticate with Kerberos and get a SPNEGO token
def get_spnego_token(hostname):
    service_name = 'HTTP@{}'.format(hostname)
    result, context = kerberos.authGSSClientInit(service_name, gssflags=kerberos.GSS_C_MUTUAL_FLAG)
    kerberos.authGSSClientStep(context, "")
    spnego_token = kerberos.authGSSClientResponse(context)
    headers = {'Authorization': 'Negotiate {}'.format(spnego_token)}
    return headers

# Initialize an SSL context with certificate verification enabled
def get_ssl_context(ca_file):
    context = create_default_context()
    context.load_verify_locations(ca_file)
    return context

# Main function to create the HBase client and retrieve tables
if __name__ == '__main__':
    hostname, cert_file, key_file, ca_file, keytab_file, principal, key_pw = get_env_params()

    # Check if a valid Kerberos ticket is already present in the cache
    if not check_kerberos_ticket():
        # If a valid ticket is not present, obtain one by running kinit
        kinit(keytab_file, principal)

    # Create a THttpClient instance with the SSL context and custom headers
    httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=get_ssl_context(ca_file))
    httpClient.setCustomHeaders(headers=get_spnego_token(hostname))

    # Initialize TBinaryProtocol with THttpClient
    protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

    # Create HBase client
    client = Client(protocol)

    # Retrieve list of HBase tables
    tables = client.getTableNames()
    print(tables)

    # Close connection
    httpClient.close()

The following example assumes an unsecured cluster that uses the values in the Default value (unsecured) column of the HBase thrift configurations table.

from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client

# Replace with your own parameters
hostname = 'your_hbase_thrift_server_hostname'

# Initialize THttpClient
httpClient = THttpClient.THttpClient('http://' + hostname + ':9090/')

# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

# Create HBase client
client = Client(protocol)

# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)

# Close connection
httpClient.close()
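
The secured and unsecured HTTP examples pass different URL schemes to THttpClient, https and http, which matches the hbase.thrift.ssl.enabled value in each column of the configuration table. The sketch below captures that choice in one place. The thrift_http_url helper and the host name thrift.example.com are hypothetical; port 9090 is the port used in the examples.

```python
def thrift_http_url(hostname, port=9090, ssl_enabled=False):
    """Build the endpoint URL that the examples pass to THttpClient."""
    scheme = "https" if ssl_enabled else "http"
    return "{}://{}:{}/".format(scheme, hostname, port)


# Unsecured cluster (hbase.thrift.ssl.enabled is false)
print(thrift_http_url("thrift.example.com"))

# Secured cluster (hbase.thrift.ssl.enabled is true)
print(thrift_http_url("thrift.example.com", ssl_enabled=True))
```

On a secured cluster the URL is only part of the setup: the client still needs the SSL context and the Negotiate header shown in the secured examples.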

If you do not use THttpClient and want to use TSaslClientTransport for legacy compatibility reasons, ensure that you set the hbase.regionserver.thrift.http property to false. The other settings can remain the same as those in the Default value (secured) column of the HBase thrift configurations table.

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.protocol import TCompactProtocol
from hbase import Hbase

'''
Assume that you have already run kinit for the hbase principal. Otherwise, you can use the function in example-1 to run kinit.
'''

# Replace with your own parameters
thrift_host = 'your_hbase_thrift_server_hostname'
thrift_port = 9090