Use the Apache Thrift Proxy API
The Apache Thrift library provides cross-language client-server remote procedure calls (RPCs), using Thrift bindings.
Prepare Thrift server and client before using Thrift Proxy API
A Thrift binding is client code, generated by the Apache Thrift compiler for a target language such as Python, that clients use to communicate with the Thrift server. HBase includes an Apache Thrift Proxy API, which allows you to write HBase applications in Python, C, C++, or another language that Thrift supports. The Thrift Proxy API is slower than the Java API and may have fewer features. To use the Thrift Proxy API, you need to configure and run the HBase Thrift server on your cluster. You also need to install the Apache Thrift compiler on your development system.
After the Thrift server is configured and running, generate Thrift bindings for the language of your choice from an IDL file. An HBase IDL file named Hbase.thrift is included as part of HBase. After generating the bindings, copy the Thrift libraries for your language into the same directory as the generated bindings. In the following Python example, these libraries provide the thrift.transport and thrift.protocol modules. The following commands show how you might generate the Thrift bindings for Python and copy the libraries on a Linux system.
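```shell
# Create a working directory for the bindings
mkdir HBaseThrift
cd HBaseThrift/
# Generate the Python bindings from the HBase IDL file
thrift -gen py /path/to/Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
# Copy the Thrift Python library from a downloaded Thrift source tree
mkdir thrift
cp -rp ~/Downloads/thrift/lib/py/src/* ./thrift/
```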
The resulting directory layout looks similar to the following:

```text
HBaseThrift/
|-- hbase
|   |-- constants.py
|   |-- Hbase.py
|   |-- Hbase-remote
|   |-- __init__.py
|   `-- ttypes.py
|-- __init__.py
`-- thrift
    |-- compat.py
    |-- ext
    |   |-- binary.cpp
    |   |-- binary.h
    |   |-- compact.cpp
    |   |-- compact.h
    |   |-- endian.h
    |   |-- module.cpp
    |   |-- protocol.h
    |   |-- protocol.tcc
    |   |-- types.cpp
    |   `-- types.h
    |-- __init__.py
    |-- protocol
    |   |-- __init__.py
    |   |-- TBase.py
    |   |-- TBinaryProtocol.py
    |   |-- TCompactProtocol.py
    |   |-- THeaderProtocol.py
    |   |-- TJSONProtocol.py
    |   |-- TMultiplexedProtocol.py
    |   |-- TProtocolDecorator.py
    |   `-- TProtocol.py
    |-- server
    |   |-- __init__.py
    |   |-- THttpServer.py
    |   |-- TNonblockingServer.py
    |   |-- TProcessPoolServer.py
    |   `-- TServer.py
    |-- Thrift.py
    |-- TMultiplexedProcessor.py
    |-- transport
    |   |-- __init__.py
    |   |-- sslcompat.py
    |   |-- THeaderTransport.py
    |   |-- THttpClient.py
    |   |-- TSocket.py
    |   |-- TSSLSocket.py
    |   |-- TTransport.py
    |   |-- TTwisted.py
    |   `-- TZlibTransport.py
    |-- TRecursive.py
    |-- TSCons.py
    |-- TSerialization.py
    `-- TTornado.py
```
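Before running a client, it can help to check that the generated bindings and the Thrift library ended up where the imports expect them. The following is a minimal sketch using only the standard library; the function name missing_binding_files and the contents of REQUIRED_FILES are assumptions based on the layout above and on the modules that the examples in this section import.

```python
from pathlib import Path

# Files that the Python examples import. This list is an assumption based on
# the directory layout above; extend it if your code imports other modules.
REQUIRED_FILES = (
    "hbase/__init__.py",
    "hbase/Hbase.py",
    "hbase/ttypes.py",
    "thrift/__init__.py",
    "thrift/Thrift.py",
    "thrift/transport/TSocket.py",
    "thrift/transport/TTransport.py",
    "thrift/transport/THttpClient.py",
    "thrift/protocol/TBinaryProtocol.py",
)


def missing_binding_files(root):
    """Return the required files that are absent under root, in list order."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```

For example, missing_binding_files("HBaseThrift") returns an empty list when the bindings and the library are both in place.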
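The following Python example connects to the Thrift server over a socket, modifies a single row, and then modifies a batch of rows, one row per line of an input file.

```python
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Replace with your own parameters
host = 'your_hbase_thrift_server_hostname'
port = 9090
tablename = b'tablename'

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()

# Modify a single row. Text values in Hbase.thrift are binary, so pass bytes
# in Python 3; the last argument is the map of mutation attributes.
mutations = [Hbase.Mutation(
    column=b'columnfamily:columndescriptor', value=b'columnvalue')]
client.mutateRow(tablename, b'rowkey', mutations, {})

# Modify a batch of rows
# Create a list of mutations per work of Shakespeare, one row per line
username = 'your_username'
filename = 'your_input_file.txt'
messagecolumncf = b'columnfamily:message'
linenumbercolumncf = b'columnfamily:linenumber'
usernamecolumncf = b'columnfamily:username'

mutationsbatch = []
with open(filename) as myDataFile:
    for linenumber, line in enumerate(myDataFile, start=1):
        rowkey = username + "-" + filename + "-" + str(linenumber).zfill(6)
        mutations = [
            Hbase.Mutation(column=messagecolumncf, value=line.strip().encode()),
            Hbase.Mutation(column=linenumbercolumncf,
                           value=str(linenumber).encode()),
            Hbase.Mutation(column=usernamecolumncf, value=username.encode())
        ]
        mutationsbatch.append(
            Hbase.BatchMutation(row=rowkey.encode(), mutations=mutations))

# Run the mutations for all the lines in myDataFile
client.mutateRows(tablename, mutationsbatch, {})
transport.close()
```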
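The batch example builds each row key from the user name, the file name, and a line number padded with zfill(6). HBase stores rows sorted by the bytes of their row keys, so without padding, line 10 would sort before line 2. The following sketch shows the effect; make_rowkey is a hypothetical helper, and the width of 6 digits matches the example.

```python
def make_rowkey(username, filename, linenumber, width=6):
    """Build a row key such as 'alice-hamlet.txt-000042'.

    Row keys sort by their bytes, so the line number is zero-padded to a
    fixed width; otherwise 'x-10' would sort before 'x-9'.
    """
    if linenumber < 0 or len(str(linenumber)) > width:
        raise ValueError("line number does not fit in %d digits" % width)
    return "{}-{}-{}".format(username, filename, str(linenumber).zfill(width))


# Unpadded keys sort in the wrong order; padded keys keep numeric order.
unpadded = sorted(["u-f-" + str(n) for n in (1, 2, 10)])
padded = sorted(make_rowkey("u", "f", n) for n in (1, 2, 10))
print(unpadded)  # ['u-f-1', 'u-f-10', 'u-f-2']
print(padded)    # ['u-f-000001', 'u-f-000002', 'u-f-000010']
```

A fixed width also caps the number of lines per file (999999 here), which is why the sketch rejects larger values instead of silently producing keys that sort out of order.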
The Thrift Proxy API does not support writing to HBase clusters that are secured using Kerberos.
Example code
Choose the transport and protocol classes that match the configuration of your HBase Thrift server.
Classes and functions
- Transport level: TBufferedTransport, TFramedTransport, TSaslClientTransport, and THttpClient.
- Protocol level: TBinaryProtocol and TCompactProtocol.
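The transport classes differ in how messages are delimited on the wire. As an illustration only, not a replacement for TFramedTransport, the following sketch reproduces the framing that TFramedTransport applies: each serialized message is preceded by its length as a 4-byte big-endian signed integer. A client that uses TBufferedTransport sends no such prefix, so the client transport has to match the server's hbase.regionserver.thrift.framed setting. The function names frame and unframe are hypothetical.

```python
import struct


def frame(payload):
    """Prefix a serialized Thrift message with its 4-byte big-endian length."""
    return struct.pack("!i", len(payload)) + payload


def unframe(data):
    """Split one frame off the front of data; return (payload, remaining bytes)."""
    if len(data) < 4:
        raise ValueError("incomplete frame header")
    (size,) = struct.unpack("!i", data[:4])
    if size < 0 or len(data) < 4 + size:
        raise ValueError("incomplete or invalid frame")
    return data[4:4 + size], data[4 + size:]


# Two messages sent back to back over a framed connection.
stream = frame(b"first message") + frame(b"second")
first, rest = unframe(stream)
second, rest = unframe(rest)
print(first, second, rest)  # b'first message' b'second' b''
```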
Configurations for HBase Thrift
Property | Default value (secured) | Default value (unsecured) | Description |
---|---|---|---|
hbase.thrift.support.proxyuser | true | true | Use this to allow proxy users on the Thrift gateway, which is mainly needed for doAs functionality. |
hbase.regionserver.thrift.framed | true | true | Use framed transport. When using the THsHaServer or TNonblockingServer, framed transport is always used irrespective of this configuration value. |
hbase.regionserver.thrift.compact | true | true | Use the TCompactProtocol instead of the default TBinaryProtocol. TCompactProtocol is a binary protocol that is more compact than the default and typically more efficient. |
hbase.regionserver.thrift.http | true | true | Use this to run the Thrift server over HTTP, which is mainly needed for doAs functionality. |
hbase.thrift.security.qop | auth_conf | none | If this is set, HBase Thrift Server authenticates its clients. HBase Proxy User Hosts and Groups must be configured to allow specific users to access HBase through Thrift Server. |
hbase.thrift.ssl.enabled | true | false | Encrypt communication between clients and HBase Thrift Server over HTTP using Transport Layer Security (TLS) (formerly known as Secure Socket Layer (SSL)). |
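To see which of these properties are explicitly set on a Thrift server host, you can read them from hbase-site.xml, which uses the standard Hadoop configuration format of property elements with name and value children. The following is a minimal sketch using only the Python standard library; the function name thrift_settings and the SAMPLE configuration are illustrative, and the location of hbase-site.xml depends on your installation.

```python
import xml.etree.ElementTree as ET

# Thrift-related properties from the configuration table above.
THRIFT_PROPERTIES = (
    "hbase.thrift.support.proxyuser",
    "hbase.regionserver.thrift.framed",
    "hbase.regionserver.thrift.compact",
    "hbase.regionserver.thrift.http",
    "hbase.thrift.security.qop",
    "hbase.thrift.ssl.enabled",
)


def thrift_settings(xml_text):
    """Return {property: value} for the Thrift properties set in the XML text.

    Properties that are not present are omitted; for those, the server uses
    its default value.
    """
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in THRIFT_PROPERTIES:
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.thrift.http</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.thrift.ssl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com</value>
  </property>
</configuration>"""

print(thrift_settings(SAMPLE))
# {'hbase.regionserver.thrift.http': 'true', 'hbase.thrift.ssl.enabled': 'true'}
```

To inspect a real file, pass its contents, for example thrift_settings(open(path_to_hbase_site).read()).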
Example-1 THttpClient in Secure Cluster
Assume that the cluster is secured with the configuration properties listed in the Default value (secured) column of the configuration table above.
Before proceeding, ensure that the following packages are installed on your system.
- python3 and python3-devel
- gcc-c++
- cyrus-sasl-devel
Perform the following steps:
- Install these dependencies on a CentOS or Red Hat Enterprise Linux (RHEL) system using the following command.
yum install python3 python3-devel gcc-c++ cyrus-sasl-devel
- Install virtualenv using pip3.
pip3 install virtualenv
- Create a new virtual environment named py3env.
virtualenv py3env
- Activate the virtual environment.
source py3env/bin/activate
- Install the required Python packages with their specific versions. Run this command inside the py3env virtual environment.
pip3 install kerberos==1.3.1 pure-sasl==0.6.2 setuptools==59.6.0 six==1.16.0 wheel==0.37.1
This ensures that you have all the necessary dependencies and packages installed to proceed with your project.
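To confirm that the packages are visible inside the activated virtual environment, a short check such as the following can help. It is a sketch that assumes the import names kerberos (from the kerberos package), puresasl (from pure-sasl), and six; the Thrift library itself is the copy placed next to the generated bindings, so it is not listed. The function name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the packages installed above. The pure-sasl distribution
# is imported as "puresasl".
REQUIRED_MODULES = ("kerberos", "puresasl", "six")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules() or "All required modules are importable.")
```

Run it with the python3 interpreter of py3env; an empty result means the imports in the following examples can resolve these packages.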
```python
import ssl
from subprocess import call

import kerberos
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client

# Replace with your own parameters
hostname = 'your_thrift_server_hostname'
key_file = 'your_key_file'
cert_file = 'your_cert_file'
ca_file = 'your_CA_file'
keytab = 'your_key_tab'
client_principal = 'your_client_principal'
cert_password = 'your_cert_password'


# Authenticate with Kerberos and build the SPNEGO (Negotiate) headers
def kerberos_auth():
    # Replace any cached ticket with a fresh one from the keytab
    call("kdestroy", shell=True)
    kinit_command = "kinit -kt {} {}".format(keytab, client_principal)
    call(kinit_command, shell=True)
    # Request a token for the HTTP service on the Thrift server host
    __, krb_context = kerberos.authGSSClientInit("HTTP@{}".format(hostname))
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details,
               'Content-Type': 'application/binary'}
    return headers


# Initialize an SSL context with certificate verification enabled
context = ssl.create_default_context()
context.load_verify_locations(ca_file)
context.load_cert_chain(certfile=cert_file, keyfile=key_file, password=cert_password)

# Create a THttpClient instance with the SSL context and custom headers
httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/', ssl_context=context)
httpClient.setCustomHeaders(headers=kerberos_auth())

# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

# Create HBase client
client = Client(protocol)

# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)

# Close connection
httpClient.close()
```
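The SSL setup in this example can be factored into a helper. The following sketch uses only the standard ssl module, and build_client_ssl_context is a hypothetical name. One difference from the example: passing cafile to ssl.create_default_context() trusts only that CA, whereas calling load_verify_locations() after create_default_context() adds the CA to the system's default CA certificates.

```python
import ssl


def build_client_ssl_context(ca_file=None, cert_file=None, key_file=None,
                             password=None):
    """Create a TLS context for THttpClient that verifies the server.

    ca_file: CA bundle that signed the Thrift server certificate. When None,
    the system's default CA certificates are loaded instead.
    cert_file, key_file, password: optional client certificate, needed only
    if the server requests one.
    """
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file,
                                password=password)
    return context


ctx = build_client_ssl_context()
# A default client context requires a valid certificate and a matching hostname.
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

The resulting context can be passed as the ssl_context argument of THttpClient.THttpClient, as in the example.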
Here is another example that implements SPNEGO authentication with SSL.
```python
# This example assumes that it runs on the HBase Thrift server host
import subprocess
from ssl import create_default_context

import kerberos
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client


# Get the environment parameters
def get_env_params():
    # Replace with your own parameters
    hostname = 'your_hbase_thrift_hostname'
    cert_file = "your_cert_file"
    key_file = "your_key_file"
    ca_file = "your_ca_file"
    key_pw = 'your_key_pw'
    keytab_file = 'your_keytab'
    principal = 'your_principal'
    return hostname, cert_file, key_file, ca_file, keytab_file, principal, key_pw


# Check whether a valid Kerberos ticket is already present in the cache.
# `klist -s` prints nothing and exits with 0 only if a valid ticket is cached.
def check_kerberos_ticket():
    return subprocess.call(['klist', '-s']) == 0


# Obtain a Kerberos ticket by running kinit with the keytab
def kinit(keytab_file, principal):
    subprocess.call(['kinit', '-kt', keytab_file, principal])


# Authenticate with Kerberos and get a SPNEGO token for the HTTP service
def get_spnego_token(hostname):
    service_name = 'HTTP@{}'.format(hostname)
    result, context = kerberos.authGSSClientInit(
        service_name, gssflags=kerberos.GSS_C_MUTUAL_FLAG)
    kerberos.authGSSClientStep(context, "")
    spnego_token = kerberos.authGSSClientResponse(context)
    headers = {'Authorization': 'Negotiate {}'.format(spnego_token)}
    return headers


# Initialize an SSL context with certificate verification enabled
def get_ssl_context(ca_file):
    context = create_default_context()
    context.load_verify_locations(ca_file)
    return context


# Main function to create the HBase client and retrieve tables
if __name__ == '__main__':
    (hostname, cert_file, key_file, ca_file,
     keytab_file, principal, key_pw) = get_env_params()

    # If a valid ticket is not present, obtain one by running kinit
    if not check_kerberos_ticket():
        kinit(keytab_file, principal)

    # Create a THttpClient instance with the SSL context and custom headers
    httpClient = THttpClient.THttpClient('https://' + hostname + ':9090/',
                                         ssl_context=get_ssl_context(ca_file))
    httpClient.setCustomHeaders(headers=get_spnego_token(hostname))

    # Initialize TBinaryProtocol with THttpClient
    protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

    # Create HBase client
    client = Client(protocol)

    # Retrieve list of HBase tables
    tables = client.getTableNames()
    print(tables)

    # Close connection
    httpClient.close()
```
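The ticket check and the kinit fallback can be isolated in a small function that is easy to test without a KDC. The following sketch relies on the MIT Kerberos `klist -s` option, which produces no output and exits with status 0 only when a valid credentials cache is found. The names has_valid_ticket, ensure_ticket, and the run parameter are hypothetical, introduced for illustration.

```python
import subprocess


def has_valid_ticket(run=subprocess.call):
    """Return True if the credentials cache holds a ticket that has not expired.

    Relies on MIT Kerberos `klist -s`, which exits with 0 only when a usable
    credentials cache is found.
    """
    return run(["klist", "-s"]) == 0


def ensure_ticket(keytab_file, principal, run=subprocess.call):
    """Run kinit from the keytab only when no valid ticket is cached.

    Returns True if kinit was run. The run parameter lets the logic be
    exercised with a fake command runner instead of real Kerberos tools.
    """
    if has_valid_ticket(run):
        return False
    if run(["kinit", "-kt", keytab_file, principal]) != 0:
        raise RuntimeError("kinit failed for principal " + principal)
    return True
```

For example, calling ensure_ticket(keytab_file, principal) before creating the THttpClient obtains a ticket only when one is needed. If klist or kinit is not on the PATH, subprocess.call raises FileNotFoundError.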
Example-2 THttpClient in Unsecure Cluster
Assume that the cluster is unsecured, with the configuration properties listed in the Default value (unsecured) column of the configuration table above.
```python
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client

# Replace with your own parameters
hostname = 'your_hbase_thrift_server_hostname'

# Initialize THttpClient
httpClient = THttpClient.THttpClient('http://' + hostname + ':9090/')

# Initialize TBinaryProtocol with THttpClient
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)

# Create HBase client
client = Client(protocol)

# Retrieve list of HBase tables
tables = client.getTableNames()
print(tables)

# Close connection
httpClient.close()
```
Example-3 TSaslClientTransport in Secure Cluster without HTTP
If you do not use THttpClient and want to use TSaslClientTransport for legacy compatibility reasons, ensure that you set the hbase.regionserver.thrift.http property to false. The other settings can be the same as the configuration properties listed in the Default value (secured) column of the configuration table above.
```python
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.protocol import TCompactProtocol
from hbase import Hbase

# This example assumes that you have already run kinit for the hbase
# principal. You can also reuse the kinit steps from Example-1.

# Replace with your own parameters
thrift_host = 'your_hbase_thrift_server_hostname'
thrift_port = 9090
```