Configuring the Lily HBase NRT Indexer Service for Use with Cloudera Search
The Lily HBase NRT Indexer Service is a flexible, scalable, fault-tolerant, transactional, near real-time (NRT) system for processing a continuous stream of HBase cell updates into live search indexes. Typically, data ingested into HBase appears in search results within seconds; this latency is tunable. The Lily HBase Indexer uses SolrCloud to index data stored in HBase. As HBase applies inserts, updates, and deletes to HBase table cells, the indexer keeps Solr consistent with the HBase table contents, using standard HBase replication. The indexer supports flexible, custom, application-specific rules to extract, transform, and load HBase data into Solr. Solr search results can contain columnFamily:qualifier links back to the data stored in HBase, so applications can use the Search result set to directly access the matching raw HBase cells. Indexing and searching do not affect the operational stability or write throughput of HBase because the indexing and searching processes are separate from, and asynchronous to, HBase.
The Lily HBase NRT Indexer Service must be deployed in an environment with a running HBase cluster, a running SolrCloud cluster, and at least one ZooKeeper cluster. This can be done with or without Cloudera Manager. See Managing Services for more information on adding services such as the Lily HBase Indexer Service.
Enabling Cluster-wide HBase Replication
The Lily HBase Indexer is implemented using HBase replication, presenting indexers as RegionServers of the worker cluster. This requires enabling HBase replication on the HBase cluster as well as on each individual table to be indexed. An example of the settings required for configuring cluster-wide HBase replication is shown in /usr/share/doc/hbase-solr-doc*/demo/hbase-site.xml. Add every setting from that file, except the replication.replicationsource.implementation property, to all of the hbase-site.xml configuration files on the HBase cluster. You can use the Cloudera Manager HBase Indexer Service GUI to do this. After making these updates, restart your HBase cluster.
Pointing a Lily HBase NRT Indexer Service at an HBase Cluster that Needs to Be Indexed
Before starting Lily HBase NRT Indexer services, you must configure individual services with the location of a ZooKeeper ensemble that is used for the target HBase cluster. Add the following property to /etc/hbase-solr/conf/hbase-indexer-site.xml. Remember to replace hbase-cluster-zookeeper with the actual ensemble string found in the hbase-site.xml configuration file:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase-cluster-zookeeper</value>
</property>
Configure all Lily HBase NRT Indexer Services to use a particular ZooKeeper ensemble to coordinate with one another. Add the following property to /etc/hbase-solr/conf/hbase-indexer-site.xml, and replace hbase-cluster-zookeeper:2181 with the actual ensemble string:
<property>
  <name>hbaseindexer.zookeeper.connectstring</name>
  <value>hbase-cluster-zookeeper:2181</value>
</property>
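Taken together, the two settings above might sit in a single hbase-indexer-site.xml. The following is a minimal sketch, assuming the usual Hadoop-style <configuration> wrapper element. The scratch directory and the host values are placeholders; on a real host the file lives in /etc/hbase-solr/conf and the values come from your own hbase-site.xml.

```shell
# Write a sample hbase-indexer-site.xml to a scratch directory (not /etc).
conf_dir="$(mktemp -d)"
cat > "$conf_dir/hbase-indexer-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- ZooKeeper ensemble of the HBase cluster to be indexed (placeholder) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hbase-cluster-zookeeper</value>
  </property>
  <!-- Ensemble the indexer services use to coordinate (placeholder) -->
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>hbase-cluster-zookeeper:2181</value>
  </property>
</configuration>
EOF
echo "wrote $conf_dir/hbase-indexer-site.xml"
```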
Configuring Lily HBase Indexer Security
Beginning with CDH 5.4, the Lily HBase Indexer includes an HTTP interface for the list-indexers, create-indexer, update-indexer, and delete-indexer commands. This interface can be configured to use Kerberos and to integrate with Sentry.
Configuring Lily HBase Indexer to Use Security
To configure the Lily HBase Indexer to use security, you must create principals and keytabs and then modify default configurations.
To create principals and keytabs
Repeat this process on all Lily HBase Indexer hosts.
- Create a Lily HBase Indexer service user principal using the syntax: hbase/<fully.qualified.domain.name>@<YOUR-REALM>. This principal is used to authenticate with the Hadoop cluster. In this syntax:
  - fully.qualified.domain.name is the host where the Lily HBase Indexer is running.
  - YOUR-REALM is the name of your Kerberos realm.
$ kadmin
kadmin: addprinc -randkey hbase/fully.qualified.domain.name@YOUR-REALM.COM
- Create an HTTP service user principal using the syntax: HTTP/<fully.qualified.domain.name>@<YOUR-REALM>. This principal is used to authenticate user requests coming to the Lily HBase Indexer web services. In this syntax:
  - fully.qualified.domain.name is the host where the Lily HBase Indexer is running.
  - YOUR-REALM is the name of your Kerberos realm.
kadmin: addprinc -randkey HTTP/fully.qualified.domain.name@YOUR-REALM.COM
- Create keytab files with both principals.
kadmin: xst -norandkey -k hbase.keytab hbase/fully.qualified.domain.name \
  HTTP/fully.qualified.domain.name
- Test that credentials in the merged keytab file work. For example:
$ klist -e -k -t hbase.keytab
- Copy the hbase.keytab file to the Lily HBase Indexer configuration directory. The owner of the hbase.keytab file should be the hbase user and the file should have owner-only read permissions.
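The permission requirement in the last step can be verified with a short script. This is a minimal sketch, assuming a POSIX shell; the function name keytab_is_owner_read_only and the scratch file standing in for hbase.keytab are illustrative and not part of the Lily HBase Indexer. It checks only the file mode, not the owner.

```shell
# keytab_is_owner_read_only FILE: succeeds only if FILE is a regular file
# whose mode string is -r-------- (read permission for the owner, nothing else).
keytab_is_owner_read_only() {
  [ -f "$1" ] || return 1
  mode="$(ls -ld "$1" | cut -c1-10)"
  [ "$mode" = "-r--------" ]
}

# Demonstrate on a scratch file standing in for hbase.keytab.
demo_keytab="$(mktemp)"
chmod 400 "$demo_keytab"
if keytab_is_owner_read_only "$demo_keytab"; then
  echo "permissions OK"
fi
```

On a real host, the equivalent setup would typically be chown hbase followed by chmod 400 on the copied keytab, run with sufficient privileges.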
To modify default configurations
Repeat this process on all Lily HBase Indexer hosts.
- Modify the hbase-indexer-site.xml file as follows:
<property>
  <name>hbaseindexer.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbaseindexer.authentication.kerberos.keytab</name>
  <value>hbase.keytab</value>
</property>
<property>
  <name>hbaseindexer.authentication.kerberos.principal</name>
  <value>HTTP/localhost@LOCALHOST</value>
</property>
<property>
  <name>hbaseindexer.authentication.kerberos.name.rules</name>
  <value>DEFAULT</value>
</property>
- Set up the Java Authentication and Authorization Service (JAAS). Create a jaas.conf file in the HBase-Indexer configuration directory containing the following settings. Make sure that you substitute a value for principal that matches your particular environment.
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="/etc/hbase/conf/hbase.keytab"
  principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";
};
Then, modify hbase-indexer-env.sh in the hbase-indexer configuration directory to add the JAAS configuration to the system properties. You can do this by adding -Djava.security.auth.login.config to HBASE_INDEXER_OPTS. For example, you might add the following (note that a shell assignment takes no spaces around the equals sign):
HBASE_INDEXER_OPTS="$HBASE_INDEXER_OPTS -Djava.security.auth.login.config=/path/to/your/jaas.conf"
Configuring Clients to Use the HTTP Interface
By default, the client does not use the new HTTP interface. Use the HTTP interface only if you want to take advantage of one of the features it provides, such as Kerberos authentication and Sentry integration. The client now supports passing two additional parameters to the list-indexers, create-indexer, delete-indexer, and update-indexer commands:
- --http: An HTTP URI to the hbase-indexer HTTP API. By default, this URI is of the form http://host:11060/indexer/. If this URI is passed, the Lily HBase Indexer uses the HTTP API. If this URI is not passed, the indexer uses the old behavior of communicating directly with ZooKeeper.
- --jaas: The path to a JAAS configuration file. This is only necessary for Kerberos-enabled deployments.
For example:
hbase-indexer --http http://host:port/indexer/ --jaas jaas.conf list-indexers
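Because both parameters are optional, a small wrapper can add them only when they are configured. The following is a minimal sketch that prints the command line rather than running it; the function build_indexer_cmd and the variables INDEXER_HTTP_URI and INDEXER_JAAS_CONF are hypothetical names chosen for illustration, and the option order follows the example above.

```shell
# build_indexer_cmd SUBCOMMAND: prints an hbase-indexer command line,
# adding --http and --jaas only when the (hypothetical) variables
# INDEXER_HTTP_URI and INDEXER_JAAS_CONF are set and non-empty.
build_indexer_cmd() {
  cmd="hbase-indexer"
  if [ -n "${INDEXER_HTTP_URI:-}" ]; then
    cmd="$cmd --http $INDEXER_HTTP_URI"
  fi
  if [ -n "${INDEXER_JAAS_CONF:-}" ]; then
    cmd="$cmd --jaas $INDEXER_JAAS_CONF"
  fi
  printf '%s %s\n' "$cmd" "$1"
}

# Placeholder values; 11060 is the default port mentioned above.
INDEXER_HTTP_URI="http://host:11060/indexer/"
INDEXER_JAAS_CONF="jaas.conf"
build_indexer_cmd list-indexers
```

With both variables unset, the function prints the plain command, which corresponds to the default behavior of communicating directly with ZooKeeper.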
Sentry Integration
The Lily HBase Indexer uses a file-based access control model similar to that provided by Solr-Sentry integration, which is described in Enabling Sentry Authorization for Search using the Command Line. The model supports specifying READ and WRITE privileges for each indexer. The privileges work as follows:
- If a role has the WRITE privilege for indexer1, a call to create, update, or delete indexer1 succeeds.
- If a role has the READ privilege for indexer1, a call to list-indexers lists indexer1, if it exists. If an indexer called indexer2 exists but the role does not have the READ privilege for it, information about indexer2 is filtered out of the response.
To enable Sentry integration, add the following properties to hbase-indexer-site.xml. The sentry-site.xml value can be a full or relative path:
<property>
  <name>sentry.hbaseindexer.sentry.site</name>
  <value>sentry-site.xml</value>
</property>
<property>
  <name>hbaseindexer.rest.resource.package</name>
  <value>org/apache/sentry/binding/hbaseindexer/rest</value>
</property>
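The READ and WRITE rules described above can be illustrated with a toy model. This sketch is not the Sentry policy file format or the Sentry implementation; the grants table, the role names, and the functions has_privilege and visible_indexers are all hypothetical, and the model assumes that WRITE does not imply READ, which the text does not specify.

```shell
# Hypothetical grants, one "role indexer privilege" triple per line.
GRANTS='analyst indexer1 READ
admin indexer1 WRITE
admin indexer2 READ'

# has_privilege ROLE INDEXER PRIVILEGE: succeeds if that exact triple is granted.
has_privilege() {
  printf '%s\n' "$GRANTS" | grep -qxF "$1 $2 $3"
}

# visible_indexers ROLE: reads indexer names on stdin and prints only those
# the role may READ, mimicking how list-indexers filters its response.
visible_indexers() {
  while IFS= read -r name; do
    if has_privilege "$1" "$name" READ; then
      printf '%s\n' "$name"
    fi
  done
}

# The analyst role sees indexer1 but not indexer2.
printf 'indexer1\nindexer2\n' | visible_indexers analyst
```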
Starting a Lily HBase NRT Indexer Service
You can use the Cloudera Manager GUI to start the Lily HBase NRT Indexer Service on a set of machines. In non-managed deployments, you can start a Lily HBase Indexer daemon manually on the local host with the following command:
sudo service hbase-solr-indexer restart
After starting the Lily HBase NRT Indexer Services, verify that all daemons are running using the jps tool from the Oracle JDK, which you can obtain from the Java SE Downloads page. If you are running a pseudo-distributed HDFS installation and a Lily HBase NRT Indexer Service installation on one machine, jps shows the following output:
$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
26393 com.ngdata.hbaseindexer.Main
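The same check can be scripted. This is a minimal sketch, assuming a POSIX shell with awk; the function name indexer_running is illustrative, and the sample text is the jps output shown above. On a real host, pipe the output of sudo jps -lm into the function instead.

```shell
# indexer_running: reads `jps -lm` output on stdin and succeeds if the
# second field of any line is the indexer main class.
indexer_running() {
  awk '$2 == "com.ngdata.hbaseindexer.Main" { found = 1 } END { exit !found }'
}

# Sample output copied from the example above.
sample='31407 sun.tools.jps.Jps -lm
26393 com.ngdata.hbaseindexer.Main'

if printf '%s\n' "$sample" | indexer_running; then
  echo "indexer daemon is running"
else
  echo "indexer daemon is NOT running"
fi
```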