Configuring the Lily HBase NRT Indexer Service for Use with Cloudera Search

The Lily HBase NRT Indexer Service is a flexible, scalable, fault-tolerant, transactional, near real-time (NRT) system for processing a continuous stream of HBase cell updates into live search indexes. Data ingested into HBase typically appears in search results within seconds; this latency is tunable. The Lily HBase Indexer uses SolrCloud to index data stored in HBase. As HBase applies inserts, updates, and deletes to HBase table cells, the indexer keeps Solr consistent with the HBase table contents, using standard HBase replication. The indexer supports flexible, custom, application-specific rules to extract, transform, and load HBase data into Solr. Solr search results can contain columnFamily:qualifier links back to the data stored in HBase, so applications can use the search result set to directly access the matching raw HBase cells. Indexing and searching do not affect the operational stability or write throughput of HBase because the indexing and searching processes are separate from, and asynchronous to, HBase.

The Lily HBase NRT Indexer Service must be deployed in an environment with a running HBase cluster, a running SolrCloud cluster, and at least one ZooKeeper cluster. This can be done with or without Cloudera Manager. See Managing Services for more information on adding services such as the Lily HBase Indexer Service.

Enabling Cluster-wide HBase Replication

The Lily HBase Indexer is implemented using HBase replication, presenting indexers as RegionServers of the worker cluster. This requires that HBase replication be enabled on the HBase cluster as well as on the individual tables to be indexed. An example of the settings required for configuring cluster-wide HBase replication is shown in /usr/share/doc/hbase-solr-doc*/demo/hbase-site.xml. You must add these settings, except for the replication.replicationsource.implementation property, to all of the hbase-site.xml configuration files on the HBase cluster. You can use the Cloudera Manager HBase Indexer Service GUI to do this. After making these updates, restart your HBase cluster.
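
As a rough sketch, the cluster-wide replication settings in the demo file typically look like the fragment below. The property names and values here are recalled from the upstream hbase-indexer demo configuration and are an assumption; treat the demo hbase-site.xml shipped on your system as authoritative. The replication.replicationsource.implementation property is deliberately left out, as described above.

```xml
<!-- Sketch only: values assumed from the upstream demo configuration.
     Compare against your local demo/hbase-site.xml before applying. -->
<property>
  <!-- The indexer consumes HBase replication events, so replication must be on -->
  <name>hbase.replication</name>
  <value>true</value>
</property>
<property>
  <!-- Use every available replication sink (here, every indexer) -->
  <name>replication.source.ratio</name>
  <value>1.0</value>
</property>
<property>
  <!-- Maximum number of log entries shipped in one batch -->
  <name>replication.source.nb.capacity</name>
  <value>1000</value>
</property>
```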

Pointing a Lily HBase NRT Indexer Service at an HBase Cluster that Needs to Be Indexed

Before starting Lily HBase NRT Indexer services, you must configure individual services with the location of a ZooKeeper ensemble that is used for the target HBase cluster. Add the following property to /etc/hbase-solr/conf/hbase-indexer-site.xml. Remember to replace hbase-cluster-zookeeper with the actual ensemble string found in the hbase-site.xml configuration file:

<property>
   <name>hbase.zookeeper.quorum</name>
   <value>hbase-cluster-zookeeper</value>
</property> 

Configure all Lily HBase NRT Indexer Services to use a particular ZooKeeper ensemble to coordinate with one another. Add the following property to /etc/hbase-solr/conf/hbase-indexer-site.xml, and replace hbase-cluster-zookeeper:2181 with the actual ensemble string:

<property>
   <name>hbaseindexer.zookeeper.connectstring</name>
   <value>hbase-cluster-zookeeper:2181</value>
</property> 
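
For a multi-host ensemble, the two properties might be filled in as in the sketch below. The host names zk1.example.com through zk3.example.com and the port 2181 are hypothetical placeholders. The sketch also assumes that the indexers coordinate through the same ensemble that HBase uses, which is common but not required.

```xml
<!-- Sketch only: host names and port are hypothetical placeholders -->
<property>
  <!-- Comma-separated ZooKeeper hosts, copied from the target cluster's hbase-site.xml -->
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
<property>
  <!-- Comma-separated host:port pairs used by the indexers to coordinate -->
  <name>hbaseindexer.zookeeper.connectstring</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

Note that the quorum value lists hosts only, while the connect string lists host:port pairs.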

Configuring Lily HBase Indexer Security

Beginning with CDH 5.4, the Lily HBase Indexer includes an HTTP interface for the list-indexers, create-indexer, update-indexer, and delete-indexer commands. This interface can be configured to use Kerberos and to integrate with Sentry.

Configuring Lily HBase Indexer to Use Security

To configure the Lily HBase Indexer to use security, you must create principals and keytabs and then modify default configurations.

To create principals and keytabs

Repeat this process on all Lily HBase Indexer hosts.

  1. Create a Lily HBase Indexer service user principal using the syntax hbase/<fully.qualified.domain.name>@<YOUR-REALM>, where fully.qualified.domain.name is the host where the Lily HBase Indexer is running and YOUR-REALM is the name of your Kerberos realm. This principal is used to authenticate with the Hadoop cluster.
    $ kadmin
    kadmin: addprinc -randkey hbase/fully.qualified.domain.name@YOUR-REALM.COM
  2. Create an HTTP service user principal using the syntax HTTP/<fully.qualified.domain.name>@<YOUR-REALM>, where fully.qualified.domain.name is the host where the Lily HBase Indexer is running and YOUR-REALM is the name of your Kerberos realm. This principal is used to authenticate user requests coming to the Lily HBase Indexer web services.
    kadmin: addprinc -randkey HTTP/fully.qualified.domain.name@YOUR-REALM.COM
  3. Create a keytab file that contains both principals.
    kadmin: xst -norandkey -k hbase.keytab hbase/fully.qualified.domain.name \
    HTTP/fully.qualified.domain.name
  4. Test that credentials in the merged keytab file work. For example:
    $ klist -e -k -t hbase.keytab
  5. Copy the hbase.keytab file to the Lily HBase Indexer configuration directory. The owner of the hbase.keytab file should be the hbase user and the file should have owner-only read permissions.

To modify default configurations

Repeat this process on all Lily HBase Indexer hosts.

  1. Modify the hbase-indexer-site.xml file as follows:
      <property>
        <name>hbaseindexer.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hbaseindexer.authentication.kerberos.keytab</name>
        <value>hbase.keytab</value>
      </property>
      <property>
        <name>hbaseindexer.authentication.kerberos.principal</name>
        <value>HTTP/localhost@LOCALHOST</value>
      </property>
      <property>
        <name>hbaseindexer.authentication.kerberos.name.rules</name>
        <value>DEFAULT</value>
      </property>
  2. Set up the Java Authentication and Authorization Service (JAAS). Create a jaas.conf file in the HBase-Indexer configuration directory containing the following settings. Make sure that you substitute a value for principal that matches your particular environment.
    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      useTicketCache=false
      keyTab="/etc/hbase/conf/hbase.keytab"
      principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";
    };
    Then, modify hbase-indexer-env.sh in the hbase-indexer configuration directory to add the jaas configuration to the system properties. You can do this by adding -Djava.security.auth.login.config to the HBASE_INDEXER_OPTS. For example, you might add the following:
    HBASE_INDEXER_OPTS="$HBASE_INDEXER_OPTS -Djava.security.auth.login.config=/path/to/your/jaas.conf"
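
The keytab and principal values shown in step 1 (hbase.keytab and HTTP/localhost@LOCALHOST) are placeholders. As a sketch, the same two properties with host-specific values might look like the fragment below. It assumes the keytab was copied to /etc/hbase-solr/conf, and it reuses the placeholder host and realm names from the principal-creation steps; substitute the values for your environment.

```xml
<!-- Sketch only: path, host name, and realm are assumed placeholders -->
<property>
  <name>hbaseindexer.authentication.kerberos.keytab</name>
  <!-- Assumed location: the indexer configuration directory -->
  <value>/etc/hbase-solr/conf/hbase.keytab</value>
</property>
<property>
  <name>hbaseindexer.authentication.kerberos.principal</name>
  <!-- The HTTP principal created for this host -->
  <value>HTTP/fully.qualified.domain.name@YOUR-REALM.COM</value>
</property>
```

Whichever location you choose, the keyTab entry in jaas.conf presumably needs to point to the same file.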

Sentry integration

The Lily HBase Indexer uses a file-based access control model similar to that provided by Solr-Sentry integration, which is described in Enabling Sentry Authorization for Search using the Command Line. For details on configuring the HTTP API, which Sentry requires, see Configuring Clients to Use the HTTP Interface. The Lily HBase Indexer's file-based access control model supports specifying READ and WRITE privileges for each indexer. The privileges work as follows:

  • If a role has the WRITE privilege for indexer1, a call to create, update, or delete indexer1 succeeds.
  • If a role has the READ privilege for indexer1, a call to list-indexers lists indexer1, if it exists. If an indexer called indexer2 exists, but the role does not have the READ privilege for it, information about indexer2 is filtered out of the response.

To configure Sentry for the Lily HBase Indexer, add the following properties to hbase-indexer-site.xml:
  <property>
    <name>sentry.hbaseindexer.sentry.site</name>
    <value>sentry-site.xml</value> <!-- full or relative path -->
  </property>
  <property>
    <name>hbaseindexer.rest.resource.package</name>
    <value>org/apache/sentry/binding/hbaseindexer/rest</value>
  </property>

Starting a Lily HBase NRT Indexer Service

You can use the Cloudera Manager GUI to start the Lily HBase NRT Indexer Service on a set of machines. In non-managed deployments, you can start (or restart) a Lily HBase Indexer daemon manually on the local host with the following command:

sudo service hbase-solr-indexer restart

After starting the Lily HBase NRT Indexer Services, verify that all daemons are running using the jps tool from the Oracle JDK, which you can obtain from the Java SE Downloads page. If you are running a pseudo-distributed HDFS installation and a Lily HBase NRT Indexer Service installation on one machine, jps shows output similar to the following:

$ sudo jps -lm
31407 sun.tools.jps.Jps -lm
26393 com.ngdata.hbaseindexer.Main