This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Search Security Configuration

This section describes how to configure Search in CDH 5 to enable Kerberos security and Sentry.

Configuring Search to Use Kerberos

Cloudera Search supports Kerberos authentication. All necessary packages are installed when you install Search. To enable Kerberos, create principals and keytabs and then modify default configurations.

The following instructions only apply to configuring Kerberos in an unmanaged environment. Kerberos configuration is automatically handled by Cloudera Manager if you are using Search in a Cloudera managed environment.

To create principals and keytabs

Repeat this process on all Solr server nodes.

  1. Create a Solr service user principal using the syntax: solr/<>@<YOUR-REALM>. This principal is used to authenticate with the Hadoop cluster. where: is the host where the Solr server is running YOUR-REALM is the name of your Kerberos realm.
    $ kadmin
    kadmin: addprinc -randkey solr/
  2. Create a HTTP service user principal using the syntax: HTTP/<>@<YOUR-REALM>. This principal is used to authenticate user requests coming to the Solr web-services. where: is the host where the Solr server is running YOUR-REALM is the name of your Kerberos realm.
    kadmin: addprinc -randkey HTTP/

    The HTTP/ component of the HTTP service user principal must be upper case as shown in the syntax and example above.

  3. Create keytab files with both principals.
    kadmin: xst -norandkey -k solr.keytab solr/ \
  4. Test that credentials in the merged keytab file work. For example:
    $ klist -e -k -t solr.keytab
  5. Copy the solr.keytab file to the Solr configuration directory. The owner of the solr.keytab file should be the solr user and the file should have owner-only read permissions.

To modify default configurations

Repeat this process on all Solr server nodes.

  1. Ensure that the following properties appear in /etc/default/solr and that they are uncommented. Modify these properties to match your environment. The relevant properties to be uncommented and modified are:
      Note: Modify the values for these properties to match your environment. For example, the SOLR_AUTHENTICATION_KERBEROS_PRINCIPAL=HTTP/localhost@LOCALHOST must include the principal instance and Kerberos realm for your environment. That is often different from localhost@LOCALHOST.
  2. If using applications that use the solrj library, set up the Java Authentication and Authorization Service (JAAS).
    1. Create a jaas.conf file in the Solr configuration directory containing the following settings. This file and its location must match the SOLR_AUTHENTICATION_JAAS_CONF value. Make sure that you substitute a value for principal that matches your particular environment.
      Client { required
  3. To use short principal names:
    • Appendix C - Configuring the Mapping from Kerberos Principals to Short Names in the CDH 5 Security Guide.

Using Kerberos

The process of enabling Solr clients to authenticate with a secure Solr is specific to the client. This section demonstrates:
  • Using Kerberos and curl
  • Using solrctl
  • Configuring SolrJ Library Usage
    This enables technologies including:
    • Command-line solutions
    • Java applications
    • The MapReduceIndexerTool
  • Configuring Flume Morphline Solr Sink Usage

Secure Solr requires that the CDH components that it interacts with are also secure. Secure Solr interacts with HDFS, ZooKeeper and optionally HBase, MapReduce, and Flume. See the CDH 5 Security Guide or the CDH 4 Security Guide for more information.

Using Kerberos and curl

You can use Kerberos authentication with clients such as curl. To use curl, begin by acquiring valid Kerberos credentials and then execute the desired command. For example, you might use commands similar to the following:

$ kinit -kt username.keytab username
$ curl --negotiate -u: foo:bar http://solrserver:8983/solr/
  Note: Depending on the tool used to connect, additional arguments may be required. For example, with curl, --negotiate and -u are required. The username and password specified with -u is not actually checked because Kerberos is used. As a result, any value such as foo:bar or even just : is acceptable. While any value can be provided for -u, note that the option is required. Omitting -u results in a 401 Unauthorized error, even though the -u value is not actually used.

Using solrctl

If you are using solrctl to manage your deployment in an environment that requires Kerberos authentication, you must have valid Kerberos credentials, which you can get using kinit. For more information on solrctl, see Solrctl Reference

Configuring SolrJ Library Usage

If using applications that use the solrj library, begin by establishing a Java Authentication and Authorization Service (JAAS) configuration file.

Create a JAAS file:

  • If you have already used kinit to get credentials, you can have the client use those credentials. In such a case, modify your jaas-client.conf file to appear as follows:
    Client { required
    where user/<YOUR-REALM> is replaced with your credentials.
  • You want the client application to authenticate using a keytab you specify:
    Client { required
    where /path/to/keytab/user.keytab is the keytab file you wish to use and user/<YOUR-REALM> is the principal in that keytab you wish to use.

Use the JAAS file to enable solutions:

  • Command line solutions
    Set the property when invoking the program. For example, if you were using a a jar, you might use:
    java -jar app.jar
  • Java applications
    Set the Java system property For example, if the JAAS configuration file is located on the filesystem as /home/user/jaas-client.conf. The Java system property must be set to point to this file. Setting a Java system property can be done programmatically, for example using a call such as:
    System.setProperty("", "/home/user/jaas-client.conf");
  • The MapReduceIndexerTool
    The MapReduceIndexerTool uses SolrJ to pass the JAAS configuration file. Using the MapReduceIndexerTool in a secure environment requires the use of the HADOOP_OPTS variable to specify the JAAS configuration file. For example, you might issue a command such as the following:
    HADOOP_OPTS="" \
    hadoop jar MapReduceIndexerTool
  • Configuring the hbase-indexer CLI

    Certain hbase-indexer CLI commands such as replication-status attempt to read ZooKeeper nodes owned by HBase. To successfully use these commands in Search for CDH 5 in a secure environment, specify a JAAS configuration file with the HBase principal in the HBASE_INDEXER_OPTS environment variable. For example, you might issue a command such as the following:

    hbase-indexer replication-status

Configuring Flume Morphline Solr Sink Usage

Repeat this process on all Flume nodes:

  1. If you have not created a keytab file, do so now at /etc/flume-ng/conf/flume.keytab. This file should contain the service principal flume/<>@<YOUR-REALM>. See the CDH 5 Security Guide for more information.
  2. Create a JAAS configuration file for flume at /etc/flume-ng/conf/jaas-client.conf. The file should appear as follows:
    Client { required
  3. Add the flume JAAS configuration to the JAVA_OPTS in /etc/flume-ng/conf/ For example, you might change:

Configuring Sentry for Search

Sentry enables role-based, fine-grained authorization for Cloudera Search. Follow the instructions below to configure Sentry under CDH 4.5 or later or CDH 5. Sentry is included in the Search installation.
  Note: Sentry for Search depends on Kerberos authentication. For additional information on using Kerberos with Search, see The Configuring Search to Use Kerberos and Using Kerberos sections earlier in this page.

Note that this document is for configuring Sentry for Cloudera Search. For information about alternate ways to configure Sentry or for information about installing Sentry for other services, see:

Roles and Collection-Level Privileges

Sentry uses a role-based privilege model. A role is a set of rules for accessing a given Solr collection. Access to each collection is governed by privileges: Query, Update, or All (*).

For example, a rule for the Query privilege on collection logs would be formulated as follows:
A role can contain multiple such rules, separated by commas. For example the engineer_role might contain the Query privilege for hive_logs and hbase_logs collections, and the Update privilege for the current_bugs collection. You would specify this as follows:
engineer_role = collection=hive_logs->action=Query, collection=hbase_logs->action=Query, collection=current_bugs->action=Update

Users and Groups

  • A user is an entity that is permitted by the Kerberos authentication system to access the Search service.
  • A group connects the authentication system with the authorization system. It is a set of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups. For example,
    dev_ops = dev_role, ops_role

Here the group dev_ops is granted the roles dev_role and ops_role. The members of this group can complete searches that are allowed by these roles.

User to Group Mapping

You can configure Sentry to use either Hadoop groups or groups defined in the policy file.

  Important: You can use either Hadoop groups or local groups, but not both at the same time. Use local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups.

To configure Hadoop groups:

Set the sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
  Note: Note that, by default, this uses local shell groups. See the Group Mapping section of the HDFS Permissions Guide for more information.


To configure local groups:

  1. Define local groups in a [users] section of the Sentry Configuration File, sentry-site.xml. For example:
    user1 = group1, group2, group3
    user2 = group2, group3
  2. In sentry-site.xml, set search.sentry.provider as follows:

Setup and Configuration

This release of Sentry stores the configuration as well as privilege policies in files. The sentry-site.xml file contains configuration options such as privilege policy file location. The Policy File contains the privileges and groups. It has a .ini file format and should be stored on HDFS.

Sentry is automatically installed when you install Cloudera Search for CDH or Cloudera Search 1.1.0 or later.

Policy File

The sections that follow contain notes on creating and maintaining the policy file.

  Warning: An invalid configuration disables all authorization while logging an exception.

Storing the Policy File

Considerations for storing the policy file(s) include:

  1. Replication count - Because the file is read for each query, you should increase this; 10 is a reasonable value.
  2. Updating the file - Updates to the file are only reflected when the Solr process is restarted.

Defining Roles

Keep in mind that role definitions are not cumulative; the newer definition replaces the older one. For example, the following results in role1 having privilege2, not privilege1 and privilege2.
role1 = privilege1
role1 = privilege2

Sample Configuration

This section provides a sample configuration.

  Note: Sentry with CDH Search does not support multiple policy files. Other implementations of Sentry such as Sentry for Hive do support different policy files for different databases, but Sentry for CDH Search has no such support for multiple policies.

Policy File

The following is an example of a CDH Search policy file. The sentry-provider.ini would exist in an HDFS location such as hdfs://ha-nn-uri/user/solr/sentry/sentry-provider.ini.
  Note: Use separate policy files for each Sentry-enabled service. Using one file for multiple services results in each service failing on the other services' entries. For example, with a combined Hive and Search file, Search would fail on Hive entries and Hive would fail on Search entries.
# Assigns each Hadoop group to its set of roles
engineer = engineer_role
ops = ops_role
dev_ops = engineer_role, ops_role
hbase_admin = hbase_admin_role

# The following grants all access to source_code.
# "collection = source_code" can also be used as syntactic
# sugar for "collection = source_code->action=*"
engineer_role = collection = source_code->action=*

# The following imply more restricted access.
ops_role = collection = hive_logs->action=Query
dev_ops_role = collection = hbase_logs->action=Query

#give hbase_admin_role the ability to create/delete/modify the hbase_logs collection
hbase_admin_role = collection=admin->action=*, collection=hbase_logs->action=*

Sentry Configuration File

The following is an example of a sentry-site.xml file.



        If the HDFS configuration files (core-site.xml, hdfs-site.xml)
        pointed to by SOLR_HDFS_CONFIG in /etc/default/solr
        point to HDFS, the path will be in HDFS;
        alternatively you could specify a full path,

Enabling Sentry in Cloudera Search for CDH 5

Enabling Sentry is achieved by adding two properties to /etc/default/solr. If your Search installation is managed by Cloudera Manager, then these properties are added automatically. If your Search installation is not managed by Cloudera Manager, you must make these changes yourself. The variable SOLR_AUTHORIZATION_SENTRY_SITE specifies the path to sentry-site.xml. The variable SOLR_AUTHORIZATION_SUPERUSER specifies the first part of SOLR_KERBEROS_PRINCIPAL. This is solr for the majority of users, as solr is the default. Settings are of the form:

To enable sentry collection-level authorization checking on a new collection, the instancedir for the collection must use a modified version of solrconfig.xml with Sentry integration. The command solrctl instancedir --generate generates two versions of solrconfig.xml: the standard solrconfig.xml without sentry integration, and the sentry-integrated version called To use the sentry-integrated version, replace solrconfig.xml with before creating the instancedir.

If you have an existing collection using the standard solrconfig.xml called "foo" and an instancedir of the same name, perform the following steps:

# generate a fresh instancedir
solrctl instancedir --generate foosecure
# download the existing instancedir from ZK into subdirectory "foo"
solrctl instancedir --get foo foo
# replace the existing solrconfig.xml with the sentry-enabled one
cp foosecure/conf/ foo/conf/solrconfig.xml
# update the instancedir in ZK
solrctl instancedir --update foo foo
# reload the collection
solrctl collection --reload foo

If you have an existing collection using a version of solrconfig.xml that you have modified, contact Support for assistance.

Providing Document-Level Security Using Sentry

For role-based access control of a collection, an administrator modifies a Sentry role so it has query, update, or administrative access, as described above.

Collection-level authorization is useful when the access control requirements for the documents in the collection are the same, but users may want to restrict access to a subset of documents in a collection. This finer-grained restriction could be achieved by defining separate collections for each subset, but this is difficult to manage, requires duplicate documents for each collection, and requires that these documents be kept in sync.

Document-level access control solves this issue by associating authorization tokens with each document in the collection. This enables granting Sentry roles access to sets of documents in a collection.

Document-Level Security Model

Document-level security depends on a chain of relationships between users, groups, roles, and documents.

  • Users are assigned to groups.
  • Groups are assigned to roles.
  • Roles are stored as "authorization tokens" in a specified field in the documents.

Document-level security supports restricting which documents can be viewed by which users. Access is provided by adding roles as "authorization tokens" to a specified document field. Conversely, access is implicitly denied by omitting roles from the specified field. In other words, in a document-level security enabled environment, a user might submit a query that matches a document; if the user is not part of a group that has a role has been granted access to the document, the result is not returned.

For example, Alice might belong to the administrators group. The administrators group may belong to the doc-mgmt role. A document could be ingested and the doc-mgmt role could be added at ingest time. In such a case, if Alice submitted a query that matched the document, Search would return the document, since Alice is then allowed to see any document with the "doc-mgmt" authorization token.

Similarly, Bob might belong to the guests group. The guests group may belong to the public-browser role. If Bob tried the same query as Alice, but the document did not have the public-browser role, Search would not return the result because Bob does not belong to a group that is associated with a role that has access.

Note that collection-level authorization rules still apply, if enabled. Even if Alice is able to view a document given document-level authorization rules, if she is not allowed to query the collection, the query will fail.

Roles are typically added to documents when those documents are ingested, either via the standard Solr APIs or, if using morphlines, the setValues morphline command.

Enabling Document-Level Security

Cloudera Search supports document-level security in Search for CDH 5.1 and later. Document-level security is disabled by default, so the first step in using document-level security is to enable the feature by modifying the file. Remember to replace the solrconfig.xml with this file, as described in Enabling Sentry for Cloudera Search in CDH 5 above.

To enable document-level security, change The default file contents are as follows:

<searchComponent name="queryDocAuthorization">
    <!-- Set to true to enabled document-level authorization -->

    <bool name="enabled">false</bool>

    <!-- Field where the auth tokens are stored in the document -->
    <str name="sentryAuthField">sentry_auth</str>

    <!-- Auth token defined to allow any role to access the document.
         Uncomment to enable. -->

    <!--<str name="allRolesToken">*</str>-->

  • The enabled Boolean determines whether document-level authorization is enabled. To enable document level security, change this setting to true.
  • The sentryAuthField string specifies the name of the field that is used for storing authorization information. You can use the default setting of sentry_auth or you can specify some other string that you will use for assigning values on ingest.
      Note: This field must exist as an explicit or dynamic field in the schema. sentry_auth exists in the default schema.xml.
  • The allRolesToken string represents a special token defined to allow any role access to the document. By default, this feature is disabled. To enable this feature, uncomment the specification and specify the token. This token should be different from the name of any sentry role to avoid collision. By default it is "*". This feature is useful when first configuring document level security or it can be useful in granting all roles access to a document when the set of roles may change. See the following Best Practices section for additional information.

Best Practices

Using the allGroupsToken

You may want to grant every user that belongs to a role access to certain documents. One way to accomplish this is to specify all known roles in the document, but this requires updating or reindexing the document if you add a new role. Alternatively, an "allUser" role, specified in the Sentry .ini file, could contain all valid groups, but this role would need to be updated every time a new group was added to the system. Instead, specifying the allGroupsToken allows any user that belongs to a valid role to access the document. This access requires no updating as the system evolves.

In addition, the allGroupsToken may be useful for transitioning a deployment to use document-level security. Instead of having to define all the roles upfront, all the documents can be specified with the allGroupsToken and later modified as the roles are defined.

Consequences of Document-Level Authorization Only Affecting Queries

Document-level security does not prevent users from modifying documents or performing other update operations on the collection. Update operations are only governed by collection-level authorization.

Document-level security can be used to prevent documents being returned in query results. If users are not granted access to a document, those documents are not returned even if that user submits a query that matches those documents. This does not have affect attempted updates.

Consequently, it is possible for a user to not have access to a set of documents based on document-level security, but to still be able to modify the documents via their collection-level authorization update rights. This means that a user can delete all documents in the collection. Similarly, a user might modify all documents, adding their authorization token to each one. After such a modification, the user could access any document via querying. Therefore, if you are restricting access using document-level security, consider granting collection-level update rights only to those users you trust and assume they will be able to access every document in the collection.

Limitations on Query Size

By default queries support up to 1024 Boolean clauses. As a result, queries containing more that 1024 clauses may cause errors. Because authorization information is added by Sentry as part of a query, using document-level security can increase the number of clauses. In the case where users belong to many roles, even simple queries can become quite large. If a query is too large, an error of the following form occurs:$TooManyClauses: maxClauseCount is set to 1024
To change the supported number of clauses, edit the maxBooleanClauses setting in solrconfig.xml. For example, to allow 2048 clauses, you would edit the setting so it appears as follows:

For maxBooleanClauses to be applied as expected, make any change to this value to all collections and then restart the service. You must make this change to all collections because this option modifies a global Lucene property, affecting all SolrCores. If different solrconfig.xml files have different values for this property, the effective value is determined per node, based on the first SolrCore to be initialized.

Enabling Secure Impersonation

Secure Impersonation is a feature that allows a user to make requests as another user in a secure way. For example, to allow the following impersonations:

  • User "hue" can make requests as any user from any host.
  • User "foo" can make requests as any member of group "bar", from "host1" or "host2".
    Configure the following properties in /etc/default/solr:
SOLR_SECURITY_ALLOWED_PROXYUSERS lists all of the users allowed to impersonate. For a user x in SOLR_SECURITY_ALLOWED_PROXYUSERS, SOLR_SECURITY_PROXYUSER_x_HOSTS list the hosts x is allowed to connect from in order to impersonate, and SOLR_SECURITY_PROXYUSERS_x_GROUPS lists the groups that the users is allowed to impersonate members of. Both GROUPS and HOSTS support the wildcard * and both GROUPS and HOSTS must be defined for a specific user.
  Note: Cloudera Manager has its own management of secure impersonation for Hue. To add additional users for Secure Impersonation, use the environment variable safety value for Solr to set the environment variables as above. Be sure to include "hue" in SOLR_SECURITY_ALLOWED_PROXYUSERS if you want to use secure impersonation for hue.

Debugging Failed Sentry Authorization Requests

Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
  • In Cloudera Manager, add to the logging settings for your service through the corresponding Logging Safety Valve field for the Impala, Hive Server 2, or Solr Server services.
  • On systems not managed by Cloudera Manager, add to the file on each host in the cluster, in the appropriate configuration directory for each service.
Specifically, look for exceptions and messages such as:
FilePermission server..., RequestPermission server...., result [true|false]
which indicate each evaluation Sentry makes. The FilePermission is from the policy file, while RequestPermission is the privilege required for the query. A RequestPermission will iterate over all appropriate FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false indicating "Access Denied".

Appendix: Authorization Privilege Model for Search

The tables below refer to the request handlers defined in the generated If you are not using this configuration file, the below may not apply.

admin is a special collection in sentry used to represent administrative actions. A non-administrative request may only require privileges on the collection on which the request is being performed. This is called collection1 in this appendix. An administrative request may require privileges on both the admin collection and collection1. This is denoted as admin, collection1 in the tables below.

Table 1. Privilege table for non-administrative request handlers
Request Handler Required Privilege Collections that Require Privilege
select QUERY collection1
query QUERY collection1
get QUERY collection1
browse QUERY collection1
tvrh QUERY collection1
clustering QUERY collection1
terms QUERY collection1
elevate QUERY collection1
analysis/field QUERY collection1
analysis/document QUERY collection1
update UPDATE collection1
update/json UPDATE collection1
update/csv UPDATE collection1
Table 2. Privilege table for collections admin actions
Collection Action Required Privilege Collections that Require Privilege
create UPDATE admin, collection1
delete UPDATE admin, collection1
reload UPDATE admin, collection1
createAlias UPDATE admin, collection1
  Note: "collection1" here refers to the name of the alias, not the underlying collection(s). For example, http://YOUR-HOST:8983/ solr/admin/collections?action= CREATEALIAS&name=collection1 &collections=underlyingCollection
deleteAlias UPDATE admin, collection1
  Note: "collection1" here refers to the name of the alias, not the underlying collection(s). For example, http://YOUR-HOST:8983/ solr/admin/collections?action= DELETEALIAS&name=collection1
syncShard UPDATE admin, collection1
splitShard UPDATE admin, collection1
deleteShard UPDATE admin, collection1
Table 3. Privilege table for core admin actions
Collection Action Required Privilege Collections that Require Privilege
create UPDATE admin, collection1
rename UPDATE admin, collection1
load UPDATE admin, collection1
unload UPDATE admin, collection1
status UPDATE admin, collection1
persist UPDATE admin
reload UPDATE admin, collection1
swap UPDATE admin, collection1
mergeIndexes UPDATE admin, collection1
split UPDATE admin, collection1
prepRecover UPDATE admin, collection1
requestRecover UPDATE admin, collection1
requestSyncShard UPDATE admin, collection1
requestApplyUpdates UPDATE admin, collection1
Table 4. Privilege table for Info and AdminHandlers
Request Handler Required Privilege Collections that Require Privilege
LukeRequestHandler QUERY admin
SystemInfoHandler QUERY admin
SolrInfoMBeanHandler QUERY admin
PluginInfoHandler QUERY admin
ThreadDumpHandler QUERY admin
PropertiesRequestHandler QUERY admin
LogginHandler QUERY, UPDATE (or *) admin
ShowFileRequestHandler QUERY admin
Page generated September 3, 2015.