Configuring Sentry Authorization for Cloudera Search

Sentry enables role-based, fine-grained authorization for Cloudera Search. Sentry can apply a range of restrictions to various actions, such as accessing data, managing configurations through config objects, or creating collections. Restrictions are consistently applied, regardless of how users attempt to complete actions. For example, restricting access to data in a collection restricts that access whether queries come from the command line, a browser, Hue, or through the admin console. For additional information on Sentry, see Authorization With Apache Sentry.

  • This configuration process can be completed using either Cloudera Manager or the command-line instructions.
  • This information applies specifically to CDH 5.16.x. If you use an earlier version of CDH, see the documentation for that version located at Cloudera Documentation.

Setting Sentry Admins for Solr

If you are using the Sentry service (instead of a Sentry policy file), policies for Solr can be managed using the solrctl sentry command. To use this functionality, you must first designate a Sentry admin.

In Cloudera Manager:

  1. Navigate to the Sentry service configuration page (Sentry service > Configuration).
  2. In the Admin Groups field, add the name of a group to which you want to grant Sentry admin rights.
  3. In the Allowed Connecting Users field, add the users to which you want to grant Sentry admin rights. To connect to Sentry administratively, a user must be specified in Allowed Connecting Users, and also be a member of a group specified in Admin Groups.
  4. Click Save Changes.
  5. Click the stale services icon to restart the Sentry service and any dependent services.

If you are using the Sentry service without Cloudera Manager:

  1. Edit the sentry-site.xml file as follows:
    1. Add the Sentry admin group to the comma-separated list of groups in the sentry.service.admin.group property.
    2. Add the Sentry admin users to the comma-separated list of users in the sentry.service.allow.connect property.
  2. Restart the Sentry service:
    bin/sentry --command service --conffile /path/to/sentry-site.xml

Using Roles and Privileges with Sentry

Sentry uses a role-based privilege model. A role is assigned a set of rules for accessing a given Solr collection or Solr config. Access to each collection is controlled by three privileges: Query, Update, and *. The wildcard (*) privilege indicates all privileges.

The admin collection is a special collection used to represent administrative actions. A non-administrative request may only require privileges on the collection or config on which the request is being performed. An administrative request generally requires privileges on both the admin collection and the collection on which the action is being performed. For more information on the privilege model for Search, including a mapping of actions to privilege requirements, see Authorization Privilege Model for Solr.

In contrast, access to config objects is controlled by a single privilege, *, meaning all privileges.

You can also use the wildcard (*) to specify all config objects or collections when granting privileges. The following example syntax applies to both native Sentry privileges and file-based privileges, though native Sentry privileges are set by using solrctl sentry commands as shown in Using Solr with the Sentry Service, and file-based privileges are set in policy files as shown in Using Solr with a Policy File.

For example:

  • A rule for the Query privilege on the collection named logs is formulated as follows:
    collection=logs->action=Query
  • A rule for the * privilege, meaning all privileges, on the config named myConfig is formulated as follows:
    config=myConfig->action=*

    No action implies *. Because config objects only support the * action, the following config privilege is invalid:

    config=myConfig->action=Update
  • A rule granting all collections the Query privilege is formulated as follows:
    collection=*->action=Query
Config objects cannot be combined with collection objects in a single privilege. For example, the following combinations are invalid:
  • config=myConfig->collection=myCollection->action=*
  • collection=myCollection->config=myConfig
You must specify these privileges separately. For example:
myRole = collection=myCollection->action=QUERY, config=myConfig->action=*
A role can contain multiple such rules, separated by commas. For example, the engineer_role role might contain the Query privilege for the hive_logs and hbase_logs collections, and the Update privilege for the current_bugs collection. This example is formulated as follows:
engineer_role = collection=hive_logs->action=Query, collection=hbase_logs->action=Query, collection=current_bugs->action=Update
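
The rule syntax above can be checked mechanically before it is passed to solrctl or written to a policy file. The following Python sketch is illustrative only and is not part of Sentry or Cloudera Search. The function names are hypothetical, and treating action names as case-insensitive is an assumption based on the examples above using both Query and QUERY.

```python
from typing import Dict, List, Tuple

# Privileges described in this section. Case-insensitive matching is an assumption.
COLLECTION_ACTIONS = {"query", "update", "*"}
CONFIG_ACTIONS = {"*"}


def parse_privilege(rule: str) -> Dict[str, str]:
    """Parse one rule such as 'collection=logs->action=Query'.

    Returns a dict with the keys 'type', 'name', and 'action'.
    Raises ValueError for the invalid forms described in this section.
    """
    parts: Dict[str, str] = {}
    for segment in rule.split("->"):
        key, sep, value = segment.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed segment: {segment!r}")
        if key in parts:
            raise ValueError(f"duplicate key: {key!r}")
        parts[key] = value

    unknown = set(parts) - {"collection", "config", "action"}
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if "collection" in parts and "config" in parts:
        raise ValueError("config and collection cannot be combined in one privilege")
    if "collection" not in parts and "config" not in parts:
        raise ValueError("a privilege must name a collection or a config")

    obj_type = "collection" if "collection" in parts else "config"
    action = parts.get("action", "*")  # no action implies *
    allowed = COLLECTION_ACTIONS if obj_type == "collection" else CONFIG_ACTIONS
    if action.lower() not in allowed:
        raise ValueError(f"action {action!r} is not valid for a {obj_type}")
    return {"type": obj_type, "name": parts[obj_type], "action": action}


def parse_role(line: str) -> Tuple[str, List[Dict[str, str]]]:
    """Parse 'role = rule, rule, ...' into (role name, parsed rules)."""
    role, _, rules = line.partition("=")
    return role.strip(), [parse_privilege(rule) for rule in rules.split(",")]
```

For instance, parse_role applied to the engineer_role line above yields three collection rules, while parse_privilege("config=myConfig->action=Update") raises ValueError.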

Using Users and Groups with Sentry

  • A user is an entity that is permitted by the Kerberos authentication system to access the Search service.
  • A group connects the authentication system with the authorization system. It is a set of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
  • A configured group provider specifies how group membership is determined. Sentry supports HDFS-backed groups and locally configured groups. For example,
    dev_ops = dev_role, ops_role

Here the group dev_ops is granted the roles dev_role and ops_role. The members of this group can perform the actions that are allowed by these roles.

User to Group Mapping

You can configure Sentry to use either Hadoop groups or groups defined in the policy file.

To configure Hadoop groups:

Set the sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.

By default, this uses local shell groups. See the Group Mapping section of the HDFS Permissions Guide for more information.

In this case, Sentry uses the Hadoop configuration described in Configuring LDAP Group Mappings. Cloudera Manager automatically uses this configuration. In a deployment not managed by Cloudera Manager, manually set these configuration parameters in the hadoop-conf file that is passed to Solr.

OR

To configure local groups:

  1. Define local groups in a [users] section of the Sentry Policy file. For example:
    [users]
    user1 = group1, group2, group3
    user2 = group2, group3
  2. In sentry-site.xml, set sentry.provider as follows:
    <property>
      <name>sentry.provider</name>
      <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
    </property>

Enabling Caching for the Sentry Service

Using the Sentry Service with Cloudera Search can introduce latency because authorization requests must be sent to the Sentry Service. To alleviate this latency, enable caching by adding the following property to sentry-site.xml on each Solr Server:

<property>
  <name>sentry.provider.backend.generic.cache.enabled</name>
  <value>true</value>
</property>

By default, this caches Sentry responses for 30 seconds. To modify the cache duration, add the following property to sentry-site.xml on each Solr Server:

<property>
  <name>sentry.provider.backend.generic.cache.ttl.ms</name>
  <value>30000</value>
</property>

The value is set in milliseconds.
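
To illustrate what a time-to-live (TTL) setting of this kind means, the following Python sketch shows a generic TTL cache. It is a conceptual illustration only and does not reflect how Sentry implements its cache. The class name, the loader callback, and the injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TtlCache:
    """Generic time-to-live cache (illustrative, not Sentry's implementation)."""

    def __init__(self, ttl_ms: int = 30000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_ms / 1000.0  # the TTL is configured in milliseconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader only if the entry is missing or expired."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        value = loader(key)  # stands in for a remote authorization request
        self._entries[key] = (now, value)
        return value
```

In a cache of this kind, a larger TTL means fewer remote requests, but a changed answer can go unnoticed for up to the TTL.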

For Cloudera Manager environments, add these properties to the Advanced Configuration Snippet for sentry-site.xml:

  1. Go to Solr service > Configuration > Advanced > Solr Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml.
  2. Click the Add button.
  3. Enter the following values:
    • Name: sentry.provider.backend.generic.cache.enabled
    • Value: true
  4. Click the Add button.
  5. Enter the following values:
    • Name: sentry.provider.backend.generic.cache.ttl.ms
    • Value: 30000
  6. Click Save Changes.
  7. Restart the Solr service (Solr service > Actions > Restart).

Sample Sentry Configuration

This section provides sample configurations.

Using Solr with the Sentry Service

In CDH 5.8 and higher, Cloudera Search supports storing permissions in the Sentry service. To enable this, follow the steps in Enabling the Sentry Service for Solr. If you have already configured Sentry's policy file-based approach, you can migrate existing authorization settings as described in Migrating from Sentry Policy Files to the Sentry Service. solrctl has been extended to support:
  • Migrating existing policy files to the Sentry service
  • Managing permissions in the Sentry service

The following is an example of the commands used to configure Sentry for Solr using the solrctl sentry command. Run these commands on a host with a Solr Gateway role.

The sample commands that follow establish two different roles, each of which has different access requirements. The process of creating roles, adding roles to groups, and granting privileges to roles is a typical workflow used to provide different groups varied degrees of access to resources. For reference information, see solrctl Reference.

Begin by creating roles. The following commands create ops_role and dev_ops_role:
solrctl sentry --create-role ops_role
solrctl sentry --create-role dev_ops_role
Next, assign the roles you created to existing Hadoop groups. The following commands add ops_role to the existing ops_group Hadoop group and add dev_ops_role to the existing dev_ops_group Hadoop group:
solrctl sentry --add-role-group ops_role ops_group
solrctl sentry --add-role-group dev_ops_role dev_ops_group
Finally, grant privileges on collections and configs to the roles. The following commands grant the QUERY privilege on the logs collection to ops_role, and all privileges (meaning QUERY and UPDATE) on all (*) collections to dev_ops_role:
solrctl sentry --grant-privilege ops_role 'collection=logs->action=Query'
solrctl sentry --grant-privilege dev_ops_role 'collection=*->action=*'
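
When many roles are involved, the create, assign, grant workflow above can be generated from a simple mapping. The following Python sketch only builds command strings; it does not run them. It uses only the solrctl sentry options shown above, and the function and parameter names are hypothetical.

```python
import shlex
from typing import Dict, Iterable, List


def solrctl_sentry_commands(role_groups: Dict[str, Iterable[str]],
                            role_privileges: Dict[str, Iterable[str]]) -> List[str]:
    """Build solrctl sentry command lines: create roles, add groups, grant privileges."""
    commands: List[str] = []
    # Step 1: create every role.
    for role in role_groups:
        commands.append(f"solrctl sentry --create-role {shlex.quote(role)}")
    # Step 2: add each role to its Hadoop groups.
    for role, groups in role_groups.items():
        for group in groups:
            commands.append(
                f"solrctl sentry --add-role-group {shlex.quote(role)} {shlex.quote(group)}")
    # Step 3: grant privileges. shlex.quote protects '>' and '*' from the shell.
    for role, privileges in role_privileges.items():
        for privilege in privileges:
            commands.append(
                f"solrctl sentry --grant-privilege {shlex.quote(role)} {shlex.quote(privilege)}")
    return commands


if __name__ == "__main__":
    for line in solrctl_sentry_commands(
            {"ops_role": ["ops_group"], "dev_ops_role": ["dev_ops_group"]},
            {"ops_role": ["collection=logs->action=Query"],
             "dev_ops_role": ["collection=*->action=*"]}):
        print(line)
```

Run as a script, this prints the same six commands shown above, which can be reviewed before being executed on a host with a Solr Gateway role.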

Using Solr with a Policy File

Use separate policy files for each Sentry-enabled service. Using one file for multiple services results in each service failing on the other services' entries. For example, with a combined Hive and Search file, Search would fail on Hive entries and Hive would fail on Search entries.

Sentry with Search does not support multiple policy files. Other implementations of Sentry such as Sentry for Hive do support different policy files for different databases, but Sentry for Search has no such support for multiple policies.

The following is an example of a Search policy file. The location of this file must be readable by Solr.

sentry-provider.ini

[groups]
# Assigns each Hadoop group to its set of roles
engineer = engineer_role
ops = ops_role
dev_ops = engineer_role, ops_role
hbase_admin = hbase_admin_role

[roles]
# The following grants all access to source_code.
# "collection = source_code" can also be used as syntactic
# sugar for "collection = source_code->action=*"
engineer_role = collection = source_code->action=*

# The following imply more restricted access.
ops_role = collection = hive_logs->action=Query
dev_ops_role = collection = hbase_logs->action=Query

#give hbase_admin_role the ability to create/delete/modify the hbase_logs collection
#as well as to update the config for the hbase_logs collection, called hbase_logs_config.
hbase_admin_role = collection=admin->action=*, collection=hbase_logs->action=*, config=hbase_logs_config->action=*
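
To see which privileges a group ends up with, a policy file like the sample above can be read with a short script. The following Python sketch uses the standard configparser module as an approximation of the .ini format; Sentry's own parser may behave differently. The function names are hypothetical and the embedded policy text repeats the sample without its comments.

```python
import configparser
from typing import Dict, List, Set, Tuple

SAMPLE_POLICY = """
[groups]
engineer = engineer_role
ops = ops_role
dev_ops = engineer_role, ops_role
hbase_admin = hbase_admin_role

[roles]
engineer_role = collection = source_code->action=*
ops_role = collection = hive_logs->action=Query
dev_ops_role = collection = hbase_logs->action=Query
hbase_admin_role = collection=admin->action=*, collection=hbase_logs->action=*, config=hbase_logs_config->action=*
"""


def _split(value: str) -> List[str]:
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_policy(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (group -> roles, role -> privilege rules) from policy file text."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep group and role names case-sensitive
    parser.read_string(text)
    groups = {name: _split(value) for name, value in parser["groups"].items()}
    roles = {name: _split(value) for name, value in parser["roles"].items()}
    return groups, roles


def privileges_for_group(group: str, groups: Dict[str, List[str]],
                         roles: Dict[str, List[str]]) -> List[str]:
    """Collect the privilege rules of every role assigned to the group."""
    rules: List[str] = []
    for role in groups.get(group, []):
        rules.extend(roles.get(role, []))
    return rules


def unassigned_roles(groups: Dict[str, List[str]],
                     roles: Dict[str, List[str]]) -> Set[str]:
    """Roles defined in [roles] that no group in [groups] refers to."""
    assigned = {role for assigned_roles in groups.values() for role in assigned_roles}
    return set(roles) - assigned
```

With the sample policy, privileges_for_group("dev_ops", ...) returns the rules of engineer_role and ops_role, and unassigned_roles reports dev_ops_role, which is defined in the sample but not assigned to any group.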

Sentry Configuration File

Sentry can store configuration as well as privilege policies in files. The sentry-site.xml file contains configuration options such as the privilege policy file location. The policy file contains the privileges and groups. It uses the .ini file format and should be stored on HDFS.

The following is an example of a sentry-site.xml file.

sentry-site.xml

<configuration>
  <property>
    <name>hive.sentry.provider</name>
    <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
  </property>

  <property>
    <name>sentry.solr.provider.resource</name>
    <value>/path/to/authz-provider.ini</value>
    <!--
        If the HDFS configuration files (core-site.xml, hdfs-site.xml)
        pointed to by SOLR_HDFS_CONFIG in /etc/default/solr
        point to HDFS, the path will be in HDFS;
        alternatively you could specify a full path,
        e.g.:hdfs://namenode:port/path/to/authz-provider.ini
    -->
  </property>
</configuration>

Using Policy Files with Sentry

This section contains notes on creating and maintaining the policy file.

Storing the Policy File

Considerations for storing the policy file(s) include:

  1. Replication count - Because Sentry reads the file for each query, you should increase the file's HDFS replication count. 10 is a reasonable value.
  2. Updating the file - Updates to the file are only reflected when the Solr process is restarted.

Defining Roles

Keep in mind that role definitions are not cumulative. The newer definition replaces the older one. For example, consider the following definition:

role1 = privilege1
role1 = privilege2

This definition results in role1 having privilege2, not privilege1 and privilege2.
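
Because a repeated role name silently replaces the earlier definition, it can help to scan a policy file for such repeats before deploying it. The following Python sketch is a hypothetical lint helper, not a Sentry tool. It assumes one definition per line in the name = value form shown above.

```python
from typing import Dict, List


def find_redefined_roles(roles_section: str) -> Dict[str, List[str]]:
    """Return {role: [definitions in file order]} for roles defined more than once."""
    seen: Dict[str, List[str]] = {}
    for raw in roles_section.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and section headers such as [roles].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        seen.setdefault(name.strip(), []).append(value.strip())
    return {name: defs for name, defs in seen.items() if len(defs) > 1}


def merge_definitions(definitions: List[str]) -> str:
    """Combine several definitions into one comma-separated rule list."""
    return ", ".join(definitions)
```

For the example above, find_redefined_roles reports role1 with both definitions. If both privileges are intended, merge_definitions produces the single-line form role1 = privilege1, privilege2.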

Providing Document-Level Security Using Sentry

For role-based access control of a collection, an administrator modifies a Sentry role so it has query, update, or administrative access.

Collection-level authorization is useful when the access control requirements for the documents in the collection are the same, but users may want to restrict access to a subset of documents in a collection. This finer-grained restriction can be achieved by defining separate collections for each subset, but this is difficult to manage, requires duplicate documents for each collection, and requires that these documents be kept synchronized.

Document-level access control solves this issue by associating authorization tokens with each document in the collection. This enables granting Sentry roles access to sets of documents in a collection.

Document-Level Security Model

Document-level security depends on a chain of relationships between users, groups, roles, and documents.

  • Users are assigned to groups.
  • Groups are assigned to roles.
  • Roles are stored as "authorization tokens" in a specified field in the documents.

Document-level security supports restricting which documents can be viewed by which users. Access is provided by adding roles as "authorization tokens" to a specified document field. Conversely, access is implicitly denied by omitting roles from the specified field. In other words, in a document-level security enabled environment, a user might submit a query that matches a document; if the user is not part of a group that has a role that has been granted access to the document, the result is not returned.

For example, Alice might belong to the administrators group. The administrators group may belong to the doc-mgmt role. A document could be ingested and the doc-mgmt role could be added at ingest time. In such a case, if Alice submitted a query that matched the document, Search would return the document, since Alice is then allowed to see any document with the "doc-mgmt" authorization token.

Similarly, Bob might belong to the guests group. The guests group may belong to the public-browser role. If Bob tried the same query as Alice, but the document did not have the public-browser role, Search would not return the result because Bob does not belong to a group that is associated with a role that has access.

Note that collection-level authorization rules still apply, if enabled. Even if Alice is able to view a document given document-level authorization rules, if she is not allowed to query the collection, the query will fail.

Roles are typically added to documents when those documents are ingested, either using the standard Solr APIs or, if using morphlines, the setValues morphline command.
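
The chain from users to groups to roles to documents can be modeled in a few lines. The following Python sketch reproduces the Alice and Bob example as a conceptual model only; it is not how Search evaluates queries internally. The function names and data structures are hypothetical, and the field name sentry_auth is the default described later in this topic.

```python
from typing import Dict, Iterable, List, Set

# Example data from the scenario above.
USER_GROUPS = {"alice": ["administrators"], "bob": ["guests"]}
GROUP_ROLES = {"administrators": ["doc-mgmt"], "guests": ["public-browser"]}


def roles_for_user(user: str,
                   user_groups: Dict[str, Iterable[str]],
                   group_roles: Dict[str, Iterable[str]]) -> Set[str]:
    """Resolve user -> groups -> roles."""
    roles: Set[str] = set()
    for group in user_groups.get(user, ()):
        roles.update(group_roles.get(group, ()))
    return roles


def visible_documents(user: str,
                      matches: List[dict],
                      user_groups: Dict[str, Iterable[str]],
                      group_roles: Dict[str, Iterable[str]],
                      auth_field: str = "sentry_auth") -> List[dict]:
    """Keep only the matching documents whose auth field holds one of the user's roles."""
    roles = roles_for_user(user, user_groups, group_roles)
    return [doc for doc in matches if roles & set(doc.get(auth_field, ()))]
```

Given a matching document whose sentry_auth field contains doc-mgmt, the model returns it for alice and returns nothing for bob. A document with no tokens in the field is returned to no one, which corresponds to access being implicitly denied.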

Enabling Document-Level Security

Cloudera Search supports document-level security in Search for CDH 5.1 and higher. Document-level security requires collection-level security. Configuring collection-level security is described earlier in this topic.

Document-level security is disabled by default, so the first step in using document-level security is to enable the feature by modifying the solrconfig.xml.secure file. Remember to replace the solrconfig.xml with this file, as described in Enabling Solr as a Client for the Sentry Service Using the Command Line.

To enable document-level security, change solrconfig.xml.secure. The default file contents are as follows:

<searchComponent name="queryDocAuthorization">
    <!-- Set to true to enable document-level authorization -->

    <bool name="enabled">false</bool>


    <!-- Field where the auth tokens are stored in the document -->
    <str name="sentryAuthField">sentry_auth</str>


    <!-- Auth token defined to allow any role to access the document.
         Uncomment to enable. -->

    <!--<str name="allRolesToken">*</str>-->

</searchComponent>
  • The enabled Boolean determines whether document-level authorization is enabled. To enable document-level security, change this setting to true.
  • The sentryAuthField string specifies the name of the field that is used for storing authorization information. You can use the default setting of sentry_auth or you can specify some other string to be used for assigning values during ingest.
  • The allRolesToken string represents a special token defined to allow any role access to the document. By default, this feature is disabled. To enable this feature, uncomment the specification and specify the token. To avoid collisions, this token should be different from the name of any Sentry role. By default it is "*". This feature is useful when first configuring document-level security, or for granting all roles access to a document when the set of roles may change. See Best Practices for additional information.

Best Practices

Using allRolesToken

You may want to grant every user that belongs to a role access to certain documents. One way to accomplish this is to specify all known roles in the document, but this requires updating or re-indexing the document if you add a new role. Alternatively, an allUser role, specified in the Sentry .ini file, could contain all valid groups, but this role would need to be updated every time a new group was added to the system. Instead, specifying allRolesToken allows any user that belongs to a valid role to access the document. This access requires no updating as the system evolves.

In addition, allRolesToken may be useful for transitioning a deployment to use document-level security. Instead of having to define all the roles upfront, all the documents can be specified with allRolesToken and later modified as the roles are defined.

Consequences of Document-Level Authorization Only Affecting Queries

Document-level security does not prevent users from modifying documents or performing other update operations on the collection. Update operations are only governed by collection-level authorization.

Document-level security can be used to prevent documents from being returned in query results. If a user is not granted access to a document, that document is not returned even if the user submits a query that matches it. This does not affect attempted updates.

Consequently, it is possible for a user to not have access to a set of documents based on document-level security, but to still be able to modify the documents using their collection-level authorization update rights. This means that a user can delete all documents in the collection. Similarly, a user might modify all documents, adding their authorization token to each one. After such a modification, the user could access any document using querying. Therefore, if you are restricting access using document-level security, consider granting collection-level update rights only to those users you trust and assume they will be able to access every document in the collection.

Limitations on Query Size

By default, queries support up to 1024 Boolean clauses. As a result, queries containing more than 1024 clauses may cause errors. Because authorization information is added by Sentry as part of a query, using document-level security can increase the number of clauses. In the case where users belong to many roles, even simple queries can become quite large. If a query is too large, an error of the following form occurs:

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
To change the supported number of clauses, edit the maxBooleanClauses setting in solrconfig.xml. For example, to allow 2048 clauses, you would edit the setting so it appears as follows:
<maxBooleanClauses>2048</maxBooleanClauses>

For maxBooleanClauses to be applied as expected, apply any change to this value to all collections and then restart the service. You must make this change to all collections because this option modifies a global Lucene property, affecting all Solr cores. If different solrconfig.xml files have different values for this property, the effective value is determined per host, based on the first Solr core to be initialized.
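
A rough estimate can indicate whether a deployment is likely to reach the clause limit. The following Python sketch assumes that each role a user belongs to adds one clause to the query; this is a simplifying assumption, and the exact number of clauses Sentry adds is not specified here. The function names are hypothetical.

```python
def estimated_clause_count(user_clauses: int, role_count: int) -> int:
    """Estimate total Boolean clauses, assuming one added clause per role."""
    return user_clauses + role_count


def exceeds_limit(user_clauses: int, role_count: int,
                  max_boolean_clauses: int = 1024) -> bool:
    """True if the estimate is above the configured maxBooleanClauses value."""
    return estimated_clause_count(user_clauses, role_count) > max_boolean_clauses
```

Under this assumption, a user with 1000 roles who submits a query with 25 clauses would exceed the default limit of 1024 but not a limit of 2048.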

Enabling Secure Impersonation

Secure impersonation allows a user to make requests as another user in a secure way. The user who has been granted impersonation rights receives the same access as the user being impersonated.

Configure custom security impersonation settings using the Solr Service Environment Advanced Configuration Snippet (Safety Valve). For example, to allow the following impersonations:

  • User hue can make requests as any user from any host.
  • User foo can make requests as any member of group bar, from host1 or host2.
    Enter the following values into the Solr Service Environment Advanced Configuration Snippet (Safety Valve):
    SOLR_SECURITY_ALLOWED_PROXYUSERS=hue,foo
    SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
    SOLR_SECURITY_PROXYUSER_hue_GROUPS=*
    SOLR_SECURITY_PROXYUSER_foo_HOSTS=host1,host2
    SOLR_SECURITY_PROXYUSER_foo_GROUPS=bar
SOLR_SECURITY_ALLOWED_PROXYUSERS lists all of the users allowed to impersonate. For a user x in SOLR_SECURITY_ALLOWED_PROXYUSERS, SOLR_SECURITY_PROXYUSER_x_HOSTS lists the hosts x is allowed to connect from to impersonate, and SOLR_SECURITY_PROXYUSER_x_GROUPS lists the groups whose members x is allowed to impersonate. Both GROUPS and HOSTS support the wildcard *, and both GROUPS and HOSTS must be defined for a specific user.
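
The rules described above can be expressed as a small check, which may help when reviewing a proposed set of values. The following Python sketch models the described semantics only and is not the logic Solr itself runs. The function names are hypothetical, and the variable names are the ones shown above.

```python
from typing import Iterable, Mapping


def _allowed(value: str, candidates: Iterable[str]) -> bool:
    """True if the comma-separated value contains * or any of the candidates."""
    entries = [entry.strip() for entry in value.split(",") if entry.strip()]
    return "*" in entries or any(candidate in entries for candidate in candidates)


def may_impersonate(env: Mapping[str, str], proxy_user: str,
                    target_groups: Iterable[str], host: str) -> bool:
    """Check whether proxy_user, connecting from host, may act as a member of target_groups."""
    allowed_users = [user.strip() for user in
                     env.get("SOLR_SECURITY_ALLOWED_PROXYUSERS", "").split(",")
                     if user.strip()]
    if proxy_user not in allowed_users:
        return False
    hosts = env.get(f"SOLR_SECURITY_PROXYUSER_{proxy_user}_HOSTS")
    groups = env.get(f"SOLR_SECURITY_PROXYUSER_{proxy_user}_GROUPS")
    if hosts is None or groups is None:
        return False  # both HOSTS and GROUPS must be defined for the user
    return _allowed(hosts, [host]) and _allowed(groups, target_groups)
```

With the example values above, hue may impersonate any user from any host, while foo may impersonate only members of group bar and only when connecting from host1 or host2.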