Synchronizing HDFS ACLs and Sentry Permissions
This topic introduces the HDFS-Sentry plugin, which lets you configure synchronization of Sentry privileges with HDFS ACLs for specific HDFS directories. Before this integration, you had two options for securing data in the Hive warehouse:
- Set ownership of the entire Hive warehouse to hive:hive and do not give other components any access to the data. While this is secure, it does not allow for sharing.
- Use HDFS ACLs and keep Sentry privileges and HDFS ACLs in sync manually. For example, if a user has only the Sentry SELECT privilege on a table, that user should be able to read the table's data files but not write to them.
Introduction
To solve the problem stated above, CDH 5.3 introduces an integration of Sentry and HDFS permissions that automatically keeps HDFS ACLs in sync with the privileges configured in Sentry. This feature offers the easiest way to share data between Hive, Impala, and other components such as MapReduce, Spark, and Pig, while setting permissions for that data with a single set of rules through Sentry. Hive and Impala retain the ability to set permissions on views as well as tables, while access to the data outside of Hive and Impala (for example, reading files directly from HDFS) requires table permissions. HDFS permissions for some or all of the files that are part of tables defined in the Hive Metastore are now controlled by Sentry. The integration is implemented by the following components:
- An HDFS NameNode plugin
- A Sentry-Hive Metastore plugin
- A Sentry Service plugin
With synchronization enabled, Sentry translates permissions on tables into corresponding HDFS ACLs on the underlying table files in HDFS. For example, if a user group is assigned to a Sentry role that has the SELECT permission on a particular table, that group also has read access to the HDFS files that are part of the table. When you list those files in HDFS, this access is shown as an HDFS ACL.
Note that when Sentry was enabled, the hive user and group were given ownership of all files and directories in the Hive warehouse (/user/hive/warehouse). The synchronized permissions you see in HDFS reflect this ownership.
The mapping of Sentry privileges to HDFS ACLs is as follows (see the example after this list):
- SELECT privilege -> Read access on the file.
- INSERT privilege -> Write access on the file.
- ALL privilege -> Read and Write access on the file.
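For example, granting SELECT on a table should surface as read access on the table's files for the group backing the granted role. The commands below are a minimal sketch, assuming a Kerberized cluster, a table stored under /user/hive/warehouse/testdb.db/testtbl, and hypothetical host, realm, database, table, and role names:

# Grant SELECT on the table to a role through Beeline (all names are placeholders).
$ beeline -u "jdbc:hive2://hs2host:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
    -e "GRANT SELECT ON TABLE testdb.testtbl TO ROLE analyst_role;"

# List the HDFS ACLs on the table directory; the group associated with
# analyst_role should now show read (and execute, for directories) access.
$ hdfs dfs -getfacl -R /user/hive/warehouse/testdb.db/testtbl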
Prompting HDFS ACL Changes
URIs do not have an impact on the HDFS-Sentry plugin. Therefore, you cannot manage all of your HDFS ACLs with the HDFS-Sentry plugin; you must continue to use standard HDFS ACLs for data outside of Hive.
The plugin prompts HDFS ACL changes for the following (see the example after these lists):
- Hive DATABASE object LOCATION (HDFS) when a role is granted to the object
- Hive TABLE object LOCATION (HDFS) when a role is granted to the object
- Hive URI LOCATION (HDFS) when a role is granted to a URI
The plugin does not prompt HDFS ACL changes for the following:
- Hive SERVER object when a role is granted to the object. The privileges are inherited by child objects in standard Sentry interactions, but the plugin does not trickle them down to HDFS ACLs.
- Permissions granted on views. Views are not represented as objects in the HDFS file system, so they are not synchronized.
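For example, a grant at the database level should be reflected on the database's HDFS LOCATION, whereas a grant on a view changes nothing in HDFS. A rough illustration with hypothetical names, assuming the default warehouse layout:

# Grant ALL on a database to a role, then inspect the database directory.
$ beeline -u "jdbc:hive2://hs2host:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
    -e "GRANT ALL ON DATABASE testdb TO ROLE etl_role;"
$ hdfs dfs -getfacl /user/hive/warehouse/testdb.db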
Prerequisites
- CDH 5.3.0 (or higher)
- (Strongly Recommended) Implement Kerberos authentication on your cluster.
- You must use the Sentry service, not policy file-based authorization.
- HDFS extended access control lists (ACLs) must be enabled (see the quick check after this list).
- There must be exactly one Sentry service dependent on HDFS.
- The Sentry service must have exactly one Sentry Server role.
- The Sentry service must have exactly one dependent Hive service.
- The Hive service must have exactly one Hive Metastore role (that is, High Availability should not be enabled).
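As a quick check of the ACL prerequisite, you can print the client-side value of dfs.namenode.acls.enabled; this is only a sanity check, so also confirm the setting in the NameNode's own configuration:

# Expect this to print true when extended ACLs are enabled in the client configuration.
$ hdfs getconf -confKey dfs.namenode.acls.enabled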
Enabling the HDFS-Sentry Plugin Using Cloudera Manager
- Go to the HDFS service.
- Click the Configuration tab.
- Select Scope > HDFS (Service-Wide).
- Type Check HDFS Permissions in the Search box.
- Select Check HDFS Permissions.
- Select Enable Sentry Synchronization.
- Locate the Sentry Synchronization Path Prefixes property or search for it by typing its name in the Search box.
- Edit the Sentry Synchronization Path Prefixes property to list the HDFS path prefixes where Sentry permissions should be enforced. Multiple HDFS path prefixes can be specified. By default, this property is set to /user/hive/warehouse and must always be non-empty. HDFS privilege synchronization does not occur for tables located outside the HDFS regions listed here.
- Click Save Changes.
- Restart the cluster. Note that it may take an additional two minutes after the cluster restarts for privilege synchronization to take effect. A quick verification example follows this procedure.
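Once the cluster is back up (and after the delay noted above), you can spot-check that synchronization is active by listing ACLs under the warehouse. The path below assumes the default warehouse location:

$ hdfs dfs -getfacl -R /user/hive/warehouse
# Directories for tables with Sentry grants should show group ACL entries in
# addition to the hive:hive ownership applied when Sentry was enabled.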
Enabling the HDFS-Sentry Plugin Using the Command Line
To enable the Sentry plugins on an unmanaged cluster, you must explicitly allow the hdfs user to interact with Sentry (typically by adding hdfs to the list of users in the sentry.service.allow.connect property in sentry-site.xml) and install the plugin packages as described in the following sections.
Installing the HDFS-Sentry Plugin
Install the sentry-hdfs-plugin package on the following hosts:
- The host running the NameNode and Secondary NameNode
- The host running the Hive Metastore
- The host running the Sentry Service
Use the command appropriate for your operating system:
OS | Command
---|---
RHEL-compatible | $ sudo yum install sentry-hdfs-plugin
SLES | $ sudo zypper install sentry-hdfs-plugin
Ubuntu or Debian | $ sudo apt-get install sentry-hdfs-plugin
Configuring the HDFS NameNode Plugin
Add the following properties to hdfs-site.xml on the NameNode host:

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.authorization.provider.class</name>
  <value>org.apache.sentry.hdfs.SentryAuthorizationProvider</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<!-- Comma-separated list of HDFS path prefixes where Sentry permissions should be enforced. -->
<!-- Privilege synchronization will occur only for tables located in HDFS regions specified here. -->
<property>
  <name>sentry.authorization-provider.hdfs-path-prefixes</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>sentry.hdfs.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.hdfs.service.server.principal</name>
  <value>SENTRY_SERVER_PRINCIPAL (for example, sentry/_HOST@VPC.CLOUDERA.COM)</value>
</property>
<property>
  <name>sentry.hdfs.service.client.server.rpc-port</name>
  <value>SENTRY_SERVER_PORT</value>
</property>
<property>
  <name>sentry.hdfs.service.client.server.rpc-address</name>
  <value>SENTRY_SERVER_HOST</value>
</property>
Configuring the Hive Metastore Plugin
Add the following properties to hive-site.xml on the Hive Metastore Server host:

<property>
  <name>sentry.metastore.plugins</name>
  <value>org.apache.sentry.hdfs.MetastorePlugin</value>
</property>
<property>
  <name>sentry.hdfs.service.client.server.rpc-port</name>
  <value>SENTRY_SERVER_PORT</value>
</property>
<property>
  <name>sentry.hdfs.service.client.server.rpc-address</name>
  <value>SENTRY_SERVER_HOSTNAME</value>
</property>
<property>
  <name>sentry.hdfs.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.hdfs.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.hdfs.service.server.principal</name>
  <value>SENTRY_SERVER_PRINCIPAL (for example, sentry/_HOST@VPC.CLOUDERA.COM)</value>
</property>
Configuring the Sentry Service Plugin
Add the following properties to sentry-site.xml on the Sentry Server host:

<property>
  <name>sentry.service.processor.factories</name>
  <value>org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessorFactory, org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory</value>
</property>
<property>
  <name>sentry.policy.store.plugins</name>
  <value>org.apache.sentry.hdfs.SentryPlugin</value>
</property>
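After updating these files, restart the NameNode, Hive Metastore, and Sentry services so the plugins are loaded. On a package-based (unmanaged) installation the init scripts are typically named as shown below; adjust the names if your installation differs:

$ sudo service hadoop-hdfs-namenode restart
$ sudo service hive-metastore restart
$ sudo service sentry-store restart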
Testing the Sentry Synchronization Plugins
The following tasks should help you make sure that Sentry-HDFS synchronization has been enabled and configured correctly:
- Grant or review Sentry privileges on a test table using one of the following tools:
  - (Recommended) Hue's Security application
  - HiveServer2 CLI
  - Impala CLI
- Access the table files directly in HDFS. For example:
  - List files inside the folder and verify that the file permissions shown in HDFS (including ACLs) match what was configured in Sentry.
  - Run a MapReduce, Pig, or Spark job that accesses those files, using any tool other than HiveServer2 and Impala (see the example commands after this list).
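A rough end-to-end check, assuming a test table under the default warehouse path and a user whose group maps to a role with SELECT on that table (all paths and names below are hypothetical):

# Confirm that the ACLs on the table directory match the Sentry grants.
$ hdfs dfs -ls /user/hive/warehouse/testdb.db/testtbl
$ hdfs dfs -getfacl /user/hive/warehouse/testdb.db/testtbl

# Read a data file directly (the file name is illustrative); this should succeed
# for users covered by the SELECT grant and be denied for everyone else.
$ hdfs dfs -cat /user/hive/warehouse/testdb.db/testtbl/000000_0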