When you run an HDFS replication policy on a Sentry-enabled source cluster, the
replication policy copies files and tables along with their permissions. Cloudera Manager
version 6.3.1 or higher is required to run HDFS replication policies on a Sentry-enabled
source cluster.
To run HDFS replication policies on a Sentry-enabled source cluster, you must use
the hdfs user. If you want to use a different user account, you must configure
that account to bypass the Sentry ACLs during the replication process.
When Sentry is not available or when Sentry does not manage the authorization for
a resource, such as a file or directory, in the source cluster, HDFS uses its internal
ACLs to manage resource authorization.
When Sentry is enabled for the source cluster and you use the hdfs user
name to run the HDFS replication policy, HDFS copies the ACLs configured in Sentry for the
replicated files and tables to the target cluster.
When Sentry is enabled and you use a different user name to run the HDFS
replication policy, both the Sentry ACLs and the HDFS internal ACLs are copied, which
results in incorrect HDFS metadata in the target cluster. If the Sentry ACLs are not
compatible with HDFS ACLs, the replication job fails.
To avoid compatibility issues between HDFS and Sentry ACLs for a non-hdfs
user, you must complete the following steps:
1. Create a user account that is used only for Replication Manager jobs, because Sentry
   ACLs are bypassed for this user.
   For example, create a user named bdr-only-user.
2. To bypass the Sentry ACLs during replication, perform the following steps on the source
   cluster:
   a. In the Cloudera Manager Admin Console, select Clusters > HDFS service.
   b. Select Configuration and search for the NameNode Advanced Configuration Snippet
      (Safety Valve) for hdfs-site.xml property.
   c. Add the following property:
      Name - dfs.namenode.inode.attributes.provider.bypass.users
      Value - Enter [***USERNAME, USERNAME@REALMNAME***], where [***USERNAME***] is the
      user you created in step 1 and [***REALMNAME***] is the Kerberos realm name.
      For example, if the username is bdr-only-user on the realm ElephantRealm, enter
      bdr-only-user, bdr-only-user@ElephantRealm
   d. Restart the NameNode.
3. Repeat step 2 on the destination cluster.
4. When you create an HDFS replication policy, specify the user you created in step 1 in
   the Run As Username and Run on Peer as Username (if available) fields.
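
As an illustration of the bypass property described above, the following fragment is a
sketch of how the entry might look if you enter it in the XML view of the NameNode
safety valve for hdfs-site.xml. It assumes the example user bdr-only-user and the example
realm ElephantRealm from this topic; substitute your own user name and Kerberos realm.

```xml
<!-- Sketch only: example user and realm taken from this topic, not real values. -->
<!-- Lists the users for whom the NameNode bypasses the external (Sentry) -->
<!-- attribute provider and uses the ACLs stored in HDFS instead. -->
<property>
  <name>dfs.namenode.inode.attributes.provider.bypass.users</name>
  <!-- Both the short user name and the Kerberos principal are listed. -->
  <value>bdr-only-user, bdr-only-user@ElephantRealm</value>
</property>
```

The same value applies on the destination cluster, and the change takes effect only
after the NameNode is restarted.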