When you run an HDFS replication policy on a Sentry-enabled source cluster, the
replication policy can copy files and tables along with their permissions.
Cloudera Manager 6.3.1 or later is required to run HDFS replication policies on a
Sentry-enabled source cluster.
To run HDFS replication policies on a Sentry-enabled source cluster, you must use the
hdfs user. If you want to use a different user account, you must configure that account
to bypass the Sentry ACLs during the replication process.
When Sentry is not available, or when Sentry does not manage the authorization for a
resource such as a file or directory in the source cluster, HDFS uses its internal ACLs
to manage resource authorization.
When Sentry is enabled for the source cluster and you run the HDFS replication policy
as the hdfs user, HDFS copies the ACLs configured in Sentry for the replicated files
and tables to the target cluster.
When Sentry is enabled and you run the HDFS replication policy as a different user,
both the Sentry ACLs and the HDFS internal ACLs are copied, which results in incorrect
HDFS metadata in the target cluster. If the Sentry ACLs are not compatible with the
HDFS ACLs, the replication job fails.
To avoid compatibility issues between HDFS and Sentry ACLs for a non-hdfs user, you
must complete the following steps:
1. Create a user account that is used only for Replication Manager jobs, because Sentry
   ACLs are bypassed for this user.
   For example, create a user named bdr-only-user.
2. To bypass the Sentry ACLs during replication, perform the following steps on the source
   cluster:
   a. In the Cloudera Manager Admin Console, select Clusters > HDFS service.
   b. Select Configuration and search for the NameNode Advanced Configuration Snippet
      (Safety Valve) for hdfs-site.xml property.
   c. Add the following property:
      Name - dfs.namenode.inode.attributes.provider.bypass.users
      Value - Enter [***USERNAME, USERNAME@REALMNAME***], where [***USERNAME***] is the
      user you created in step 1 and [***REALMNAME***] is the Kerberos realm name.
      For example, if the username is bdr-only-user on the realm ElephantRealm, enter
      bdr-only-user, bdr-only-user@ElephantRealm
   d. Restart the NameNode.
3. Repeat step 2 on the destination cluster.
4. When you create an HDFS replication policy, specify the user you created in step 1 in
   the Run As Username and Run on Peer as Username (if available) fields.
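
If you enter the safety valve value in XML form rather than as separate Name and Value
fields, the bypass property described above might look like the following sketch. It
assumes the example user bdr-only-user and the example realm ElephantRealm from this
topic. Substitute your own user name and Kerberos realm.

```xml
<!-- Sketch only: bdr-only-user and ElephantRealm are the example values used in this topic. -->
<property>
  <name>dfs.namenode.inode.attributes.provider.bypass.users</name>
  <!-- Comma-separated list: the short user name, then the full Kerberos principal. -->
  <value>bdr-only-user, bdr-only-user@ElephantRealm</value>
</property>
```

The same fragment applies to both the source and the destination cluster. The setting
takes effect only after the NameNode is restarted.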