If the cluster has Sentry enabled and you are using Replication Manager to replicate
files or tables and their permissions, you must change the HDFS configuration.
These changes are required because of how HDFS manages ACLs.
When a user reads ACLs, HDFS provides the ACLs configured in the
External Authorization Provider, which is Sentry. If Sentry is not
available or it does not manage authorization of the particular
resource, such as the file or directory, then HDFS falls back to its own
internal ACLs. However, when ACLs are written to HDFS, HDFS always stores
them as internal ACLs, even when Sentry is configured. Because replication
reads ACLs and then writes them back to HDFS, HDFS metadata becomes
polluted with Sentry ACLs. Replication can also fail when the Sentry ACLs
are not compatible with HDFS ACLs.
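The following toy model makes the read/write asymmetry concrete. It is not HDFS or Sentry code. The class `ToyNameNode`, its fields, the function `replicate_acl`, the user `alice`, and the ACL strings are all hypothetical. In the model, reads prefer Sentry ACLs and fall back to internal ACLs, while writes always go to internal ACLs. As a result, copying an ACL as an ordinary user stores the Sentry ACL in internal metadata. The model also includes a simplified bypass list, which stands in for the property configured in the steps below. It shows why a dedicated replication user avoids the problem.

```python
"""Toy model of the HDFS/Sentry ACL behavior described above (hypothetical, not HDFS code)."""
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    # ACLs managed by Sentry (the External Authorization Provider), keyed by path.
    sentry_acls: dict = field(default_factory=dict)
    # HDFS's own internal ACLs, keyed by path.
    internal_acls: dict = field(default_factory=dict)
    sentry_available: bool = True
    # Users who skip Sentry on reads; a simplified stand-in for
    # dfs.namenode.inode.attributes.provider.bypass.users.
    bypass_users: frozenset = frozenset()

    def read_acl(self, user: str, path: str) -> list:
        """Return Sentry ACLs when Sentry applies; otherwise fall back to internal ACLs."""
        if (user not in self.bypass_users
                and self.sentry_available
                and path in self.sentry_acls):
            return list(self.sentry_acls[path])
        return list(self.internal_acls.get(path, []))

    def write_acl(self, path: str, acl: list) -> None:
        """Writes always go to internal ACLs, even when Sentry is configured."""
        self.internal_acls[path] = list(acl)


def replicate_acl(src: ToyNameNode, dst: ToyNameNode, user: str, path: str) -> None:
    """Copy one ACL the way a replication job would: read on src, write on dst."""
    dst.write_acl(path, src.read_acl(user, path))


if __name__ == "__main__":
    src = ToyNameNode(
        sentry_acls={"/data/t1": ["group:analysts:r-x"]},
        internal_acls={"/data/t1": ["user:hive:rwx"]},
        bypass_users=frozenset({"bdr-only-user"}),
    )
    dst = ToyNameNode()
    replicate_acl(src, dst, "alice", "/data/t1")
    print(dst.internal_acls["/data/t1"])  # ['group:analysts:r-x'] (Sentry ACL leaked)
    replicate_acl(src, dst, "bdr-only-user", "/data/t1")
    print(dst.internal_acls["/data/t1"])  # ['user:hive:rwx']
```

When the job runs as `alice`, the destination's internal ACLs receive the Sentry ACL. When it runs as the bypassed `bdr-only-user`, the internal ACL is copied instead.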
To prevent issues with HDFS and Sentry ACLs, complete the following
steps:
1. Create a user account that is used only for Replication Manager jobs, because Sentry ACLs
   are bypassed for this user.
   For example, create a user named bdr-only-user.
2. Configure HDFS on the source cluster:
   a. In the Cloudera Manager Admin Console, select Clusters > <HDFS service>.
   b. Select Configuration and search for the following property: NameNode
      Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.
   c. Add the following property:
      Name: dfs.namenode.inode.attributes.provider.bypass.users
      Value: <username>, <username>@<RealmName>
      Replace <username> with the user you created in step 1 and <RealmName>
      with the name of the Kerberos realm.
      For example, the user bdr-only-user in the realm ElephantRealm
      requires the following value:
      bdr-only-user, bdr-only-user@ElephantRealm
      Description: This field is optional.
   d. Restart the NameNode.
3. Repeat step 2 on the destination cluster.
4. When you create a replication policy, specify the user you created in step 1 in the
   Run As Username and Run on Peer as Username (if available) fields.
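The Cloudera Manager safety valve can also be edited as raw XML. The helper below is a hypothetical sketch that renders the bypass property as an hdfs-site.xml `<property>` element, assuming you paste the result into that XML view. It also runs a simple pre-flight check: the user chosen as the policy's Run As user should appear in the bypass list on both the source and destination clusters. The function names are illustrative. Splitting the value on commas and trimming whitespace is an assumption about how the NameNode reads the list.

```python
"""Hypothetical helper: render the bypass property and check a policy's Run As user."""
import xml.etree.ElementTree as ET

BYPASS_KEY = "dfs.namenode.inode.attributes.provider.bypass.users"


def bypass_value(username: str, realm: str) -> str:
    """Build the Value field in the documented '<username>, <username>@<RealmName>' form."""
    return f"{username}, {username}@{realm}"


def safety_valve_xml(username: str, realm: str) -> str:
    """Render one <property> element for the NameNode hdfs-site.xml safety valve."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = BYPASS_KEY
    ET.SubElement(prop, "value").text = bypass_value(username, realm)
    return ET.tostring(prop, encoding="unicode")


def parse_bypass_users(value: str) -> set:
    """Split on commas and trim whitespace (assumed to mirror the NameNode's parsing)."""
    return {u.strip() for u in value.split(",") if u.strip()}


def check_run_as_user(run_as: str, source_value: str, dest_value: str) -> list:
    """Return a list of problems; an empty list means the user is bypassed on both clusters."""
    problems = []
    for cluster, value in (("source", source_value), ("destination", dest_value)):
        if run_as not in parse_bypass_users(value):
            problems.append(f"{run_as} is not in {BYPASS_KEY} on the {cluster} cluster")
    return problems


if __name__ == "__main__":
    print(safety_valve_xml("bdr-only-user", "ElephantRealm"))
    value = bypass_value("bdr-only-user", "ElephantRealm")
    print(check_run_as_user("bdr-only-user", value, value))  # []
    print(check_run_as_user("hdfs", value, value))
```

The check returns messages instead of raising an exception, so a script can report every misconfigured cluster at once.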