Replication with Sentry Enabled

If the cluster has Sentry enabled and you are using Replication Manager to replicate files or tables and their permissions, configuration changes to HDFS are required.

The configuration changes are required because of how HDFS manages ACLs. When a user reads ACLs, HDFS returns the ACLs configured in the external authorization provider, which is Sentry. If Sentry is not available or does not manage authorization for the particular resource, such as a file or directory, HDFS falls back to its own internal ACLs. However, when ACLs are written to HDFS, HDFS always writes them to its internal ACLs, even when Sentry is configured. As a result, Sentry ACLs that are read during replication are written into the internal ACLs, which pollutes the HDFS metadata with Sentry ACLs. Replication can also fail when Sentry ACLs are not compatible with HDFS ACLs.
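The read-side behavior described above can be illustrated with a toy model. This is a minimal sketch, not HDFS or Sentry code: the function read_acls, its parameters, and the sample paths and ACL strings are all hypothetical names chosen for illustration. It assumes that a user listed in the bypass setting is served the internal HDFS ACLs instead of the Sentry ACLs.

```python
from typing import Dict, List, Optional, Set

Acl = List[str]


def read_acls(
    path: str,
    user: str,
    sentry_acls: Dict[str, Acl],
    hdfs_acls: Dict[str, Acl],
    sentry_available: bool = True,
    bypass_users: Optional[Set[str]] = None,
) -> Acl:
    """Toy model of which ACLs a read returns (hypothetical, for illustration)."""
    bypass_users = bypass_users or set()
    # Sentry answers only if the user is not bypassed, Sentry is reachable,
    # and Sentry manages authorization for this path.
    if user not in bypass_users and sentry_available and path in sentry_acls:
        return sentry_acls[path]
    # Otherwise the read falls back to the internal HDFS ACLs.
    return hdfs_acls.get(path, [])


# Hypothetical sample data.
sentry = {"/warehouse/sales": ["group:analysts:r-x"]}
internal = {"/warehouse/sales": ["user:hive:rwx"], "/tmp/scratch": ["user:alice:rw-"]}

regular = read_acls("/warehouse/sales", "alice", sentry, internal)
bypassed = read_acls(
    "/warehouse/sales", "bdr-only-user", sentry, internal,
    bypass_users={"bdr-only-user"},
)
unmanaged = read_acls("/tmp/scratch", "alice", sentry, internal)
```

In this model, a regular user sees the Sentry ACLs for a Sentry-managed path, while the bypassed replication user sees only the internal HDFS ACLs, so only internal ACLs are copied to the destination.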

To prevent issues with HDFS and Sentry ACLs, complete the following steps:

  1. Create a user account that is used only for Replication Manager jobs, because Sentry ACLs are bypassed for this user.
    For example, create a user named bdr-only-user.
  2. Configure HDFS on the source cluster:
    1. In the Cloudera Manager Admin Console, select Clusters > <HDFS service>.
    2. Select Configuration and search for the following property: NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.
    3. Add the following property:

      Name: dfs.namenode.inode.attributes.provider.bypass.users

      Value: <username>, <username>@<RealmName>

      Replace <username> with the user you created in step 1 and <RealmName> with the name of the Kerberos realm.

      For example, the user bdr-only-user on the realm ElephantRealm requires the following value:

      bdr-only-user, bdr-only-user@ElephantRealm

      Description: This field is optional.

    4. Restart the NameNode.
  3. Repeat step 2 on the destination cluster.
  4. When you create a replication policy, specify the user you created in step 1 in the Run As Username and Run on Peer as Username (if available) fields.
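The property value from step 2 can be assembled with a small helper, which may reduce typos when the same setting is applied to both clusters. This is a minimal sketch: the function names bypass_users_value and safety_valve_snippet are hypothetical, and it assumes the safety valve accepts a standard Hadoop <property> element as an alternative to entering the name and value in separate fields.

```python
from xml.sax.saxutils import escape

# Property name taken from step 2 of the procedure.
PROPERTY_NAME = "dfs.namenode.inode.attributes.provider.bypass.users"


def bypass_users_value(username: str, realm: str) -> str:
    """Build the value in the form '<username>, <username>@<RealmName>'."""
    return f"{username}, {username}@{realm}"


def safety_valve_snippet(username: str, realm: str) -> str:
    """Render the setting as a Hadoop-style <property> element (hypothetical helper)."""
    value = escape(bypass_users_value(username, realm))
    return (
        "<property>\n"
        f"  <name>{PROPERTY_NAME}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Example user and realm from the procedure.
    print(safety_valve_snippet("bdr-only-user", "ElephantRealm"))
```

Apply the same value on the source and destination clusters, and restart the NameNode on each cluster after the change.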