HDFS replication from Sentry-enabled clusters

When you run an HDFS replication policy on a Sentry-enabled source cluster, the replication policy copies files and tables along with their permissions.

Cloudera Manager 6.3.1 or higher is required to run HDFS replication policies on a Sentry-enabled source cluster.

To run HDFS replication policies on a Sentry-enabled source cluster, you must use the hdfs user. To use a different user account, you must configure that account to bypass the Sentry ACLs during the replication process.

When Sentry is not available, or when Sentry does not manage the authorization for a resource such as a file or directory in the source cluster, HDFS uses its internal ACLs to manage resource authorization.

When Sentry is enabled for the source cluster and you use the hdfs user name to run the HDFS replication policy, HDFS copies the ACLs configured in Sentry for the replicated files and tables to the target cluster.

When Sentry is enabled and you use a different user name to run the HDFS replication policy, both the Sentry ACLs and the HDFS internal ACLs are copied, which results in incorrect HDFS metadata in the target cluster. If the Sentry ACLs are not compatible with the HDFS ACLs, the replication job fails.

To avoid compatibility issues between HDFS and Sentry ACLs for a non-hdfs user, you must complete the following steps:

  1. Create a user account that is used only for Replication Manager jobs, because Sentry ACLs are bypassed for this user.
    For example, create a user named bdr-only-user.
  2. To bypass the Sentry ACLs during replication, perform the following steps on the source cluster:
    1. In the Cloudera Manager Admin Console, select Clusters > HDFS service.
    2. Select Configuration and search for the NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml property.
    3. Add the following property:
      Name - dfs.namenode.inode.attributes.provider.bypass.users

      Value - Enter [***USERNAME, USERNAME@REALMNAME***], where [***USERNAME***] is the user you created in step 1 and [***REALMNAME***] is the Kerberos realm name.

      For example, if the username is bdr-only-user in the Kerberos realm ElephantRealm, enter bdr-only-user, bdr-only-user@ElephantRealm.

    4. Restart the NameNode.
  3. Repeat step 2 on the destination cluster.
  4. When you create an HDFS replication policy, specify the user you created in step 1 in the Run As Username and Run on Peer as Username (if available) fields.
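
As a rough illustration of step 2, the safety valve entry corresponds to an hdfs-site.xml property along the following lines. This is a sketch, not a definitive configuration: it assumes the example user bdr-only-user and the example principal bdr-only-user@ElephantRealm from step 2, and it assumes that you enter the snippet in XML form. Substitute your own user name and Kerberos realm.

```xml
<!-- Sketch only: user name and realm are the example values from step 2. -->
<property>
  <name>dfs.namenode.inode.attributes.provider.bypass.users</name>
  <!-- Comma-separated list: the short user name and the full Kerberos principal. -->
  <value>bdr-only-user, bdr-only-user@ElephantRealm</value>
</property>
```

Because step 3 repeats step 2 on the destination cluster, the same property would be added there as well, and the NameNode restarted on each cluster for the change to take effect.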