HDFS replication in Sentry-enabled clusters

When you run an HDFS replication policy on a Sentry-enabled source cluster, the replication policy copies files and tables along with their permissions. Cloudera Manager version 6.3.1 or higher is required to run HDFS replication policies on a Sentry-enabled source cluster.

To perform Sentry to Ranger replication using HDFS replication policies, you must have Cloudera Manager version 6.3.1 or higher installed on the source cluster and Cloudera Manager version 7.1.1 or higher installed on the target cluster. Use the hdfs user to run HDFS replication policies on a source cluster that is Sentry-enabled. To use a different user account, you must configure that account to bypass the Sentry ACLs during the replication process.

Consider the following points before you create an HDFS replication policy:
  • When Sentry is not available, or when Sentry does not manage the authorization for a resource such as a file or directory in the source cluster, HDFS uses its internal ACLs to manage resource authorization.
  • When Sentry is enabled for the source cluster and you use the hdfs user to create the HDFS replication policy, HDFS copies the ACLs configured in Sentry for the replicated files and tables to the target cluster.
  • When Sentry is enabled and you use a different user to run the HDFS replication policy, both Sentry ACLs and HDFS internal ACLs are copied, which results in incorrect HDFS metadata in the target cluster. If the Sentry ACLs are not compatible with HDFS ACLs, the replication job fails. To avoid such compatibility issues, create another user that bypasses the Sentry ACLs during the replication process.

To avoid compatibility issues between HDFS and Sentry ACLs for a non-hdfs user, you must complete the following steps:

  1. Create a user account that Replication Manager jobs can use to bypass the Sentry ACLs.
    For example, create a user named bdr-only-user.
  2. Perform the following steps on the source cluster:
    1. In the Cloudera Manager Admin Console, go to the Clusters > HDFS service > Configuration tab.
    2. Search for the NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml property.
    3. Enter the following property details:
      Name - Enter dfs.namenode.inode.attributes.provider.bypass.users.

      Value - Enter [***USERNAME, USERNAME@REALMNAME***], where [***USERNAME***] is the user you created in step 1 and [***REALMNAME***] is the Kerberos realm name.

      For example, if the username is bdr-only-user and the realm is ElephantRealm, enter bdr-only-user, bdr-only-user@ElephantRealm.

    4. Restart the NameNode.
  3. Repeat step 2 on the target cluster.
  4. When you create an HDFS replication policy, specify the user you created in step 1 in the Run As Username and Run on Peer as Username fields.
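
As a rough illustration, the name and value entered in step 2 correspond to an hdfs-site.xml property like the fragment below. The username bdr-only-user and the realm ElephantRealm are the example values from the steps above, not defaults; substitute your own. This is a sketch of the resulting configuration entry, not output captured from Cloudera Manager.

```xml
<!-- Sketch of the NameNode safety-valve entry from step 2.
     bdr-only-user and ElephantRealm are example values only. -->
<property>
  <name>dfs.namenode.inode.attributes.provider.bypass.users</name>
  <value>bdr-only-user, bdr-only-user@ElephantRealm</value>
</property>
```

The value lists both the short username and the Kerberos principal form, as the steps describe. The same entry is needed on both clusters, and the NameNode must be restarted for it to take effect.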