4. DistCp and Security Settings

Security settings dictate whether DistCp should be run on the source cluster or the destination cluster. As a general rule of thumb, if one cluster is secure and the other is not, run DistCp from the secure cluster; otherwise, the copy may fail with security-related (typically authentication) errors.

When copying data from a secure cluster to a non-secure cluster, the following configuration setting is required for the DistCp client:

<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>

When copying data from one secure cluster to another secure cluster, the following configuration setting is required in the core-site.xml file:

<property>
  <name>hadoop.security.auth_to_local</name>
  <value></value>
  <description>Maps Kerberos principals to local user names</description>
</property> 
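
The value is left empty above because it depends on the principals in use. As a rough illustration only, the following sketch maps a NameNode principal to a local user. The realm EXAMPLE.COM and the choice of hdfs as the local user are assumptions made for this example, not values taken from this guide; substitute the realm and service accounts used in your clusters.

```xml
<!-- Illustrative sketch: EXAMPLE.COM is a placeholder realm, and mapping
     the NameNode principal to the local "hdfs" user is an assumption. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
RULE:[2:$1@$0](nn@EXAMPLE.COM)s/.*/hdfs/
DEFAULT
  </value>
  <description>Maps Kerberos principals to local user names</description>
</property>
```

In this sketch, [2:$1@$0] applies to two-component principals such as nn/host1@EXAMPLE.COM and rewrites them as nn@EXAMPLE.COM, the parenthesized expression filters on that rewritten string, and s/.*/hdfs/ produces the local user name. DEFAULT takes the first component of any principal in the cluster's default realm.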

Secure-to-Secure: Kerberos Principal Name

  • distcp hdfs://hdp-2.0-secure hdfs://hdp-2.0-secure

    One issue here is that the SASL RPC client requires the remote server’s Kerberos principal to match the server principal in its own configuration. Therefore, the same principal name must be assigned to the applicable NameNodes in the source and the destination cluster. For example, if the Kerberos principal name of the NameNode in the source cluster is nn/host1@realm, the Kerberos principal name of the NameNode in the destination cluster must be nn/host2@realm, rather than nn2/host2@realm.
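
One way to meet this requirement is to configure the NameNodes in both clusters with a principal that uses the same service name (nn in the example above), letting only the host part differ. The fragment below is a minimal sketch for hdfs-site.xml. It assumes the dfs.namenode.kerberos.principal property and the _HOST placeholder, which Hadoop replaces with the server's host name, and it uses EXAMPLE.COM as a stand-in realm.

```xml
<!-- hdfs-site.xml on both clusters (sketch).
     EXAMPLE.COM is a placeholder realm, not a value from this guide. -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
</property>
```

With this value, a NameNode running on host1 would use nn/host1@EXAMPLE.COM and one running on host2 would use nn/host2@EXAMPLE.COM, which follows the naming pattern described above.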

Secure-to-Secure: ResourceManager Mapping Rules

When copying between two HDP2 secure clusters, or from HDP1 secure to HDP2 secure, further ResourceManager (RM) configuration is required if the two clusters have different realms. For DistCp to succeed, the same RM mapping rule must be used in both clusters.

For example, if secure Cluster 1 has the following RM mapping rule:

<property>
      <name>hadoop.security.auth_to_local</name>
      <value>
RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/
DEFAULT
      </value>
</property>

And secure Cluster 2 has the following RM mapping rule:

<property>
      <name>hadoop.security.auth_to_local</name>
      <value>
RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/
DEFAULT
      </value>
</property>

The DistCp job from Cluster 1 to Cluster 2 will fail because the RM mapping rule in Cluster 2 differs from the rule in Cluster 1, so Cluster 2 cannot correctly resolve the RM principal of Cluster 1 to the yarn user.

The solution is to use the same RM mapping rule in both Cluster 1 and Cluster 2:

<property>
      <name>hadoop.security.auth_to_local</name>
      <value>
RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/
RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/
DEFAULT
      </value>
</property>

