Security settings dictate whether DistCp should be run on the source cluster or the destination cluster. As a general rule of thumb, if one cluster is secure and the other is not, DistCp should be run from the secure cluster; otherwise there may be security-related issues.
When copying data from a secure cluster to a non-secure cluster, the following configuration setting is required for the DistCp client:
<property> <name>ipc.client.fallback-to-simple-auth-allowed</name> <value>true</value> </property>
When copying data from a secure cluster to a secure cluster, the following configuration setting is required in the core-site.xml file:
<property> <name>hadoop.security.auth_to_local</name> <value></value> <description>Maps kerberos principals to local user names</description> </property>
Secure-to-Secure: Kerberos Principal Name
distcp hdfs://hdp-2.0-secure hdfs://hdp-2.0-secure
One issue here is that the SASL RPC client requires the remote server’s Kerberos principal to match the server principal in its own configuration. Therefore, the same principal name must be assigned to the applicable NameNodes in the source and the destination cluster. For example, if the Kerberos principal name of the NameNode in the source cluster is nn/host1@realm, the Kerberos principal name of the NameNode in the destination cluster must be nn/host2@realm, rather than nn2/host2@realm.
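The naming requirement can be checked before a copy is attempted. The following is a minimal sketch, not part of DistCp or Hadoop: the helper names `parse_principal` and `same_primary` are hypothetical, and the sketch assumes principals of the form primary/instance@REALM and compares only the primary component, as in the example above.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm)."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def same_primary(source_nn, dest_nn):
    """True when both NameNode principals use the same primary component.

    Only the primary (the part before '/') is compared, following the
    nn/host1@realm versus nn/host2@realm example; the host part may differ.
    """
    return parse_principal(source_nn)[0] == parse_principal(dest_nn)[0]


if __name__ == "__main__":
    print(same_primary("nn/host1@realm", "nn/host2@realm"))
    print(same_primary("nn/host1@realm", "nn2/host2@realm"))
```

With the principals from the example, the first pair is accepted and the second is rejected, because nn2 differs from nn.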
Secure-to-Secure: ResourceManager Mapping Rules
When copying between two HDP2 secure clusters, or from HDP1 secure to HDP2 secure, further ResourceManager (RM) configuration is required if the two clusters have different realms. For DistCp to succeed, the same RM mapping rule must be used in both clusters.
For example, if secure Cluster 1 has the following RM mapping rule:
<property> <name>hadoop.security.auth_to_local</name> <value> RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/ DEFAULT </value> </property>
And secure Cluster 2 has the following RM mapping rule:
<property> <name>hadoop.security.auth_to_local</name> <value> RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/ DEFAULT </value> </property>
The DistCp job from Cluster 1 to Cluster 2 will fail because the RM mapping rule in Cluster 2 differs from the rule in Cluster 1, so Cluster 2 cannot resolve the RM principal of Cluster 1 to the yarn user.
The solution is to use the same RM mapping rule in both Cluster 1 and Cluster 2:
<property> <name>hadoop.security.auth_to_local</name> <value> RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/ RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/ DEFAULT</value> </property>
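To see why the combined rule fixes the failure, the mapping can be approximated outside Hadoop. The sketch below is a simplified emulation and not Hadoop's own implementation: it handles only rules of the form RULE:[n:format](regex)s/pattern/replacement/ plus DEFAULT, and the function `short_name`, the host name `host1`, and the choice of BA.YISEC3.COM as Cluster 2's default realm are assumptions made for illustration.

```python
import re

# Matches rules shaped like RULE:[2:$1@$0](regex)s/pattern/replacement/
RULE_SYNTAX = re.compile(r"RULE:\[(\d+):(.*?)\]\((.*?)\)s/(.*?)/(.*?)/")

CLUSTER1 = "RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/ DEFAULT"
CLUSTER2 = "RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/ DEFAULT"
COMBINED = ("RULE:[2:$1@$0](rm@.*SEC1.SUP1.COM)s/.*/yarn/ "
            "RULE:[2:$1@$0](rm@.*BA.YISEC3.COM)s/.*/yarn/ DEFAULT")


def short_name(principal, rules, default_realm):
    """Return the local user for a principal, or None if no rule applies."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    for rule in rules.split():
        if rule == "DEFAULT":
            # DEFAULT only covers principals in the cluster's own realm.
            if realm == default_realm:
                return components[0]
            continue
        parsed = RULE_SYNTAX.fullmatch(rule)
        if parsed is None:
            continue  # rule syntax this sketch does not handle
        count, fmt, match, pattern, replacement = parsed.groups()
        if int(count) != len(components):
            continue  # rule is for a different number of components
        # $0 is the realm, $1 and $2 are the principal's components.
        parts = [realm] + components
        base = re.sub(r"\$(\d)", lambda m: parts[int(m.group(1))], fmt)
        if re.fullmatch(match, base):
            # Like sed without the g flag: replace the first match only.
            return re.sub(pattern, replacement, base, count=1)
    return None


if __name__ == "__main__":
    rm1 = "rm/host1@SEC1.SUP1.COM"  # hypothetical RM principal of Cluster 1
    print(short_name(rm1, CLUSTER2, "BA.YISEC3.COM"))
    print(short_name(rm1, COMBINED, "BA.YISEC3.COM"))
```

Under these assumptions, Cluster 2's original rules yield no mapping for the Cluster 1 RM principal, while the combined rules map it to yarn.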