Using DistCp between HA clusters using Cloudera Manager
To copy data between HA clusters using distcp, you must configure
specific name service properties to ensure that the HDFS clients in a cluster can access a
remote HA cluster. You can configure these values through advanced configuration snippets in
Cloudera Manager.
This procedure explains how you can configure the name service properties from
Cloudera Manager to enable copying of data between two example clusters A and B.
Here, A is the source cluster while B is the remote cluster.
Select Clusters and choose the source HDFS cluster where
you want to configure the properties.
Navigate to the Configuration tab and search for
HDFS Client Advanced Configuration Snippet (Safety Valve) for
hdfs-site.xml.
Add the various properties as specified:
Add the dfs.internal.nameservices property as the name
service of the source cluster.
dfs.internal.nameservices = HAA
Add the dfs.nameservices property as the name services
of both the clusters.
dfs.nameservices = HAA,HAB
Add the
dfs.client.failover.proxy.provider.<nameservice>
property for the remote cluster.
Add the dfs.ha.namenodes.<nameservice> property to
reference the NameNodes of the remote cluster.
dfs.ha.namenodes.HAB = nn1,nn2
Add the dfs.namenode.rpc-address and the
dfs.namenode.servicerpc-address properties as the
fully qualified RPC addresses on which each NameNode of the remote
cluster can listen.
Add the dfs.namenode.http-address and the
dfs.namenode.https-address properties as the fully
qualified HTTP and HTTPS addresses on which each NameNode of the remote
cluster can listen.