To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in the local and remote clusters.
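For example, with the name service IDs used in the steps below (HAA for cluster A, HAB for cluster B), cluster A's client configuration lists both name services in dfs.nameservices but only its own in dfs.internal.nameservices:

  dfs.nameservices = HAA,HAB
  dfs.internal.nameservices = HAA

Cluster B mirrors this, keeping both name services in dfs.nameservices and setting dfs.internal.nameservices = HAB.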
Use the following steps to copy data between HA clusters:
- Edit the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml for both cluster A and cluster B:
  - Open the Cloudera Manager Admin Console.
  - Go to the HDFS service.
  - Click the Configuration tab.
  - Select Scope > HDFS (Service-Wide).
  - Select Category > Advanced.
  - Search for HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml, and add the properties listed below. (A consolidated example of the resulting snippet for cluster A appears after these steps.)
    - Add both name services to dfs.nameservices = HAA, HAB
    - Add the dfs.internal.nameservices property:
      - In cluster A: dfs.internal.nameservices = HAA
      - In cluster B: dfs.internal.nameservices = HAB
    - Add dfs.ha.namenodes.<nameservice> to both clusters:
      - In cluster A: dfs.ha.namenodes.HAB = nn1,nn2
      - In cluster B: dfs.ha.namenodes.HAA = nn1,nn2
    - Add the dfs.namenode.rpc-address.<nameservice>.<nn> property:
      - In cluster A:
        dfs.namenode.rpc-address.HAB.nn1 = <NN1_fqdn>:8020
        dfs.namenode.rpc-address.HAB.nn2 = <NN2_fqdn>:8020
      - In cluster B:
        dfs.namenode.rpc-address.HAA.nn1 = <NN1_fqdn>:8020
        dfs.namenode.rpc-address.HAA.nn2 = <NN2_fqdn>:8020
    - Add the following properties to enable distcp over WebHDFS and secure WebHDFS:
      - In cluster A:
        dfs.namenode.http-address.HAB.nn1 = <NN1_fqdn>:50070
        dfs.namenode.http-address.HAB.nn2 = <NN2_fqdn>:50070
        dfs.namenode.https-address.HAB.nn1 = <NN1_fqdn>:50470
        dfs.namenode.https-address.HAB.nn2 = <NN2_fqdn>:50470
      - In cluster B:
        dfs.namenode.http-address.HAA.nn1 = <NN1_fqdn>:50070
        dfs.namenode.http-address.HAA.nn2 = <NN2_fqdn>:50070
        dfs.namenode.https-address.HAA.nn1 = <NN1_fqdn>:50470
        dfs.namenode.https-address.HAA.nn2 = <NN2_fqdn>:50470
    - Add the dfs.client.failover.proxy.provider.<nameservice> property:
      - In cluster A:
        dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      - In cluster B:
        dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
- Restart the HDFS service, then run the distcp command using the name service. For example:
    hadoop distcp hdfs://HAA/tmp/testDistcp hdfs://HAB/tmp/
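For reference, the cluster A additions above, combined into a single hdfs-site.xml snippet, could look roughly like the following sketch. It assumes the example name service IDs HAA (local) and HAB (remote) used throughout this procedure, and uses nn1.hab.example.com and nn2.hab.example.com as placeholder hostnames standing in for cluster B's <NN1_fqdn> and <NN2_fqdn>; replace them with your actual NameNode fully qualified domain names.

  <!-- Cluster A: additions for the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml -->
  <!-- HAA is the local name service, HAB the remote one; hostnames below are placeholders -->
  <property>
    <name>dfs.nameservices</name>
    <value>HAA,HAB</value>
  </property>
  <property>
    <name>dfs.internal.nameservices</name>
    <value>HAA</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.HAB</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.HAB.nn1</name>
    <value>nn1.hab.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.HAB.nn2</name>
    <value>nn2.hab.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.HAB.nn1</name>
    <value>nn1.hab.example.com:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.HAB.nn2</name>
    <value>nn2.hab.example.com:50070</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.HAB.nn1</name>
    <value>nn1.hab.example.com:50470</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.HAB.nn2</name>
    <value>nn2.hab.example.com:50470</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.HAB</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

Cluster B's snippet is the mirror image: dfs.internal.nameservices is set to HAB, and the HAA-suffixed properties point at cluster A's NameNode hostnames instead.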