To copy data between HA clusters using distcp
, you must configure
specific name service properties to ensure that the HDFS clients in a cluster can access a
remote HA cluster. You can configure these values through advanced configuration snippets in
Cloudera Manager.
This procedure explains how you can configure the name service properties from
Cloudera Manager to enable copying of data between two example clusters A and B.
Here, A is the source cluster while B is the remote cluster.
-
Select Clusters and choose the source HDFS cluster where
you want to configure the properties.
-
Navigate to the Configuration tab and search for
HDFS Client Advanced Configuration Snippet (Safety Valve) for
hdfs-site.xml.
-
Add the various properties as specified:
-
Add the
dfs.internal.nameservices
property as the name
service of the source cluster.
dfs.internal.nameservices = HAA
-
Add the
dfs.nameservices
property as the name services
of both the clusters.
dfs.nameservices = HAA,HAB
-
Add the
dfs.client.failover.proxy.provider.<nameservice>
property for the remote cluster.
dfs.client.failover.proxy.provider.HAB =
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
-
Add the
dfs.ha.namenodes.<nameservice>
property to
reference the NameNodes of the remote cluster.
dfs.ha.namenodes.HAB = nn1,nn2
-
Add the
dfs.namenode.rpc-address
and the
dfs.namenode.servicerpc-address
properties as the
fully qualified RPC addresses on which each NameNode of the remote
cluster can listen.
dfs.namenode.rpc-address.HAB.nn1 =
<NN1_fqdn>:8020
dfs.namenode.rpc-address.HAB.nn2
=
<NN2_fqdn>:8020
dfs.namenode.servicerpc-address.HAB.nn1
=
<NN1_fqdn>:8022
dfs.namenode.servicerpc-address.HAB.nn2
= <NN2_fqdn>:8022
-
Add the
dfs.namenode.http-address
and the
dfs.namenode.https-address
properties as the fully
qualified HTTP and HTTPS addresses on which each NameNode of the remote
cluster can listen.
dfs.namenode.http-address.HAB.nn1 =
<NN1_fqdn>:9870
dfs.namenode.http-address.HAB.nn2
=
<NN2_fqdn>:9870
dfs.namenode.https-address.HAB.nn1
=
<NN1_fqdn>:9871
dfs.namenode.https-address.HAB.nn2
= <NN2_fqdn>:9871
-
Specify the Reason for Change, and click Save
Changes.
This saves the property values as client-side configurations.
-
Restart the HDFS service, then run the
distcp
command using
the NameService. For example:
hadoop distcp hdfs://HAA/tmp/testDistcp hdfs://HAB/tmp/