DistCp Between HA Clusters
To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in the local and remote clusters.
Use the following steps to copy data between HA clusters:
Create a new directory and copy the contents of the /etc/hadoop/conf directory on the local cluster to this directory. The local cluster is the cluster where you plan to run the distcp command.
The following steps use distcpConf as the directory name. Substitute the name of the directory you created for distcpConf.
In the hdfs-site.xml file in the distcpConf directory, add the nameservice ID for the remote cluster to the dfs.nameservices property, and list only the local nameservice ID in the dfs.internal.nameservices property.
Note: localns is the nameservice ID of the local cluster and externalns is the nameservice ID of the remote cluster.
<property>
  <name>dfs.nameservices</name>
  <value>localns,externalns</value>
</property>
<property>
  <name>dfs.internal.nameservices</name>
  <value>localns</value>
</property>
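The directory copy described above can be scripted. The following is a minimal sketch, not a required procedure: the function name make_distcp_conf is hypothetical, and the default paths are only the ones used in this topic.

```shell
# Hypothetical helper: copy a Hadoop client configuration directory into a
# working directory for DistCp. The function name and both defaults are
# assumptions made for illustration.
make_distcp_conf() {
  src="${1:-/etc/hadoop/conf}"   # configuration directory of the local cluster
  dest="${2:-distcpConf}"        # directory that the distcp command will use
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # "$src/." copies the contents of the directory rather than the directory itself.
  cp -R "$src/." "$dest/"
}
```

Running make_distcp_conf with no arguments copies /etc/hadoop/conf into ./distcpConf. The hdfs-site.xml file in the new directory still has to be edited by hand as described above.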
On the remote cluster, find the hdfs-site.xml file and copy the properties that refer to the nameservice ID to the end of the hdfs-site.xml file in the distcpConf directory you created in step 1:
dfs.ha.namenodes.<nameserviceID>
dfs.namenode.rpc-address.<nameserviceID>.<namenode1>
dfs.namenode.servicerpc-address.<nameserviceID>.<namenode1>
dfs.namenode.http-address.<nameserviceID>.<namenode1>
dfs.namenode.https-address.<nameserviceID>.<namenode1>
dfs.namenode.rpc-address.<nameserviceID>.<namenode2>
dfs.namenode.servicerpc-address.<nameserviceID>.<namenode2>
dfs.namenode.http-address.<nameserviceID>.<namenode2>
dfs.namenode.https-address.<nameserviceID>.<namenode2>
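To find the properties to copy, it can help to list every property name in the remote hdfs-site.xml file that carries the nameservice ID. The sketch below is one possible way to do that. The function name list_ns_properties is hypothetical, and it assumes each <name> element sits on a single line with no surrounding whitespace inside the tags.

```shell
# Hypothetical helper: print the property names in an hdfs-site.xml file
# that contain the given nameservice ID as a complete dot-separated component.
list_ns_properties() {
  ns="$1"     # nameservice ID, for example externalns
  file="$2"   # path to a copy of the remote cluster's hdfs-site.xml
  grep -o '<name>[^<]*</name>' "$file" |
    sed -e 's/<name>//' -e 's/<\/name>//' |
    grep -E "\.${ns}(\.|\$)"
}
```

For example, list_ns_properties externalns remote-hdfs-site.xml prints names such as dfs.ha.namenodes.externalns, where remote-hdfs-site.xml is an assumed file name. The helper prints names only, so the matching <property> elements still need to be copied in full.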
Enter the following command to copy data from the remote cluster to the local cluster:
hadoop --config distcpConf distcp hdfs://externalns/<source_directory> hdfs://localns/<destination_directory>
If you want to perform distcp on a secure cluster, you must also pass the mapreduce.job.send-token-conf property along with the distcp command, as follows:
hadoop --config distcpConf distcp -Dmapreduce.job.send-token-conf="yarn.http.policy|^yarn.timeline-service.webapp.*$|^yarn.timeline-service.client.*$|hadoop.security.key.provider.path|hadoop.rpc.protection|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal|dfs.namenode.kerberos.principal.pattern|mapreduce.jobhistory.principal" hdfs://externalns/<source_directory> hdfs://localns/<destination_directory>
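Because the mapreduce.job.send-token-conf value is a long list of patterns separated by | characters, a stray space or line break is easy to introduce when typing it. The sketch below is one way to assemble the value from one pattern per line. The function name send_token_conf is hypothetical, and the patterns are the ones listed in this topic.

```shell
# Hypothetical helper: print the send-token-conf value as a single
# "|"-separated string with no embedded whitespace.
send_token_conf() {
  # The quoted heredoc delimiter keeps "$" and "*" literal.
  paste -s -d '|' - <<'EOF'
yarn.http.policy
^yarn.timeline-service.webapp.*$
^yarn.timeline-service.client.*$
hadoop.security.key.provider.path
hadoop.rpc.protection
dfs.nameservices
^dfs.namenode.rpc-address.*$
^dfs.ha.namenodes.*$
^dfs.client.failover.proxy.provider.*$
dfs.namenode.kerberos.principal
dfs.namenode.kerberos.principal.pattern
mapreduce.jobhistory.principal
EOF
}

# Possible usage (not executed here; replace the placeholders first):
#   hadoop --config distcpConf distcp \
#     -Dmapreduce.job.send-token-conf="$(send_token_conf)" \
#     hdfs://externalns/<source_directory> hdfs://localns/<destination_directory>
```

Keeping one pattern per line makes the list easier to review, and the double quotes around the command substitution prevent the shell from interpreting the | and * characters.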