Set HADOOP_CONF to the destination cluster
Set the HADOOP_CONF path to the destination environment. If you are not using HFTP, set the HADOOP_CONF path to the source environment instead.
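In practice, the configuration path is typically supplied through the HADOOP_CONF_DIR environment variable. A minimal sketch, assuming the destination cluster's client configuration has been copied to /etc/hadoop/conf.dest (a hypothetical path):
# Point the Hadoop client at the destination cluster's configuration.
# The path is a placeholder for wherever that configuration was copied.
export HADOOP_CONF_DIR=/etc/hadoop/conf.dest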
Alternatively, you can point the hadoop distcp client to a configuration file that includes the parameters that allow it to point to the destination cluster. This can be done on the command line with the --config option. That option must point to a new copy of the contents of the /etc/hadoop/conf directory, with the following parameters from the destination (remote) cluster added to the hdfs-site.xml file:
dfs.ha.namenodes.<nameserviceID>
dfs.client.failover.proxy.provider.<nameserviceID>
dfs.ha.automatic-failover.enabled.<nameserviceID>
dfs.namenode.rpc-address.<nameserviceID>.<namenode1>
dfs.namenode.servicerpc-address.<nameserviceID>.<namenode1>
dfs.namenode.http-address.<nameserviceID>.<namenode1>
dfs.namenode.https-address.<nameserviceID>.<namenode1>
dfs.namenode.rpc-address.<nameserviceID>.<namenode2>
dfs.namenode.servicerpc-address.<nameserviceID>.<namenode2>
dfs.namenode.http-address.<nameserviceID>.<namenode2>
dfs.namenode.https-address.<nameserviceID>.<namenode2>
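As an illustration, here is a minimal sketch of the added entries, assuming a hypothetical remote nameservice named ns-dest with NameNodes nn1 and nn2. The hostnames and ports below are placeholders; copy the actual values from the remote cluster rather than reusing these.
<property>
  <name>dfs.ha.namenodes.ns-dest</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns-dest</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.ns-dest</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-dest.nn1</name>
  <value>nn1.dest.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.ns-dest.nn1</name>
  <value>nn1.dest.example.com:8022</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns-dest.nn1</name>
  <value>nn1.dest.example.com:9870</value>
</property>
<property>
  <name>dfs.namenode.https-address.ns-dest.nn1</name>
  <value>nn1.dest.example.com:9871</value>
</property>
<!-- The same four address properties are repeated for nn2, pointing at
     the second NameNode's host. -->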
These parameters can be found in the /etc/hadoop/conf/hdfs-site.xml file on the remote cluster.
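For example, the copied configuration directory might be prepared as follows. The distcpConf name matches the command below; the remote hostname is a placeholder, and scp is only one way to retrieve the remote file.
# Start from a copy of the local client configuration.
mkdir distcpConf
cp /etc/hadoop/conf/* distcpConf/
# Retrieve the remote cluster's hdfs-site.xml for reference, then merge the
# dfs.ha.* and dfs.namenode.*-address properties listed above into
# distcpConf/hdfs-site.xml.
scp nn1.dest.example.com:/etc/hadoop/conf/hdfs-site.xml remote-hdfs-site.xml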
For example, the command would look something like the following:
hadoop --config distcpConf distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=<nameservice> hdfs://<nameservice>/<source_directory> <target_directory>
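Continuing the hypothetical ns-dest example above, a filled-in command might look like this. The paths are placeholders, and the unqualified target path resolves against the local cluster's default file system.
hadoop --config distcpConf distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=ns-dest hdfs://ns-dest/user/examples/source /user/examples/target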
For more details, see Using DistCp with Highly Available remote clusters.