Moving the HDFS data using the distcp command
Use the hadoop distcp command to move the content from the HDFS source cluster.
Consider the following before you run the distcp command:
- Ensure that the distcp user can run a MapReduce job on YARN. Otherwise, you must adjust the following configurations to enable the distcp user:
  - allowed.system.users
  - banned.users
  - min.user.id
- For source directories with a very high file count, consider creating a manual copy listing, as specified in the following example.
> hdfs dfs -ls hdfs://<hdfs-nameservice>/user/john.doe/application1/* > src_files
You can then read the copy listing output file and submit its entries, one by one, as input to a distcp job.
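Submitting the listing entries one by one can be sketched as a small shell function. This is a minimal illustration, not a prescribed procedure: the function name copy_from_listing and the DISTCP_CMD variable are hypothetical, and the sketch assumes that every entry in the listing contains a full hdfs:// path with no embedded newlines. Where the copied entries land under the target depends on distcp semantics, so check the resulting layout on a small directory first.

```shell
#!/usr/bin/env bash
# Sketch: submit each entry of a copy listing to its own distcp job.
# DISTCP_CMD and copy_from_listing are hypothetical names for illustration.

DISTCP_CMD="${DISTCP_CMD:-hadoop distcp -direct}"

copy_from_listing() {
  local listing="$1" target="$2" src
  # Keep only the path column of the `hdfs dfs -ls` output. This drops the
  # "Found N items" header and the permission, owner, and size columns.
  sed -n 's|^.*\(hdfs://.*\)$|\1|p' "$listing" | while IFS= read -r src; do
    # DISTCP_CMD is intentionally unquoted so the command and its flags
    # split into separate words. stdin is redirected so the job cannot
    # consume the remaining listing lines.
    $DISTCP_CMD "$src" "$target" < /dev/null \
      || echo "distcp failed for $src" >&2
  done
}

# Example invocation (placeholders are left unfilled on purpose):
# copy_from_listing src_files "ofs://<ozone.service.id>/user/john.doe/application1/"
```

Each listing entry becomes a separate distcp job, so a failure is reported for that entry and the loop continues with the next one. If your Hadoop version supports the -C option of hdfs dfs -ls, which prints paths only, the sed filtering step becomes unnecessary.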
Consider the example of a user john.doe whose data from the /user/john.doe/application1/ directory you want to transfer to Ozone. Run the distcp command as specified.
> hadoop distcp -direct hdfs://<hdfs-nameservice>/user/john.doe/application1 ofs://<ozone.service.id>/user/john.doe/