Moving the HDFS data using the distcp command

Use the hadoop distcp command to copy the content from the source HDFS cluster to Ozone.

You must consider the following before running the distcp command:
  • Ensure that the distcp user can run a MapReduce job on YARN. If the user cannot, adjust the following YARN configurations so that the distcp user is allowed to submit jobs:
    • allowed.system.users
    • banned.users
    • min.user.id
  • For source directories with a very high file count, consider creating a manual copy listing, as specified in the following example.
    > hdfs dfs -ls hdfs://<hdfs-nameservice>/user/john.doe/application1/* > src_files

    Each entry line in the copy listing file ends with a file or directory path. You can read these paths from the file and submit them one by one as the source of a distcp job.
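
The manual copy listing approach described above can be sketched as a small script. This is only an illustration under stated assumptions, not a documented procedure: the function names extract_paths and submit_listing, the DEST and DISTCP variables, and the destination path are hypothetical. It also assumes that src_files contains raw hdfs dfs -ls output and that no path contains whitespace. Check where distcp places each entry under the destination before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: feed a manual copy listing to distcp one entry at a time.
# Assumptions (not from the original text): the listing is raw `hdfs dfs -ls`
# output, paths contain no whitespace, and DEST is a hypothetical Ozone target.

DEST="${DEST:-ofs://<ozone.service.id>/user/john.doe/application1/}"
# Set DISTCP="echo hadoop distcp" to preview the commands without running them.
DISTCP="${DISTCP:-hadoop distcp}"

# Print only the path column of an `hdfs dfs -ls` listing.
# "Found N items" header lines have fewer than 8 fields and are skipped.
extract_paths() {
  awk 'NF >= 8 { print $NF }' "$1"
}

# Run one distcp job per listed path; report failures and keep going.
# The listing is read on file descriptor 3 so the job cannot consume it via stdin.
submit_listing() {
  local listing="$1" src status=0
  while IFS= read -r src <&3; do
    # $DISTCP is intentionally unquoted so "hadoop distcp" splits into words.
    $DISTCP -direct "$src" "$DEST" || { echo "FAILED: $src" >&2; status=1; }
  done 3< <(extract_paths "$listing")
  return "$status"
}

# Submit the jobs only when a listing file is passed as the first argument.
if [[ -f "${1:-}" ]]; then
  submit_listing "$1"
fi
```

Saved under a hypothetical name such as submit_listing.sh, the script could be previewed with DISTCP="echo hadoop distcp" bash submit_listing.sh src_files, which prints one distcp command per listed path instead of running it. Submitting entries one at a time keeps each job's own copy list small, at the cost of launching more jobs.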

For example, to transfer the data of the user john.doe from the /user/john.doe/application1/ directory to Ozone, run the following distcp command:
> hadoop distcp -direct hdfs://<hdfs-nameservice>/user/john.doe/application1 ofs://<ozone.service.id>/user/john.doe/