Distcp syntax and examples
You can use distcp to copy data between CDP clusters, and between a CDP cluster and Amazon S3 or Azure Data Lake Storage Gen 2.
Common use of distcp
The most common use of distcp is an inter-cluster copy:
hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/destination
Where hdfs://nn1:8020/source is the data source and hdfs://nn2:8020/destination is the destination. This command expands the namespace under /source on NameNode "nn1" into a temporary file, partitions its contents among a set of map tasks, and starts copying from "nn1" to "nn2". Note that DistCp requires absolute paths.
You can also specify multiple source directories:
hadoop distcp hdfs://nn1:8020/source/a hdfs://nn1:8020/source/b hdfs://nn2:8020/destination
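Because DistCp requires absolute paths, it can help to check the arguments before submitting the job. The following is a minimal sketch, not part of distcp itself: is_absolute and distcp_cmd are hypothetical helper names, and the helper only prints the command instead of running it.

```shell
# Hypothetical helper: succeeds only when the argument is a full URI
# (scheme://...) or an absolute path starting with "/".
is_absolute() {
  case "$1" in
    *://*|/*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Hypothetical helper: print the distcp command for one or more sources
# followed by a destination, or fail if any argument is relative.
# Nothing is executed against a cluster.
distcp_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "need at least one source and a destination" >&2
    return 1
  fi
  for p in "$@"; do
    if ! is_absolute "$p"; then
      echo "not an absolute path: $p" >&2
      return 1
    fi
  done
  echo "hadoop distcp $*"
}

distcp_cmd hdfs://nn1:8020/source/a hdfs://nn1:8020/source/b hdfs://nn2:8020/destination
```

A relative argument such as source/a makes the helper return a non-zero status and print nothing on standard output.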
Alternatively, list the source directories in a file and pass that file with the -f option:
hadoop distcp -f hdfs://nn1:8020/srclist hdfs://nn2:8020/destination
Where srclist contains:
hdfs://nn1:8020/source/a
hdfs://nn1:8020/source/b
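One way to prepare such a list is to write it locally, one URI per line, and then upload it. This is only a sketch: the temporary file name is arbitrary, and the upload and copy commands are left commented out so the snippet runs without a cluster.

```shell
# Write one source URI per line to a local temporary file.
srclist_local="$(mktemp)"
printf '%s\n' \
  "hdfs://nn1:8020/source/a" \
  "hdfs://nn1:8020/source/b" > "$srclist_local"

# Show what was written.
cat "$srclist_local"

# Upload the list and run the copy (assumes a reachable cluster):
# hdfs dfs -put "$srclist_local" hdfs://nn1:8020/srclist
# hadoop distcp -f hdfs://nn1:8020/srclist hdfs://nn2:8020/destination
```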
Copying between major versions
Run the distcp command on the cluster that runs the higher version of CDP, which should be the destination cluster. Use the following syntax:
hadoop distcp webhdfs://<namenode>:<port> hdfs://<namenode>
Note the webhdfs prefix for the remote cluster, which should be your source cluster. You must use webhdfs when the clusters run different major versions. When the clusters run the same version, you can use the hdfs protocol for better performance.
For example, the following command copies data from a CDP source cluster named example-source to a destination cluster named example-dest that runs a different CDP version:
hadoop distcp webhdfs://example-source.cloudera.com:8020 hdfs://example-dest.cloudera.com
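The protocol rule above can be sketched as a small shell helper. The function name source_scheme and the version numbers passed to it are illustrative assumptions, and the command is printed rather than run.

```shell
# Hypothetical helper: choose the source URI scheme from the major
# version numbers of the source and destination clusters.
source_scheme() {
  src_major="$1"
  dest_major="$2"
  if [ "$src_major" -ne "$dest_major" ]; then
    echo "webhdfs"   # different major versions: webhdfs is required
  else
    echo "hdfs"      # same major version: hdfs performs better
  fi
}

# Illustrative version numbers only; substitute the real ones.
scheme="$(source_scheme 2 3)"

# Print (do not run) the resulting command, reusing the example hosts above.
echo "hadoop distcp ${scheme}://example-source.cloudera.com:8020 hdfs://example-dest.cloudera.com"
```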
Copying to/from Amazon S3
The following distcp syntax shows how to copy data to and from S3:
#Copying from S3
hadoop distcp s3a://<bucket>/<data> hdfs://<namenode>/<directory>/
#Copying to S3
hadoop distcp hdfs://<namenode>/<directory> s3a://<bucket>/<data>
            This is a basic example of using distcp with S3.
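The two directions differ only in argument order, which a short sketch makes explicit. The helper name s3_distcp_cmd, the bucket name, and the HDFS paths are all hypothetical, and the helper prints the command without running it.

```shell
# Hypothetical helper: print the distcp command for one direction.
# Arguments: direction (from-s3 | to-s3), S3A URI, HDFS URI.
s3_distcp_cmd() {
  case "$1" in
    from-s3) echo "hadoop distcp $2 $3" ;;   # S3 is the source
    to-s3)   echo "hadoop distcp $3 $2" ;;   # S3 is the destination
    *)       echo "unknown direction: $1" >&2; return 1 ;;
  esac
}

# Example values are placeholders, not real resources.
s3_distcp_cmd from-s3 s3a://example-bucket/data hdfs://nn1:8020/landing/
s3_distcp_cmd to-s3   s3a://example-bucket/data hdfs://nn1:8020/landing
```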
Copying to/from ADLS Gen 2
The following distcp syntax shows how to copy data to and from ADLS Gen 2:
#Copying from ADLS Gen 2
hadoop distcp abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name> hdfs://hdfs_destination_path
#Copying to ADLS Gen 2
hadoop distcp hdfs://hdfs_source_path abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>
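The [s] in the scheme is optional: abfss is the TLS-secured variant of abfs. As a sketch of how the URI is assembled from its parts, the following hypothetical helper (abfs_uri is not part of any tool) builds the string from a file system, an account name, and a path; the example values are placeholders.

```shell
# Hypothetical helper: build an ABFS URI.
# Arguments: tls (yes | no), file system, account name, path.
abfs_uri() {
  if [ "$1" = "yes" ]; then
    abfs_scheme="abfss"   # TLS-secured variant
  else
    abfs_scheme="abfs"
  fi
  echo "${abfs_scheme}://$2@$3.dfs.core.windows.net/$4"
}

# Print (do not run) a copy from ADLS Gen 2 into HDFS.
echo "hadoop distcp $(abfs_uri yes data examplestore reports/2024) hdfs://nn1:8020/landing"
```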
            