Using DistCp
The distributed copy command, distcp, is a general utility for copying large data
sets between distributed filesystems within and across clusters. You can also use
distcp to copy data to and from an Amazon S3 bucket. The distcp command submits a
regular MapReduce job that performs a file-by-file copy.
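As an illustration of a copy between two clusters, the following is a minimal sketch rather than a definitive recipe. The NameNode hosts (nn1.example.com, nn2.example.com), port 8020, and the /data/logs path are hypothetical placeholders, and the RUN variable is an assumption introduced only for this example. The script prints the command so it can be reviewed, and runs it only when RUN=1 is set on a host with a configured Hadoop client.

```shell
#!/bin/sh
# Hypothetical source and destination URIs; replace with your own clusters and paths.
SRC="hdfs://nn1.example.com:8020/data/logs"
DST="hdfs://nn2.example.com:8020/data/logs"

# Basic form: hadoop distcp <source> <destination>
DISTCP_CMD="hadoop distcp $SRC $DST"

# Print the command for review.
echo "$DISTCP_CMD"

# RUN is an illustrative switch, not a distcp option. Set RUN=1 to submit the job.
if [ "${RUN:-0}" = "1" ]; then
  $DISTCP_CMD
fi
```

To copy to an S3 bucket instead, the destination would typically be an s3a:// URI, assuming the S3A connector and credentials are already configured on the cluster.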
To see the distcp command options, run the built-in help:
hadoop distcp