Using DistCp
The distributed copy command, distcp, is a general utility for copying large data
sets between distributed filesystems within and across clusters. You can also use
distcp to copy data to and from an Amazon S3 bucket. The distcp command submits a
regular MapReduce job that performs a file-by-file copy.
To see the distcp command options, run the built-in help:
hadoop distcp
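
The copies described above follow the general form hadoop distcp <source> <destination>. The following is a minimal sketch, not a definitive procedure. The NameNode hosts, port, paths, and bucket name are hypothetical placeholders, and the s3a:// scheme assumes the S3A connector is available and configured with credentials on your cluster. The helper function distcp_cmd is illustrative only; it prints each command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical endpoints (assumptions, not from this guide); replace with your own.
SRC="hdfs://nn1.example.com:8020/user/alice/data"
DEST="hdfs://nn2.example.com:8020/user/alice/data"
S3_DEST="s3a://example-bucket/backup/data"   # assumes the S3A connector is configured

# Illustrative helper: print the distcp invocation for a source and a destination.
# It only prints the command; it does not submit a job.
distcp_cmd() {
    printf 'hadoop distcp %s %s\n' "$1" "$2"
}

# Copy between two clusters.
distcp_cmd "$SRC" "$DEST"

# Copy from HDFS to an Amazon S3 bucket.
distcp_cmd "$SRC" "$S3_DEST"
```

To submit a copy job, run one of the printed commands on a host that has the Hadoop client installed and network access to both the source and the destination.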