Managing Data Storage

Using DistCp

The distributed copy command, distcp, is a general utility for copying large data sets between distributed filesystems within and across clusters. You can also use distcp to copy data to and from an Amazon S3 bucket. The distcp command submits a regular MapReduce job that performs a file-by-file copy.

To see the distcp command options, run the built-in help:
hadoop distcp
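As a minimal sketch of a typical invocation, the shell function below wraps hadoop distcp to mirror one location to another. The function name distcp_mirror, the NameNode addresses, the bucket name, and the default of 20 maps are hypothetical. The -update option (copy only files that are missing at the target or differ from it) and the -m option (maximum number of simultaneous map tasks) are standard DistCp options, but confirm them against the help output of your Hadoop version.

```shell
# Hypothetical helper: mirror a source path to a destination with DistCp.
# Only "hadoop distcp" and its -update / -m options come from Hadoop;
# the function name and the default map count are illustrative assumptions.
distcp_mirror() {
  src="$1"          # e.g. hdfs://nn1:8020/data/logs (hypothetical)
  dst="$2"          # e.g. hdfs://nn2:8020/backup/logs (hypothetical)
  maps="${3:-20}"   # upper bound on simultaneous map tasks (assumed default)
  # -update skips files that already match at the destination,
  # so re-running the same command only copies what has changed.
  hadoop distcp -update -m "$maps" "$src" "$dst"
}
```

For example, distcp_mirror hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs 10 copies between two clusters using at most 10 maps. With -update, DistCp copies the contents of the source directory into the target rather than creating the source directory underneath it. An S3 destination such as s3a://example-bucket/logs generally also requires the hadoop-aws (S3A) connector and S3 credentials to be configured; the exact setup depends on your distribution.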
