Cloud Data Access

Controlling the Number of Mappers and Their Bandwidth

If you want to control the number of mappers launched for DistCp, you can add the -m option and set it to the desired number of mappers.
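As a minimal sketch, the mapper count can be passed on the command line as shown below. The source path, the bucket name `example-bucket`, and the value of 20 mappers are hypothetical placeholders chosen for illustration, not recommendations. The snippet only prints the command so it can be reviewed first.

```shell
# Hypothetical source and destination; replace with real paths.
SRC="hdfs:///data/logs"
DEST="s3a://example-bucket/logs"

# Illustrative value for the -m option, not a tuned recommendation.
MAPPERS=20

# Print the command for review; remove the leading "echo" to run it.
echo hadoop distcp -m "$MAPPERS" "$SRC" "$DEST"
```

Keeping the mapper count in a variable makes it easy to rerun the copy with different values and compare how long each run takes.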

When using DistCp from a Hadoop cluster running in Amazon's infrastructure, increasing the number of mappers may speed up the operation.

Conversely, if copying to S3 from a cluster in a different region, the bandwidth from the Hadoop cluster to Amazon S3 may be the bottleneck. In such a situation, because the bandwidth is shared across all mappers, adding more mappers will not accelerate the upload: it will merely slow all the mappers down.

The -bandwidth option sets the approximate maximum bandwidth for each mapper in megabytes per second. This is a floating point number, so a value such as -bandwidth 0.5 allocates 0.5 MB/s to each mapper.
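Because the limit applies to each mapper, the approximate ceiling for the whole job is the number of mappers multiplied by the per-mapper value. The sketch below works through that arithmetic and prints a matching command. The paths, the bucket name `example-bucket`, and the numbers are hypothetical, and the result is only an upper bound, since the limit itself is approximate.

```shell
# Hypothetical source and destination; replace with real paths.
SRC="hdfs:///data/logs"
DEST="s3a://example-bucket/logs"

# Illustrative values: 20 mappers, each limited to about 0.5 MB/s.
MAPPERS=20
PER_MAPPER_MBPS=0.5

# Approximate job-wide ceiling = mappers x per-mapper limit.
# awk is used because shell arithmetic does not handle decimals.
TOTAL_MBPS=$(awk -v m="$MAPPERS" -v b="$PER_MAPPER_MBPS" 'BEGIN { printf "%.1f", m * b }')
echo "Approximate aggregate ceiling: ${TOTAL_MBPS} MB/s"

# Print the command for review; remove the leading "echo" to run it.
echo hadoop distcp -m "$MAPPERS" -bandwidth "$PER_MAPPER_MBPS" "$SRC" "$DEST"
```

With these values the job as a whole is limited to roughly 10 MB/s. If the link to S3 is the bottleneck, lowering either number reduces the load the copy places on it.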