Controlling the Number of Mappers and Their Bandwidth
If you want to control the number of mappers launched for DistCp, you can add the -m option and set it to the desired number of mappers.
When using DistCp from a Hadoop cluster running in Amazon's infrastructure, increasing the number of mappers may speed up the operation.
Conversely, if copying to S3 from a cluster in a different region, the bandwidth from the Hadoop cluster to Amazon S3 may be the bottleneck. In such a situation, because the bandwidth is shared across all mappers, adding more mappers will not accelerate the upload: it will merely slow all the mappers down.
The -bandwidth option sets the approximate maximum bandwidth for each mapper in megabytes per second. This is a floating-point number, so a value such as -bandwidth 0.5 allocates 0.5 MB/s to each mapper.
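
As a rough illustration of how the two options combine, the following Python sketch assembles a DistCp command line and estimates the resulting aggregate ceiling. The helper names, the source and destination URIs, and the mapper count are all hypothetical, and because the per-mapper limit is only approximate, the product of the two values should be read as an estimate rather than a guarantee.

```python
import shlex


def distcp_command(source, dest, mappers, mb_per_mapper):
    """Build a `hadoop distcp` argument list using the -m and -bandwidth options.

    Hypothetical helper for illustration; it only assembles arguments and
    does not run anything.
    """
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    if mb_per_mapper <= 0:
        raise ValueError("mb_per_mapper must be positive")
    return [
        "hadoop", "distcp",
        "-m", str(mappers),                # number of mappers to launch
        "-bandwidth", str(mb_per_mapper),  # approximate per-mapper cap, MB/s
        source, dest,
    ]


def aggregate_cap_mb_per_s(mappers, mb_per_mapper):
    """Estimate the combined ceiling: the cap applies to each mapper separately."""
    return mappers * mb_per_mapper


# Placeholder URIs; substitute the real cluster and bucket paths.
cmd = distcp_command(
    "hdfs://namenode:8020/data/logs",
    "s3a://example-bucket/logs",
    mappers=20,
    mb_per_mapper=0.5,
)
print(shlex.join(cmd))
print(aggregate_cap_mb_per_s(20, 0.5), "MB/s approximate total")
```

If the link to S3 is the bottleneck, one way to use this is to pick the total bandwidth the job may consume and divide it by the number of mappers to get the -bandwidth value, rather than raising -m on its own.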