Improving Performance for DistCp
ADLS and WASB
You can tune fs.azure.selfthrottling.read.factor and fs.azure.selfthrottling.write.factor. For details, refer to the Maximizing HDInsight throughput to Azure Blob Storage blog post.
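As a minimal sketch, the two self-throttling properties can be passed to DistCp with -D, in the same way as other Hadoop configuration options. The source path, container name, account name, and the factor values below are hypothetical placeholders, not recommendations. The helper only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Placeholder factor values (assumptions); consult the blog post for suitable settings.
READ_FACTOR="1.0"
WRITE_FACTOR="1.0"

# Hypothetical source and destination; replace with your own paths.
SRC="hdfs:///tmp/source-data"
DST="wasb://mycontainer@myaccount.blob.core.windows.net/dest-data"

# Print the DistCp command line for a given source ($1) and destination ($2).
azure_distcp_cmd() {
  echo "hadoop distcp -D fs.azure.selfthrottling.read.factor=${READ_FACTOR} -D fs.azure.selfthrottling.write.factor=${WRITE_FACTOR} $1 $2"
}

azure_distcp_cmd "$SRC" "$DST"
```

Once the printed command looks right, run it on a node where the hadoop client is configured for your storage account.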
Amazon S3
If you are planning to copy large amounts of data between HDFS and S3, you can accelerate the process by passing -D fs.s3a.fast.upload=true while invoking DistCp. For example:
hadoop distcp -D fs.s3a.fast.upload=true s3a://dominika-test/driver-data /tmp/test2
The fs.s3a.fast.upload option significantly accelerates data upload by writing the data in blocks, possibly in parallel.
For more tips on how to improve performance for DistCp with S3, refer to Configuring and Tuning S3A Fast Upload.
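Because fs.s3a.fast.upload affects writes to S3, the following sketch shows a copy in the HDFS-to-S3 direction. It also sets fs.s3a.fast.upload.buffer, assuming a Hadoop release in which the S3A connector supports that property. The bucket name, paths, and buffer choice are hypothetical placeholders. As above, the helper only prints the command for review.

```shell
#!/bin/sh
# Where blocks are buffered before upload (assumption: "disk" is supported by your S3A version).
BUFFER="disk"

# Hypothetical source and destination; replace with your own paths.
SRC="hdfs:///tmp/source-data"
DST="s3a://my-bucket/dest-data"

# Print the DistCp command line for a given source ($1) and destination ($2).
s3a_distcp_cmd() {
  echo "hadoop distcp -D fs.s3a.fast.upload=true -D fs.s3a.fast.upload.buffer=${BUFFER} $1 $2"
}

s3a_distcp_cmd "$SRC" "$DST"
```

Buffering to disk is shown here only as one plausible choice; the right setting depends on the memory and local disk available to the DistCp tasks.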