Using DistCp with S3
When using DistCp with data in S3, consider the following limitations:
The -append option is not supported.
The -diff option is not supported.
The -atomic option causes a rename of the temporary data, which significantly increases the time to commit work at the end of the operation. Furthermore, because S3A does not offer atomic renames of directories, the -atomic operation does not actually deliver what it promises. Avoid using this option.
All -p options, including those to preserve permissions, user and group information, attributes, checksums, and replication, are ignored.
CRC checking between HDFS and S3 is not performed. We still recommend using the -skipcrccheck option to make clear that no CRC checking is taking place, and so that if etag checksums are enabled on S3A through the property fs.s3a.etag.checksum.enabled, DistCp between HDFS and S3 will not trigger checksum-mismatch errors.
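The limitations above can be folded into a small wrapper that checks options before a copy is launched. The following is a minimal sketch, not an official tool: the function names, the cluster address, and the bucket name are hypothetical. It also assumes that -skipcrccheck is passed together with -update, which is how the option is typically accepted by DistCp. The script only prints the command so that it can be reviewed before being run on a cluster.

```shell
#!/bin/sh
# Sketch: build a DistCp command line for an HDFS-to-S3A copy.
# Function names and paths are hypothetical, for illustration only.

# Return non-zero if any option is one to avoid with S3.
check_s3_distcp_opts() {
  for opt in "$@"; do
    case "$opt" in
      -append|-diff|-atomic)
        echo "avoid with S3: ${opt}" >&2
        return 1
        ;;
      -p*)
        # Simple prefix match: preserve flags are ignored on S3.
        echo "ignored with S3: ${opt}" >&2
        ;;
    esac
  done
  return 0
}

# Print a DistCp command: source, destination, then any extra options.
build_distcp_cmd() {
  src="$1"
  dst="$2"
  shift 2
  check_s3_distcp_opts "$@" || return 1
  # Assumption: -skipcrccheck is used alongside -update.
  set -- hadoop distcp -update -skipcrccheck "$@" "$src" "$dst"
  echo "$*"
}

SRC="hdfs://namenode:8020/data/events"    # hypothetical source path
DST="s3a://example-bucket/backup/events"  # hypothetical bucket
# Prints the command for review rather than running it.
build_distcp_cmd "$SRC" "$DST" -m 20
```

Printing rather than executing keeps the sketch safe to try anywhere; once the output looks right, the printed command can be run on a host with the Hadoop client installed.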