Configuring and Tuning S3A Block Upload
Because of the nature of the S3 object store, data written to an S3A
OutputStream
is not written incrementally — instead, by default, it is
buffered to disk until the stream is closed in its close()
method. This can
make output slow because the execution time for OutputStream.close()
is
proportional to the amount of data buffered and inversely proportional to the bandwidth
between the host to S3; that is O(data/bandwidth)
. Other work in the same
process, server, or network at the time of upload may increase the upload time.
In summary, the further the process is from the S3 store, or the smaller the EC2 VM is, the longer it will take complete the work. This can create problems in application code:
Code often assumes that the
close()
call is fast; the delays can create bottlenecks in operations.Very slow uploads sometimes cause applications to time out - generally, threads blocking during the upload stop reporting progress, triggering timeouts.
Streaming very large amounts of data may consume all disk space before the upload begins.