Accessing Cloud Data

Using S3 as a Safe and Fast Destination of Work

Amazon S3 is an eventually consistent filesystem, which makes directory listings unreliable. It also lacks a native rename() operation, so a rename has to be emulated by copying and then deleting objects, which makes committing work very slow. To address these issues, the S3A connector has two features:

  • S3Guard: for consistent directory listings.

  • S3A Committers: for high-performance committing of the output of Spark queries to S3.

Without these features, using S3 as a destination of work is slow and potentially unsafe.
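
As a rough illustration of how the two features are switched on, the sketch below builds the spark-submit flags for a small set of Hadoop S3A options. It assumes the DynamoDB-backed S3Guard metadata store and the "directory" committer; the helper name spark_conf_flags and the dictionary S3A_OPTIONS are hypothetical, and a real deployment will likely need further settings (table name, region, credentials, Spark-side commit protocol bindings) that are not shown here.

```python
# Hadoop S3A options, assumed here for illustration only.
# The values chosen (DynamoDB metadata store, "directory" committer)
# are one possible combination, not a recommendation.
S3A_OPTIONS = {
    # S3Guard: keep a consistent view of listings in DynamoDB.
    "fs.s3a.metadatastore.impl":
        "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore",
    # S3A committers: use the S3A committer factory for s3a:// output.
    "mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    # Which S3A committer to use.
    "fs.s3a.committer.name": "directory",
}


def spark_conf_flags(hadoop_options):
    """Turn Hadoop options into spark-submit --conf arguments.

    Spark passes any property prefixed with "spark.hadoop." through
    to the Hadoop configuration, so each key gets that prefix.
    """
    flags = []
    for key, value in sorted(hadoop_options.items()):
        flags += ["--conf", f"spark.hadoop.{key}={value}"]
    return flags


if __name__ == "__main__":
    # Print the flags so they can be pasted after "spark-submit".
    print(" ".join(spark_conf_flags(S3A_OPTIONS)))
```

The same keys, without the spark.hadoop. prefix, could instead be placed in core-site.xml so that they apply to every job on the cluster.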