Errors Related to Visible S3A Inconsistency
S3 is an eventually consistent object store, not a filesystem. It offers read-after-create consistency, which means that a newly created object is immediately visible. There is one quirk, however: a negative GET may be cached, so that even if an object is created immediately afterwards, the earlier "not found" answer may still be returned.
That means the following sequence on its own will be consistent:
touch(path) -> getFileStatus(path)
But this sequence may be inconsistent:
getFileStatus(path) -> touch(path) -> getFileStatus(path)
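The difference between the two sequences can be illustrated with a toy model. The sketch below is not how S3 or S3A is implemented; the class NegativeCachingStore, its methods, and the staleLookups parameter are hypothetical names invented for this illustration. It only assumes that a "not found" answer is remembered for a fixed number of later lookups.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: a store whose front end remembers "not found" answers. */
class NegativeCachingStore {
    private final Set<String> objects = new HashSet<>();
    // path -> how many further lookups are still answered from the stale "not found" entry
    private final Map<String, Integer> negativeCache = new HashMap<>();
    private final int staleLookups; // hypothetical knob, not a real S3 setting

    NegativeCachingStore(int staleLookups) {
        this.staleLookups = staleLookups;
    }

    /** Stand-in for touch(path): the object itself is created immediately. */
    void touch(String path) {
        objects.add(path);
    }

    /** Stand-in for getFileStatus(path): true if the object appears to exist. */
    boolean exists(String path) {
        Integer remaining = negativeCache.get(path);
        if (remaining != null) {
            // A cached negative answer wins, even if the object now exists.
            if (remaining > 1) {
                negativeCache.put(path, remaining - 1);
            } else {
                negativeCache.remove(path);
            }
            return false;
        }
        if (objects.contains(path)) {
            return true;
        }
        if (staleLookups > 0) {
            negativeCache.put(path, staleLookups); // remember the miss
        }
        return false;
    }

    public static void main(String[] args) {
        NegativeCachingStore store = new NegativeCachingStore(1);

        // touch(path) -> getFileStatus(path): no earlier miss was cached.
        store.touch("a");
        System.out.println("touch, then probe: " + store.exists("a"));

        // getFileStatus(path) -> touch(path) -> getFileStatus(path):
        // the first probe caches the miss, so the second probe is stale.
        store.exists("b");
        store.touch("b");
        System.out.println("probe, touch, then probe: " + store.exists("b"));
    }
}
```

In this model the practical lesson is to avoid probing for a path immediately before creating it when a later existence check must be reliable.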
A common source of visible inconsistencies is that the S3 metadata database (the part of S3 which serves list requests) is updated asynchronously. Newly added or deleted files may not be visible in the index, even though direct operations on the object (HEAD and GET) succeed.
In S3A, that means that the getFileStatus() and open() operations are more likely to be consistent with the state of the object store than any directory list operations (listStatus(), listFiles(), listLocatedStatus(), listStatusIterator()).
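One way for client code to cope with a lagging listing is to poll it with a bounded backoff until the expected entry appears. The following is a minimal sketch under stated assumptions: ListingWait and awaitListed are hypothetical names, not part of the Hadoop API, and the Supplier stands in for whatever listing call the application makes (for example, one built on listStatus()). It uses only the Java standard library.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper: wait for a name to show up in an eventually consistent listing. */
class ListingWait {

    /**
     * Polls the listing until it contains the name or the attempts run out.
     * The delay doubles after each miss. Returns true if the name was seen.
     */
    static boolean awaitListed(Supplier<Set<String>> lister, String name,
                               int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (lister.get().contains(name)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return false;
                }
                delay *= 2; // exponential backoff between polls
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated listing that only shows the file from the third call onwards.
        Supplier<Set<String>> laggingListing =
                () -> ++calls[0] >= 3 ? Set.of("part-0000") : Set.of();
        boolean seen = awaitListed(laggingListing, "part-0000", 5, 10);
        System.out.println("listed=" + seen + " after " + calls[0] + " calls");
    }
}
```

The bound on attempts matters: a file that was never written will not appear no matter how long the caller waits, so the helper reports failure rather than blocking indefinitely.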
The following errors may be related to eventual consistency of S3.
FileNotFoundException, Even Though the File Was Just Written
This can be a sign of consistency problems. It may also surface if there is some asynchronous file write operation still in progress in the client: the operation has returned, but the write has not yet completed. While the S3A client code does block during the close() operation, we suspect that asynchronous writes may be taking place somewhere in the stack; this could explain why parallel tests fail more often than serialized tests.
File Not Found in a Directory Listing, Even Though getFileStatus() Finds It
A file is not found in a directory listing, even though getFileStatus() finds it; or a deleted file is found in a listing, even though getFileStatus() reports that it is not there.
This is a visible sign that updates to the metadata server are lagging behind the state of the underlying object store.
File Not Visible/Saved
Files in an object store are not visible until the write has been completed. In-progress writes are simply saved to a local file or cached in RAM, and are only uploaded at the end of a write operation. If a process terminates unexpectedly, or fails to call the close() method on an output stream, the pending data is lost.
File flush() and hflush() Calls Do Not Save Data to S3A
Again, this is due to the fact that the data is cached locally until the close() operation. The S3A filesystem cannot be used as a store of data if it is required that the data is persisted durably after every flush()/hflush() call. This includes resilient logging, HBase-style journaling, and the like. The standard strategy here is to save to HDFS and then copy to S3.
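The buffering behavior described in the two sections above can be modeled with a small, self-contained stream. This is only an illustrative sketch, not the S3A output stream: BufferUntilCloseStream is a hypothetical class, and a plain Map stands in for the object store. It assumes, as the text states, that nothing is published before close().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy stream: bytes are held in memory and only published to the "store" on close(). */
class BufferUntilCloseStream extends ByteArrayOutputStream {
    private final Map<String, byte[]> store; // stand-in for the object store
    private final String key;
    private boolean closed;

    BufferUntilCloseStream(Map<String, byte[]> store, String key) {
        this.store = store;
        this.key = key;
    }

    @Override
    public void flush() {
        // Deliberately a no-op: in this model flushing uploads nothing.
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            store.put(key, toByteArray()); // the single "upload" happens here
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();
        byte[] record = "event-1\n".getBytes(StandardCharsets.UTF_8);

        // try-with-resources guarantees close() runs, even if the body throws.
        try (BufferUntilCloseStream out = new BufferUntilCloseStream(store, "logs/app.log")) {
            out.write(record, 0, record.length);
            out.flush();
            System.out.println("visible after flush: " + store.containsKey("logs/app.log"));
        }
        System.out.println("visible after close: " + store.containsKey("logs/app.log"));
    }
}
```

If the process in this model dies between flush() and close(), the map never receives the record, which is why journaling-style workloads need a store that persists on every flush.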