Configuring Per-Bucket Settings to Access Data Around the World
S3 buckets are hosted in different AWS regions, the default being "US-East". The
S3A client talks to this region by default, issuing HTTP requests to the server
s3.amazonaws.com
. This central endpoint can be used for accessing any
bucket in any region which supports using the V2 Authentication API, albeit possibly at a
reduced performance.
Each region has its own S3 endpoint, documented by Amazon. The S3A client supports these endpoints. While it is generally simpler to use the default endpoint, direct connections to specific regions (i.e. connections via region's own endpoint) may deliver performance and availability improvements, and are mandatory when working with the most recently deployed regions, such as Frankfurt and Seoul. You can use fs.s3a.endpoint.region to explicitly set the region for the bucket.
When deciding which endpoint to use, consider the following:
-
Applications running in EC2 infrastructure do not pay for data transfers to or from local S3 buckets. In contrast, they will be billed for access to remote buckets. Therefore, wherever possible, always use local buckets and local copies of data.
-
When the V1 request signing protocol is used, the default S3 endpoint can support data transfer with any bucket.
-
When the V4 request signing protocol is used, AWS requires the explicit region endpoint to be used — hence S3A must be configured to use the specific endpoint. This is done in the configuration option
fs.s3a.endpoint
. -
All endpoints other than the default endpoint only support interaction with buckets local to that S3 instance.
If the wrong endpoint is used, the request may fail. This may be reported as a 301 redirect error, or as a 400 Bad Request. Take these failures as cues to check the endpoint setting of a bucket.
The up to date list of endpoints is provided by Amazon: https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
Example
The following examples show per-bucket endpoints set for the "landsat-pds" and "eu-dataset" buckets, with the endpoints set to the default central endpoint and EU/Ireland, respectively:
<property>
<name>fs.s3a.bucket.landsat-pds.endpoint</name>
<value>s3.amazonaws.com</value>
<description>The endpoint for s3a://landsat-pds URLs</description>
</property>
<property>
<name>fs.s3a.bucket.eu-dataset.endpoint</name>
<value>s3-eu-west-1.amazonaws.com</value>
<description>The endpoint for s3a://eu-dataset URLs</description>
</property>
The following example shows how to set the region for a bucket as ap-south-1:
<property>
<name>fs.s3a.endpoint.region</name>
<value>ap-south-1</value>
<description>AWS S3 region for a bucket, which bypasses the parsing of fs.s3a.endpoint to know the region. This helps in avoiding errors while using privateLink URL and explicitly setting the bucket region.
</description>
</property>
Explicitly declaring a bucket bound to the central endpoint ensures that if the default endpoint is changed to a new region, data stored in US-east is still reachable.