Configuring Per-Bucket Settings to Access Data Around the World
S3 buckets are hosted in different AWS regions, the default being "US-East". The S3A client talks to this region by default, issuing HTTP requests to the server s3.amazonaws.com. This central endpoint can be used to access any bucket in any region that supports the V2 Authentication API, albeit possibly with reduced performance.
Each region has its own S3 endpoint, documented by Amazon. The S3A client supports these endpoints. While it is generally simpler to use the default endpoint, direct connections to specific regions (that is, connections via a region's own endpoint) may deliver performance and availability improvements, and are mandatory when working with the most recently deployed regions, such as Frankfurt and Seoul.
When deciding which endpoint to use, consider the following:
Applications running in EC2 infrastructure do not pay for data transfers to or from local S3 buckets. In contrast, they will be billed for access to remote buckets. Therefore, wherever possible, always use local buckets and local copies of data.
When the V2 request signing protocol is used, the default S3 endpoint can support data transfer with any bucket.
When the V4 request signing protocol is used, AWS requires the explicit region endpoint to be used; hence S3A must be configured to use the specific endpoint. This is done in the configuration option fs.s3a.endpoint.
All endpoints other than the default endpoint only support interaction with buckets local to that S3 instance.
If the wrong endpoint is used, the request may fail. This may be reported as a 301 redirect error, or as a 400 Bad Request. Take these failures as cues to check the endpoint setting of a bucket.
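As a minimal sketch of the fs.s3a.endpoint option described above, the following core-site.xml fragment points the S3A client at a single regional endpoint. It assumes, purely for illustration, that every bucket you access is hosted in Frankfurt; the hostname is the Frankfurt value from the region list in this section, so substitute the endpoint of your own region.

```xml
<!-- Illustrative only: assumes all buckets in use are hosted in Frankfurt. -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
```

With a setting like this, requests to buckets in other regions are the ones likely to fail with the 301 or 400 errors mentioned above.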
Here is a list of properties defining all Amazon S3 regions, as of March 2017.
These parameters can be used to specify endpoints for individual buckets. You can add these properties to your core-site.xml:
<!-- This is the default endpoint, which can be used to interact with any v2 region. -->
<property>
  <name>central.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>
<property>
  <name>canada.endpoint</name>
  <value>s3.ca-central-1.amazonaws.com</value>
</property>
<property>
  <name>frankfurt.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
<property>
  <name>ireland.endpoint</name>
  <value>s3-eu-west-1.amazonaws.com</value>
</property>
<property>
  <name>london.endpoint</name>
  <value>s3.eu-west-2.amazonaws.com</value>
</property>
<property>
  <name>mumbai.endpoint</name>
  <value>s3.ap-south-1.amazonaws.com</value>
</property>
<property>
  <name>ohio.endpoint</name>
  <value>s3.us-east-2.amazonaws.com</value>
</property>
<property>
  <name>oregon.endpoint</name>
  <value>s3-us-west-2.amazonaws.com</value>
</property>
<property>
  <name>sao-paolo.endpoint</name>
  <value>s3-sa-east-1.amazonaws.com</value>
</property>
<property>
  <name>seoul.endpoint</name>
  <value>s3.ap-northeast-2.amazonaws.com</value>
</property>
<property>
  <name>singapore.endpoint</name>
  <value>s3-ap-southeast-1.amazonaws.com</value>
</property>
<property>
  <name>sydney.endpoint</name>
  <value>s3-ap-southeast-2.amazonaws.com</value>
</property>
<property>
  <name>tokyo.endpoint</name>
  <value>s3-ap-northeast-1.amazonaws.com</value>
</property>
<property>
  <name>virginia.endpoint</name>
  <value>${central.endpoint}</value>
</property>
Once these properties are in your core-site.xml, you can reference them by name when defining per-bucket endpoints.
Example
The following example shows per-bucket endpoints for the "landsat-pds" and "eu-dataset" buckets, set to the central and EU/Ireland endpoints, respectively:
<property>
  <name>fs.s3a.bucket.landsat-pds.endpoint</name>
  <value>${central.endpoint}</value>
  <description>The endpoint for s3a://landsat-pds URLs</description>
</property>
<property>
  <name>fs.s3a.bucket.eu-dataset.endpoint</name>
  <value>${ireland.endpoint}</value>
  <description>The endpoint for s3a://eu-dataset URLs</description>
</property>
Explicitly binding a bucket to the central endpoint ensures that if the default endpoint is changed to a new region, data stored in US-East is still reachable.
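To illustrate, here is a sketch of a configuration in which the default endpoint has been switched to Frankfurt (an arbitrary choice for this example) while the landsat-pds bucket stays bound to the central endpoint. It assumes the region properties listed earlier in this section (frankfurt.endpoint and central.endpoint) are already defined in core-site.xml.

```xml
<!-- Illustrative only: the default endpoint now points at Frankfurt. -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>${frankfurt.endpoint}</value>
</property>

<!-- The explicit per-bucket binding is still defined, so s3a://landsat-pds
     continues to use the central endpoint. -->
<property>
  <name>fs.s3a.bucket.landsat-pds.endpoint</name>
  <value>${central.endpoint}</value>
  <description>The endpoint for s3a://landsat-pds URLs</description>
</property>
```

Without the per-bucket entry, requests to s3a://landsat-pds would go to the Frankfurt endpoint and would likely fail with one of the wrong-endpoint errors described earlier.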