Connectivity Problems
You may encounter the following S3 connectivity issues.
Unable to Execute HTTP Request: Read Timed Out
A read timeout means that the S3A client could not talk to the S3 service, and eventually gave up trying:
Unable to execute HTTP request: Read timed out
java.net.SocketTimeoutException: Read timed out
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
  at java.net.SocketInputStream.read(SocketInputStream.java:170)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
  at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
  at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
  at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
  at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
  at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
  at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
  at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
  at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
  at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
  at org.apache.http.impl.client.DefaultRequestDirector.createTunnelToTarget(DefaultRequestDirector.java:902)
  at org.apache.http.impl.client.DefaultRequestDirector.establishRoute(DefaultRequestDirector.java:821)
  at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:647)
  at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
  at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
  at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
  at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
  at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
This is not uncommon in Hadoop client applications — there is a whole wiki entry dedicated to possible causes of the error.
For S3 connections, key causes are:
The S3 endpoint property fs.s3a.endpoint for the target bucket is invalid.
There is a proxy setting for the S3 client, and the proxy is unreachable or is listening on a different port. (Both settings can be verified in core-site.xml; see the sketch after this list.)
The caller is on a host with fundamental connectivity problems. If a VM is on EC2, consider releasing it and requesting a new one.
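To rule out the first two causes, check the endpoint and proxy properties the client actually picks up. The following core-site.xml fragment is a minimal sketch: the endpoint and proxy values shown are placeholders for illustration and must be replaced with the values for your environment, and the fs.s3a.proxy.* properties are only needed when S3 traffic is routed through a proxy.
<property>
  <name>fs.s3a.endpoint</name>
  <!-- Placeholder: must be a valid S3 endpoint for the target bucket's region -->
  <value>s3.eu-central-1.amazonaws.com</value>
</property>

<property>
  <name>fs.s3a.proxy.host</name>
  <!-- Placeholder: set only if your deployment routes S3 traffic through a proxy -->
  <value>web-proxy.example.com</value>
</property>

<property>
  <name>fs.s3a.proxy.port</name>
  <!-- Placeholder: the port the proxy actually listens on -->
  <value>8080</value>
</property>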
Bad Request Exception When Working with S3 Frankfurt, Seoul, or Elsewhere
S3 Frankfurt and Seoul only support the V4 authentication API. Consequently, any requests using the V2 API will be rejected with 400 Bad Request:
$ bin/hadoop fs -ls s3a://frankfurt/
WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; Bad Request (retryable)

com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 923C5D9E75E44C06), S3 Extended Request ID: HDwje6k+ANEeDsM6aJ8+D5gUmNAMguOk2BvZ8PH3g9z0gpH+IuwT7N19oQOnIr5CIx7Vqb/uThE=
  at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
  at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
  at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:307)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:284)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2793)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:101)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
  at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
  at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
  at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
  at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
  at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
  at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
  at org.apache.hadoop.fs.FsShell.main(FsShell.java:373)
ls: doesBucketExist on frankfurt-new: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request;
This happens when you are trying to work with any S3 service which only supports the "V4" signing API — and the client is configured to use the default S3A service endpoint.
To avoid this error, set the specific endpoint to use via the fs.s3a.endpoint property. For more information, refer to Configuring Per-Bucket Settings to Access Data Around the World.
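For example, a client working with a bucket hosted in Frankfurt can be pointed at that region's endpoint. The fragment below is a sketch only: "frankfurt" stands in for the example bucket from the listing above, and the per-bucket form requires a Hadoop version that supports per-bucket S3A settings.
<!-- Global setting: all S3A requests go to the Frankfurt endpoint -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>

<!-- Per-bucket setting: only the example bucket "frankfurt" uses this endpoint -->
<property>
  <name>fs.s3a.bucket.frankfurt.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>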
Error Message "The bucket you are attempting to access must be addressed using the specified endpoint"
This surfaces when fs.s3a.endpoint is configured to use an S3 service endpoint which is neither the original AWS one (s3.amazonaws.com) nor the one where the bucket is hosted.
org.apache.hadoop.fs.s3a.AWSS3IOException: purging multipart uploads on landsat-pds:
  com.amazonaws.services.s3.model.AmazonS3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect; Request ID: 5B7A5D18BE596E4B), S3 Extended Request ID: uE4pbbmpxi8Nh7rycS6GfIEi9UH/SWmJfGtM9IeKvRyBPZp/hN7DbPyz272eynz3PEMM2azlhjE=:
  at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
  at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
  at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3738)
  at com.amazonaws.services.s3.AmazonS3Client.listMultipartUploads(AmazonS3Client.java:2796)
  at com.amazonaws.services.s3.transfer.TransferManager.abortMultipartUploads(TransferManager.java:1217)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initMultipartUploads(S3AFileSystem.java:454)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:289)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2715)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:96)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2749)
  at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2737)
  at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:430)
To resolve the issue, use the specific endpoint of the bucket's S3 service. Using the explicit endpoint for the region is recommended for speed and the ability to use the V4 signing API.
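For example, if the bucket in the trace above were hosted in the us-west-2 region (an assumption made here for illustration; check the bucket's actual region first), the endpoint would be set as follows:
<property>
  <name>fs.s3a.endpoint</name>
  <!-- Assumes the bucket lives in us-west-2; substitute the bucket's actual regional endpoint -->
  <value>s3.us-west-2.amazonaws.com</value>
</property>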
If not using "V4" authentication, you can use the original S3 endpoint:
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>