Classpath Related Errors
The Hadoop S3 filesystem clients depend on two sets of JARs: the Hadoop-specific filesystem clients and the third-party S3 client libraries. Both must be compatible with the Hadoop code, and any libraries they depend on must be compatible with Hadoop and with the specific JVM.
The classpath must be set up for the process talking to S3. If this is code running in the Hadoop cluster, then the JARs must be on that classpath. This includes distcp.
ClassNotFoundException Errors
ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
ClassNotFoundException: org.apache.hadoop.fs.s3native.NativeS3FileSystem
ClassNotFoundException: org.apache.hadoop.fs.s3.S3FileSystem
These are the Hadoop classes, found in the hadoop-aws JAR. An exception reporting that one of these classes is missing means that this JAR is not on the classpath.
Similarly, this error
ClassNotFoundException: com.amazonaws.services.s3.AmazonS3Client
or a similar error naming another com.amazonaws class means that one or more of the aws-*-sdk JARs are missing.
To solve the issue, add the missing JARs to the classpath.
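To see which of these classes the current JVM can actually find, a small diagnostic along the following lines may help. This is only a sketch: the class name ClasspathCheck and the method locate are hypothetical names chosen for illustration, and the check only reflects the classpath of the JVM it runs in, so it needs to be launched with the same classpath as the failing process.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper, not part of Hadoop: reports where a class is loaded from.
class ClasspathCheck {

    /**
     * Returns the location (usually a JAR URL) the named class was loaded
     * from, or null if the class cannot be loaded in this JVM.
     */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running static initializers
            Class<?> cls = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            // Classes shipped with the JVM itself have no code source
            return (location == null) ? "(JVM runtime)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.fs.s3a.S3AFileSystem",
            "com.amazonaws.services.s3.AmazonS3Client",
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "MISSING" : where));
        }
    }
}
```

A class reported as MISSING points to the JAR that has to be added. For a class that is found, the printed location shows which JAR supplied it, which can also help when tracking down version mismatches.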
Missing Method in com.amazonaws Class
This can be triggered by incompatibilities between the AWS SDK on the classpath and the version with which Hadoop was compiled.
Method signatures in the AWS SDK JARs often change between releases, so the only way to safely update the AWS SDK version is to recompile Hadoop against the later version.
There is nothing the Hadoop team can do here; if you get this problem, then you are on your own. The Hadoop developer team did look at using reflection to bind to the SDK, but there were too many changes between versions for this to work reliably. All it did was postpone version compatibility problems until the specific codepaths were executed at runtime. This was actually a backward step in terms of fast detection of compatibility problems.
Missing Method in a Jackson Class
This is usually caused by version mismatches between Jackson JARs on the classpath. All Jackson JARs on the classpath must be of the same version.
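One way to spot such a mismatch is to group the Jackson JARs on the classpath by version. The sketch below assumes the JARs follow the common jackson-<module>-<version>.jar naming convention and that they appear in the java.class.path system property; JARs loaded through other classloaders, or renamed JARs, will not be seen. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: groups Jackson JARs by version.
class JacksonVersionCheck {

    // Assumed naming convention: jackson-<module>-<version>.jar
    private static final Pattern JACKSON_JAR =
            Pattern.compile("^(jackson-[A-Za-z0-9-]+?)-(\\d[A-Za-z0-9.-]*)\\.jar$");

    /** Maps each Jackson version found to the JAR file names carrying it. */
    public static Map<String, List<String>> groupByVersion(List<String> classpathEntries) {
        Map<String, List<String>> byVersion = new TreeMap<>();
        for (String entry : classpathEntries) {
            String fileName = new File(entry).getName();
            Matcher m = JACKSON_JAR.matcher(fileName);
            if (m.matches()) {
                byVersion.computeIfAbsent(m.group(2), v -> new ArrayList<>()).add(fileName);
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path", "");
        List<String> entries = Arrays.asList(classpath.split(File.pathSeparator));
        Map<String, List<String>> byVersion = groupByVersion(entries);
        byVersion.forEach((version, jars) -> System.out.println(version + ": " + jars));
        if (byVersion.size() > 1) {
            System.out.println("WARNING: more than one Jackson version on the classpath");
        }
    }
}
```

If more than one version is listed, the JARs named under each version show where the mismatch comes from.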