Accessing data stored in Amazon S3 through Spark
To access data stored in Amazon S3 from Spark applications, use Hadoop file APIs (SparkContext.hadoopFile, JavaHadoopRDD.saveAsHadoopFile, SparkContext.newAPIHadoopRDD, and JavaHadoopRDD.saveAsNewAPIHadoopFile) for reading and writing RDDs, providing URLs of the form s3a://bucket_name/path/to/file. You can also read and write Spark SQL DataFrames using the Data Source API.
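For instance, in a spark-shell session (where the sc and spark entry points are already defined), the following sketch reads and writes both an RDD and a DataFrame. The bucket name and object paths are hypothetical placeholders:

    // "my-bucket" and all paths below are hypothetical placeholders.

    // Read a text file from S3 into an RDD using the Hadoop file APIs
    // (sc is the SparkContext provided by spark-shell).
    val lines = sc.textFile("s3a://my-bucket/path/to/input.txt")

    // Write the RDD back to S3.
    lines.saveAsTextFile("s3a://my-bucket/path/to/rdd-output")

    // Read and write Spark SQL DataFrames with the Data Source API
    // (spark is the SparkSession provided by spark-shell).
    val df = spark.read.parquet("s3a://my-bucket/path/to/data.parquet")
    df.write.mode("overwrite").parquet("s3a://my-bucket/path/to/df-output")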
Make sure that your environment is configured to allow access to the buckets you need. You must also configure the spark.yarn.access.hadoopFileSystems parameter to include the buckets you need to access. You can do this using the Spark client configuration, or at runtime as a command line parameter.
For example:
- Client configuration (/etc/spark/conf/spark-defaults.conf):
  spark.yarn.access.hadoopFileSystems=s3a://bucket1,s3a://bucket2
- spark-shell:
  spark-shell --conf "spark.yarn.access.hadoopFileSystems=s3a://bucket1,s3a://bucket2" ...
- spark-submit:
  spark-submit --conf "spark.yarn.access.hadoopFileSystems=s3a://bucket1,s3a://bucket2" ...
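
Once both buckets are listed in spark.yarn.access.hadoopFileSystems (for example via the spark-shell command above), the session can read from one bucket and write to the other. A minimal sketch inside the shell, with hypothetical object paths under bucket1 and bucket2:

    // Read JSON from bucket1 and write the result to bucket2 as Parquet.
    // The object paths are hypothetical placeholders.
    val events = spark.read.json("s3a://bucket1/logs/events.json")
    events.write.mode("overwrite").parquet("s3a://bucket2/reports/events")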
