Accessing external storage from Spark
Spark can access all storage sources supported by Hadoop, including a local file system, HDFS, HBase, Amazon S3, and Microsoft ADLS.
Spark supports many file formats, including text files, RCFile, SequenceFile, any format readable through a Hadoop InputFormat, Avro, and Parquet. Compressed files are supported for all of these formats.
For developer information about working with external storage, see External Datasets in the upstream Apache Spark RDD Programming Guide.