Accessing compressed files in Spark

You can read compressed files using one of the following methods:
  • textFile(path)
  • hadoopFile(path,outputFormatClass)
You can save compressed files using one of the following methods:
  • saveAsTextFile(path, compressionCodecClass="codec_class")
  • saveAsHadoopFile(path,outputFormatClass, compressionCodecClass="codec_class")
where codec_class is one of these classes:
  • gzip - org.apache.hadoop.io.compress.GzipCodec
  • bzip2 - org.apache.hadoop.io.compress.BZip2Codec
  • LZO - com.hadoop.compression.lzo.LzopCodec
  • Snappy - org.apache.hadoop.io.compress.SnappyCodec
  • Deflate - org.apache.hadoop.io.compress.DeflateCodec

For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.