Accessing compressed files in Spark
You can read compressed files using one of the following methods:
- textFile(path)
- hadoopFile(path, inputFormatClass)
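As a minimal sketch of the two read methods in PySpark: the helper names read_lines and read_key_values are hypothetical, and sc is assumed to be an existing SparkContext (the pyspark shell predefines one). The input format, key, and value classes shown are the standard Hadoop text classes, chosen here only for illustration. No codec is passed on read, because Hadoop selects the decompression codec from the file extension (for example .gz or .bz2).

```python
# Sketch only: `sc` is assumed to be an existing SparkContext,
# and the helper names below are hypothetical.

def read_lines(sc, path):
    """Read a (possibly compressed) text file as an RDD of lines.

    The codec is inferred from the file extension, so no codec
    argument is needed.
    """
    return sc.textFile(path)


def read_key_values(sc, path):
    """Read the same kind of file through a Hadoop input format.

    With TextInputFormat, each record is a (byte offset, line) pair.
    """
    return sc.hadoopFile(
        path,
        "org.apache.hadoop.mapred.TextInputFormat",  # inputFormatClass
        "org.apache.hadoop.io.LongWritable",         # keyClass
        "org.apache.hadoop.io.Text",                 # valueClass
    )
```

In the pyspark shell, a call such as read_lines(sc, "/tmp/events.log.gz").count() would count the lines of a gzip-compressed file; the path is a placeholder.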
You can save compressed files using one of the following methods:
- saveAsTextFile(path, compressionCodecClass="codec_class")
- saveAsHadoopFile(path, outputFormatClass, compressionCodecClass="codec_class")
In these methods, codec_class is one of the following classes:
- gzip: org.apache.hadoop.io.compress.GzipCodec
- bzip2: org.apache.hadoop.io.compress.BZip2Codec
- LZO: com.hadoop.compression.lzo.LzopCodec
- Snappy: org.apache.hadoop.io.compress.SnappyCodec
- Deflate: org.apache.hadoop.io.compress.DeflateCodec
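The save methods and codec classes can be combined as in the following PySpark sketch. The CODEC_CLASSES lookup table and both helper names are hypothetical conveniences, not part of Spark; the short keys are arbitrary labels for the classes listed above. The output format class is the standard Hadoop text output format, used here as an assumption for illustration.

```python
# Sketch only: the table and helper names are hypothetical.
# The class names are the codec classes listed above.
CODEC_CLASSES = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "lzo": "com.hadoop.compression.lzo.LzopCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "deflate": "org.apache.hadoop.io.compress.DeflateCodec",
}


def save_text_compressed(rdd, path, codec="gzip"):
    """Write an RDD of lines as compressed text part files under `path`."""
    rdd.saveAsTextFile(path, compressionCodecClass=CODEC_CLASSES[codec])


def save_pairs_compressed(pair_rdd, path, codec="gzip"):
    """Write an RDD of (key, value) pairs through a Hadoop output format."""
    pair_rdd.saveAsHadoopFile(
        path,
        "org.apache.hadoop.mapred.TextOutputFormat",  # outputFormatClass
        compressionCodecClass=CODEC_CLASSES[codec],
    )
```

For example, save_text_compressed(sc.parallelize(["a", "b"]), "/tmp/out_bz2", codec="bzip2") would write bzip2-compressed part files to the placeholder directory /tmp/out_bz2, which must not already exist.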
For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.