Accessing compressed files in Spark
You can read compressed files using one of the following methods:
- textFile(path)
- hadoopFile(path, inputFormatClass)
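As a minimal PySpark sketch of the read path: textFile is called the same way for compressed and uncompressed input, because the underlying Hadoop input format picks a decompression codec from the file extension (for example .gz or .bz2). The helper names write_gzip_sample and count_lines are hypothetical, and sc is assumed to be an existing SparkContext created elsewhere.

```python
import gzip


def write_gzip_sample(path, lines):
    """Write lines to a gzip-compressed text file.

    Plain-Python helper for producing test input; not part of the Spark API.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def count_lines(sc, path):
    """Count the lines of a text file, compressed or not.

    Assumption: sc is an existing pyspark.SparkContext. No codec is named
    here; decompression is chosen from the file extension of path.
    """
    return sc.textFile(path).count()
```

With a running SparkContext sc, calling count_lines(sc, path) on a file produced by write_gzip_sample should return the number of lines written.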
You can save compressed files using one of the following methods:
- saveAsTextFile(path, compressionCodecClass="codec_class")
- saveAsHadoopFile(path, outputFormatClass, compressionCodecClass="codec_class")
In these methods, codec_class is one of the following classes:
- gzip: org.apache.hadoop.io.compress.GzipCodec
- bzip2: org.apache.hadoop.io.compress.BZip2Codec
- LZO: com.hadoop.compression.lzo.LzopCodec
- Snappy: org.apache.hadoop.io.compress.SnappyCodec
- Deflate: org.apache.hadoop.io.compress.DeflateCodec
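The save path can be sketched in PySpark as follows. The mapping reuses the codec class names listed above; the names CODEC_CLASSES and save_compressed are hypothetical, and rdd is assumed to be an existing RDD of text records. Whether a given codec works also depends on the cluster having the matching libraries installed (the LZO class, for instance, is not in the org.apache.hadoop package), so treat this as an illustration, not a definitive implementation.

```python
# Short names mapped to the codec classes listed in this section.
CODEC_CLASSES = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "lzo": "com.hadoop.compression.lzo.LzopCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "deflate": "org.apache.hadoop.io.compress.DeflateCodec",
}


def save_compressed(rdd, path, codec="gzip"):
    """Save rdd as compressed text files under path.

    Assumption: rdd is a pyspark RDD. Returns the codec class name that
    was passed to saveAsTextFile.
    """
    key = codec.lower()
    if key not in CODEC_CLASSES:
        raise ValueError("unknown codec: %r" % codec)
    codec_class = CODEC_CLASSES[key]
    # The codec is passed as a fully qualified class name string.
    rdd.saveAsTextFile(path, compressionCodecClass=codec_class)
    return codec_class
```

For example, save_compressed(rdd, "/tmp/out", "snappy") would request Snappy-compressed part files under /tmp/out; the path here is only a placeholder.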
For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.
