Configuring Apache SparkPDF version

Accessing compressed files in Spark

You can read compressed files using one of the following methods:
  • textFile(path)
  • hadoopFile(path,outputFormatClass)
You can save compressed files using one of the following methods:
  • saveAsTextFile(path, compressionCodecClass="codec_class")
  • saveAsHadoopFile(path,outputFormatClass, compressionCodecClass="codec_class")
where codec_class is one of these classes:
  • gzip - org.apache.hadoop.io.compress.GzipCodec
  • bzip2 - org.apache.hadoop.io.compress.BZip2Codec
  • LZO - com.hadoop.compression.lzo.LzopCodec
  • Snappy - org.apache.hadoop.io.compress.SnappyCodec
  • Deflate - org.apache.hadoop.io.compress.DeflateCodec

For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.

We want your opinion

How can we improve this page?

What kind of feedback do you have?