Chapter 6. Using Spark with HDFS

Specifying Compression

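To specify compression when writing to HDFS from the Spark shell, pass the fully qualified codec class name to the save call. The following example uses PySpark syntax and writes Gzip-compressed text output: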

rdd.saveAsHadoopFile("/tmp/spark_compressed",
                     "org.apache.hadoop.mapred.TextOutputFormat",
                     compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
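
The call above can be wrapped in a small helper so that the codec is easy to swap. The following is a minimal sketch rather than a definitive implementation: save_compressed and demo are hypothetical names that are not part of Spark, and the sketch assumes a PySpark shell in which sc is the SparkContext provided by the shell. saveAsHadoopFile expects an RDD of key-value pairs, so the sample data is built as pairs.

```python
# Fully qualified Hadoop class names; PySpark accepts them as strings.
TEXT_OUTPUT_FORMAT = "org.apache.hadoop.mapred.TextOutputFormat"
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_compressed(pair_rdd, path, codec=GZIP_CODEC):
    """Write a key-value RDD to `path` as text compressed with `codec`.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    pair_rdd.saveAsHadoopFile(
        path,
        TEXT_OUTPUT_FORMAT,
        compressionCodecClass=codec,
    )


def demo(sc, path="/tmp/spark_compressed"):
    """Write sample pairs and read them back.

    Assumes `sc` is the SparkContext from the pyspark shell and that
    `path` does not exist yet.
    """
    # saveAsHadoopFile requires an RDD of (key, value) pairs.
    pairs = sc.parallelize([("a", "1"), ("b", "2")])
    save_compressed(pairs, path)
    # Gzip part files are decompressed transparently on read.
    return sc.textFile(path).collect()
```

In the pyspark shell, calling demo(sc) writes Gzip-compressed part files under /tmp/spark_compressed and returns the lines read back from them. The output directory must not already exist, or the save fails. Another codec can be selected by passing its class name as the codec argument, provided that codec is available on the cluster.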