Chapter 6. Using Spark with HDFS
Specifying Compression
To specify compression when writing an RDD of key-value pairs to HDFS from the PySpark shell (pyspark), use code similar to:
rdd.saveAsHadoopFile("/tmp/spark_compressed",
    "org.apache.hadoop.mapred.TextOutputFormat",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
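
The call above can be wrapped in a small helper so that the codec is chosen by a short name. This is a minimal sketch, not part of Spark: the helper save_compressed and the CODECS table are hypothetical names introduced here for illustration, and the sketch assumes rdd is a PySpark RDD of key-value pairs. The codec class names are standard Hadoop classes; depending on the Hadoop version, Snappy may need native libraries on the cluster.

```python
# Hadoop compression codec classes, keyed by a short name.
# The short names are an assumption made for this sketch.
CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}

# Same output format as in the example above.
TEXT_OUTPUT_FORMAT = "org.apache.hadoop.mapred.TextOutputFormat"


def save_compressed(rdd, path, codec="gzip"):
    """Hypothetical helper: write a key-value RDD to `path` as compressed text.

    Raises KeyError if `codec` is not one of the short names in CODECS.
    Returns the fully qualified codec class name that was used.
    """
    codec_class = CODECS[codec]
    rdd.saveAsHadoopFile(
        path,
        TEXT_OUTPUT_FORMAT,
        compressionCodecClass=codec_class,
    )
    return codec_class
```

In the PySpark shell, where sc is already defined, a call might look like save_compressed(sc.parallelize(["a", "b"]).map(lambda w: (w, 1)), "/tmp/spark_compressed"). The output directory must not already exist. The files can be read back with sc.textFile("/tmp/spark_compressed"), which selects the decompressor from the file extension.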