Enable GzipCodec as the default compression codec
For the MapReduce framework, update the relevant properties in core-site.xml and mapred-site.xml to enable GzipCodec as the default compression codec.
- Edit the core-site.xml file on the NameNode host machine.
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    <description>A list of the compression codec classes that can be used for compression/decompression.</description>
  </property>
- Edit the mapred-site.xml file on the JobTracker host machine.
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
- Optional: To enable job output compression, set the following two configuration parameters. Edit the mapred-site.xml file on the ResourceManager host machine.
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
- Restart the cluster.
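
Before restarting, it can help to confirm that the edited files contain the intended values, since a stray space inside a codec class name or a misspelled property name is easy to miss. The following is a minimal Python sketch of such a check, not a Hadoop tool. The helper names (parse_site_xml, check_properties, codec_list) are hypothetical, and the sketch assumes each file uses the usual <configuration> root element wrapping the <property> entries shown in the steps above.

```python
import xml.etree.ElementTree as ET

# Expected mapred-site.xml values, copied from the steps above.
EXPECTED = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress.type": "BLOCK",
}


def parse_site_xml(text):
    """Return {name: value} for every <property> in a *-site.xml document."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Strip surrounding whitespace so formatting differences do not matter.
            props[name.strip()] = (value or "").strip()
    return props


def check_properties(props, expected):
    """Return (name, expected, actual) for every missing or mismatched property."""
    return [(k, v, props.get(k)) for k, v in expected.items() if props.get(k) != v]


def codec_list(props):
    """Split io.compression.codecs into individual class names."""
    raw = props.get("io.compression.codecs", "")
    return [c.strip() for c in raw.split(",") if c.strip()]
```

To use it, read the file contents (for example with open(path).read(), where the path depends on your installation), pass the text to parse_site_xml, and inspect the list returned by check_properties; an empty list means all three properties match. For core-site.xml, codec_list returns the configured class names, so any entry containing a space points to a malformed value.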