Scaling Namespaces and Optimizing Data Storage
Also available as:
PDF
loading table of contents...

Enable GZipCodec as the default compression codec

For the MapReduce framework, update relevant properties in core-site.xml and mapred-site.xml to enable GZipCodec as the default compression codec.

  1. Edit the core-site.xml file on the NameNode host machine.
    
    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.
        LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
      <description>A list of the compression codec classes that can be used
        for compression/decompression.</description>
    </property>
  2. Edit the mapred-site.xml file on the JobTracker host machine.
    
    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    
    <property>  
      <name>mapreduce.map.output.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value> 
    </property> 
    
    <property> 
      <name>mapreduce.output.fileoutputformat.compress.type</name> 
      <value>BLOCK</value>
    </property>
  3. Optional: Enable the following two configuration parameters to enable job output compression. Edit the mapred-site.xml file on the Resource Manager host machine.
    
    <property> 
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>true</value> 
    </property> 
    
    <property> 
      <name>mapreduce.output.fileoutputformat.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value> 
    </property>
  4. Restart the cluster.