Chapter 4. Configuring HDFS Compression

This section describes how to configure HDFS compression on Linux.

Linux supports GzipCodec, DefaultCodec, BZip2Codec, LzoCodec, and SnappyCodec. GzipCodec is the codec typically used for HDFS compression. The following instructions show how to use GzipCodec.

  • Option I: To use GzipCodec for a single job:

    hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar sort \
      "-Dmapred.compress.map.output=true" \
      "-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
      "-Dmapred.output.compress=true" \
      "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
      -outKey org.apache.hadoop.io.Text \
      -outValue org.apache.hadoop.io.Text input output
  • Option II: To enable GzipCodec as the default compression:

    • Edit the core-site.xml file on the NameNode host machine:

      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>A list of the compression codec classes that can be used
         for compression/decompression.</description>
      </property>
    • Edit the mapred-site.xml file on the JobTracker host machine:

      <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
      </property> 
       
      <property> 
        <name>mapred.map.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value> 
      </property> 
       
      <property> 
        <name>mapred.output.compression.type</name> 
        <value>BLOCK</value>
      </property> 
    • (Optional) To compress job output as well, set the following two properties in the mapred-site.xml file on the JobTracker host machine:

      <property> 
        <name>mapred.output.compress</name>
        <value>true</value> 
      </property> 
      
      <property> 
        <name>mapred.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value> 
      </property> 
    • Restart the cluster using the applicable commands in the Controlling HDP Services Manually section of the HDP Reference Guide.
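
Before restarting, it can help to confirm that the edited files contain the intended values. The following is a minimal sketch, not part of Hadoop: get_prop is a hypothetical helper name, and it assumes that each <name> and <value> element sits on a single line, as in the mapred-site.xml fragments above. The configuration path in the usage comment is also an assumption and may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Hadoop): print the value of one
# property from a Hadoop *-site.xml file.
# Usage: get_prop <config-file> <property-name>
# Assumes <name>...</name> and <value>...</value> each fit on one line.
get_prop() {
  awk -v name="$2" '
    # Remember that we are inside the property with the requested name.
    index($0, "<name>" name "</name>") { found = 1 }
    # Print the first <value> that follows, stripped of tags and padding.
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")
      print
      exit
    }
    # Leaving a <property> block resets the search.
    /<\/property>/ { found = 0 }
  ' "$1"
}

# Example usage (the path is an assumption; adjust it to your installation):
#   get_prop /etc/hadoop/conf/mapred-site.xml mapred.compress.map.output
#   get_prop /etc/hadoop/conf/mapred-site.xml mapred.map.output.compression.codec
```

The helper prints nothing when the property is absent, so an empty result for mapred.output.compress simply means that the optional job output compression step was not applied.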