This section describes how to configure HDFS compression on Linux.

Linux supports GzipCodec, DefaultCodec, BZip2Codec, LzoCodec, and SnappyCodec. Typically, GzipCodec is used for HDFS compression. Use the following instructions to use GzipCodec.
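GzipCodec reads and writes standard gzip (DEFLATE) streams, so files it produces are interoperable with the ordinary gzip command-line tool. As a minimal local illustration of the format (the file name sample.txt is hypothetical and not part of the HDP setup):

```shell
# Create a small sample file, compress it, and decompress it again.
printf 'hello hdfs compression\n' > sample.txt
gzip -f sample.txt        # produces sample.txt.gz and removes sample.txt
gunzip -c sample.txt.gz   # prints the original contents to stdout
```

The same gzip-format bytes are what GzipCodec emits when a MapReduce job writes compressed output.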
Option I: To use GzipCodec with a one-time-only job:

```shell
hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar sort \
    "-Dmapred.compress.map.output=true" \
    "-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    "-Dmapred.output.compress=true" \
    "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    -outKey org.apache.hadoop.io.Text \
    -outValue org.apache.hadoop.io.Text \
    input output
```
Option II: To enable GzipCodec as the default compression:

Edit the core-site.xml file on the NameNode host machine:

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used
  for compression/decompression.</description>
</property>
```
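Note that io.compression.codecs is a comma-separated list, and BZip2Codec, mentioned at the start of this section, is not included in the value shown above. If you also want bzip2 compression available cluster-wide, a sketch of the extended property (same core-site.xml setting, with the standard BZip2Codec class appended) would look like:

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
```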
Edit the mapred-site.xml file on the JobTracker host machine:

```xml
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```
(Optional) To enable job output compression, set the following two configuration parameters. Edit the mapred-site.xml file on the Resource Manager host machine:

```xml
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```
Restart the cluster using the applicable commands in the Controlling HDP Services Manually section of the HDP Reference Guide.