Configuring Services to Use LZO Compression
Minimum Required Role: Configurator (also provided by Cluster Administrator, Limited Cluster Administrator , and Full Administrator)
After you install the GPL Extras parcel, reconfigure and restart services that need to use LZO functionality. Any service that does not require the use of LZO need not be configured.
HDFS and MapReduce
- Go to the HDFS service.
- Click the Configuration tab.
- Search for the
io.compression.codecs
property. - In the Compression Codecs property, click in the field, then click the + sign to open a new value field.
- Add the following two codecs:
- com.hadoop.compression.lzo.LzoCodec
- com.hadoop.compression.lzo.LzopCodec
- Save your configuration changes.
- Restart HDFS.
- Redeploy the HDFS client configuration.
Oozie
- Go to /var/lib/oozie on each Oozie server and
even if the LZO JAR is present, symlink the Hadoop LZO
JAR:
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
- Restart Oozie.
HBase
Restart HBase.
Impala
Restart Impala.
Hive
Restart the Hive server.
Hive-on-Tez
- install the GPL Extras parcel.
- Log in to Cloudera Manager and go to the Tez service.
- Select the Configuration tab.
- Add the following value to the Tez Additional
Classpath configuration
parameter:
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
- Append the following to the Tez Application Master
Environment Settings configuration
parameter:
:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
- Append the following to the Tez Task Environment
Settings configuration
parameter:
:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
- Deploy the Client Configuration ( .
- Go to the Hive on Tez service.
- Restart the Hive on Tez Service ( ).
- Before issuing a query, go to the Tez
service Configuration tab and change the
value of the Codec for Compressing Intermediate
Data
to:
com.hadoop.compression.lzo.LzoCodec
Sqoop 1
- Add the following entries to the Sqoop 1 Client Client Advanced Configuration Snippet (Safety Valve)
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
- Re-deploy the client configuration.