Configuring Services to Use LZO Compression

After you install the GPL Extras parcel, reconfigure and restart each service that needs LZO functionality. Services that do not use LZO require no configuration changes.
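
Before reconfiguring any service, it can help to confirm that the parcel files referenced later in this section are present on the host. The following is a minimal sketch, not a Cloudera tool: check_gplextras_parcel is a hypothetical helper name, and the default parcel root is taken from the paths used in this section.

```shell
# Sketch: confirm the GPL Extras parcel files used in this section exist.
# check_gplextras_parcel is a hypothetical helper, not part of Cloudera Manager.
check_gplextras_parcel() {
    # Default root is the parcel path used throughout this section.
    parcel_root="${1:-/opt/cloudera/parcels/GPLEXTRAS}"
    jar="$parcel_root/lib/hadoop/lib/hadoop-lzo.jar"
    native_dir="$parcel_root/lib/hadoop/lib/native"
    status=0
    if [ ! -e "$jar" ]; then
        echo "missing: $jar" >&2
        status=1
    fi
    if [ ! -d "$native_dir" ]; then
        echo "missing: $native_dir" >&2
        status=1
    fi
    return "$status"
}

# Usage on a cluster host (uses the default parcel path):
#   check_gplextras_parcel && echo "GPL Extras parcel files found"
```

The function returns a non-zero status and names each missing path on standard error, so it can be used as a guard in a larger script.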

HDFS and MapReduce

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Search for the io.compression.codecs property.
  4. In the Compression Codecs property, click in the field, then click the + sign to open a new value field.
  5. Add the following two codecs:
    • com.hadoop.compression.lzo.LzoCodec
    • com.hadoop.compression.lzo.LzopCodec
  6. Save your configuration changes.
  7. Restart HDFS.
  8. Redeploy the HDFS client configuration.
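
After the client configuration is redeployed, one way to check the result is to read the property back and confirm that both codecs are listed. The sketch below assumes the property value is a comma-separated list of class names; has_lzo_codecs is a hypothetical helper name introduced for illustration.

```shell
# Sketch: succeed only if both LZO codec class names appear in a
# comma-separated codec list. has_lzo_codecs is a hypothetical helper.
has_lzo_codecs() {
    # Strip whitespace and wrap in commas so each name can be matched exactly.
    codecs=",$(printf '%s' "$1" | tr -d '[:space:]'),"
    for codec in com.hadoop.compression.lzo.LzoCodec \
                 com.hadoop.compression.lzo.LzopCodec; do
        case "$codecs" in
            *",$codec,"*) ;;
            *) echo "missing codec: $codec" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage on a host with the redeployed client configuration:
#   has_lzo_codecs "$(hdfs getconf -confKey io.compression.codecs)"
```

Matching on the comma-delimited name avoids treating LzopCodec as a match for LzoCodec or the reverse.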

Oozie

  1. On each Oozie server, go to /var/lib/oozie and create a symlink to the Hadoop LZO JAR, even if an LZO JAR is already present:
    /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
  2. Restart Oozie.
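
The symlink step can be sketched as a small shell function. This is an illustration under stated assumptions: link_lzo_jar is a hypothetical helper name, the link is assumed to keep the file name hadoop-lzo.jar, and the command must run as a user with write access to /var/lib/oozie.

```shell
# Sketch: create (or replace) a symlink to the Hadoop LZO JAR in the Oozie
# library directory. link_lzo_jar is a hypothetical helper.
link_lzo_jar() {
    src="${1:-/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar}"
    dest_dir="${2:-/var/lib/oozie}"
    # -s creates a symbolic link; -f replaces an existing entry of the same
    # name, which covers the case where an LZO JAR is already present.
    ln -sf "$src" "$dest_dir/hadoop-lzo.jar"
}

# Usage on each Oozie server (uses the default paths from the step above):
#   link_lzo_jar
#   ls -l /var/lib/oozie/hadoop-lzo.jar
```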

HBase

Restart HBase.

Impala

Restart Impala.

Hive

Restart the Hive server.

Hive-on-Tez

  1. Install the GPL Extras parcel.
  2. Log in to Cloudera Manager and go to the Tez service.
  3. Select the Configuration tab.
  4. Add the following value to the Tez Additional Classpath configuration parameter:
    /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
  5. Append the following to the Tez Application Master Environment Settings configuration parameter:
    :/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
  6. Append the following to the Tez Task Environment Settings configuration parameter:
    :/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
  7. Deploy the Client Configuration (Actions > Deploy Client Configuration).
  8. Go to the Hive on Tez service.
  9. Restart the Hive on Tez service (Actions > Restart).
  10. Before issuing a query, go to the Tez service Configuration tab and change the value of the Codec for Compressing Intermediate Data to:
    com.hadoop.compression.lzo.LzoCodec
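
The leading colon in steps 5 and 6 indicates that the native library path is appended to the end of the existing value rather than replacing it. The sketch below illustrates the effect, assuming the existing setting is a single VAR=path-list entry; the existing value shown is hypothetical, and append_native_path is a helper name introduced only for this example.

```shell
# Sketch: show how the suffix from steps 5 and 6 extends an existing setting.
# append_native_path is a hypothetical helper.
append_native_path() {
    printf '%s%s\n' "$1" ':/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native'
}

# The existing value below is hypothetical; use the value already shown in
# the Tez configuration parameter on your cluster.
append_native_path 'LD_LIBRARY_PATH=/example/existing/native'
```

With the hypothetical input above, the resulting setting is LD_LIBRARY_PATH=/example/existing/native:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native.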

Sqoop 1

  1. Add the following entries to the Sqoop 1 Client Client Advanced Configuration Snippet (Safety Valve):
    • HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/
    • JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
  2. Redeploy the client configuration.