2. Running Compression with Hive Queries

 2.1. Create LZO Files

  1. Create LZO files as the output of the Hive query.

  2. Use the LZO command-line utility or custom Java code to generate a .lzo.index file for each .lzo file.
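Step 2 has to be applied to every .lzo file the query wrote. The following is a minimal sketch that lists the .lzo files that do not yet have a .lzo.index file next to them. It assumes the output has been copied to a local directory. On HDFS the same check would go through the Hadoop FileSystem API instead of java.nio. `LzoIndexCheck` and `filesNeedingIndex` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds .lzo files that still need a .lzo.index file. */
final class LzoIndexCheck {

    private LzoIndexCheck() {}

    /**
     * Returns the regular files in dir whose names end in ".lzo" and that have
     * no sibling "<name>.lzo.index" file, sorted by path.
     * Assumes a local directory; HDFS output would need the Hadoop FileSystem API.
     */
    static List<Path> filesNeedingIndex(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) { // closes the directory handle
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".lzo"))
                    .filter(p -> !Files.exists(p.resolveSibling(p.getFileName() + ".index")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned path can then be handed to whichever indexing tool you use.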

Hive Query Parameters

Prefix the query string with these parameters:

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;

For example:

hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;<query-string>"
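If the query is launched from a Java program rather than typed at a shell, the same prefixing can be done in code. The following is a minimal sketch in which `HiveLzoCommand` and its methods are hypothetical names, not part of Hive. It joins the three settings above into `SET key=value;` statements, prepends them to the query string, and returns the argument list for `hive -e`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a `hive -e` invocation with SET statements prefixed. */
final class HiveLzoCommand {

    private HiveLzoCommand() {}

    /** The three settings from this section that make Hive write LZO-compressed output. */
    static Map<String, String> lzoOutputSettings() {
        Map<String, String> settings = new LinkedHashMap<>(); // preserves SET order
        settings.put("mapreduce.output.fileoutputformat.compress.codec",
                "com.hadoop.compression.lzo.LzoCodec");
        settings.put("hive.exec.compress.output", "true");
        settings.put("mapreduce.output.fileoutputformat.compress", "true");
        return settings;
    }

    /** Prepends one "SET key=value;" statement per setting to the query string. */
    static String prefixQuery(Map<String, String> settings, String query) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            sb.append("SET ").append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.append(query).toString();
    }

    /** Argument list for new ProcessBuilder(...): hive, -e, and the prefixed query. */
    static List<String> hiveCommand(Map<String, String> settings, String query) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hive");
        cmd.add("-e");
        cmd.add(prefixQuery(settings, query)); // one argument, so no shell quoting is needed
        return cmd;
    }
}
```

To run the query, pass the returned list to `new ProcessBuilder(...)` and call `start()`. The query travels as a single argument instead of part of one shell string, so quotes inside the query do not need escaping. The same `prefixQuery` method also works for the uncompressed-output case in section 2.2 when it is given a map that holds the two `=false` settings.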

 2.2. Write Custom Java Code to Create LZO Files

  1. Create text files as the output of the Hive query.

  2. Write custom Java code to:

    • convert the text files generated by the Hive query to .lzo files

    • generate .lzo.index files for the .lzo files generated above
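The index file for the second bullet has a simple structure. The following is a minimal sketch that writes and reads such a file. It assumes the layout used by the hadoop-lzo indexer. The index sits next to the .lzo file with an added `.index` suffix. It holds one big-endian 64-bit offset per compressed block, and each offset points at the start of that block's header in the .lzo file. `LzoIndexFile` is a hypothetical name. The sketch takes the block offsets as input. Finding them requires walking the lzop block headers, which is not shown here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper. Assumes the hadoop-lzo index layout: one big-endian
 * 64-bit offset per compressed block. Computing the offsets is not shown.
 */
final class LzoIndexFile {

    private LzoIndexFile() {}

    /** Index path next to the .lzo file: foo.lzo -> foo.lzo.index. */
    static Path indexPathFor(Path lzoFile) {
        return lzoFile.resolveSibling(lzoFile.getFileName() + ".index");
    }

    /** Writes the offsets; rejects negative or non-increasing offsets before touching the file. */
    static void write(Path lzoFile, List<Long> blockOffsets) throws IOException {
        long previous = -1;
        for (long offset : blockOffsets) {
            if (offset <= previous) {
                throw new IllegalArgumentException("offsets must be non-negative and strictly increasing");
            }
            previous = offset;
        }
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(indexPathFor(lzoFile))))) {
            for (long offset : blockOffsets) {
                out.writeLong(offset); // DataOutputStream writes big-endian
            }
        }
    }

    /** Reads the offsets back; fails if the file is not a whole number of 8-byte entries. */
    static List<Long> read(Path lzoFile) throws IOException {
        Path index = indexPathFor(lzoFile);
        long size = Files.size(index);
        if (size % Long.BYTES != 0) {
            throw new IOException("truncated index file: " + index);
        }
        List<Long> offsets = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(index)))) {
            for (long i = 0; i < size / Long.BYTES; i++) {
                offsets.add(in.readLong()); // big-endian, matching write()
            }
        }
        return offsets;
    }
}
```

The offsets are validated before the output stream is opened. As a result, a bad offset list leaves any existing index file untouched.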

Hive Query Parameters

Prefix the query string with these parameters:

SET hive.exec.compress.output=false;
SET mapreduce.output.fileoutputformat.compress=false;

For example:

hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"
