Scaling Namespaces and Optimizing Data Storage
Also available as:
PDF
loading table of contents...

Format for using Hadoop archives with MapReduce

To use Hadoop Archives with MapReduce, you must reference files differently than you would with the default file system. If you have a Hadoop Archive stored in HDFS in /user/zoo/foo.har, you must specify the input directory as har:///user/zoo/foo.har to use it as a MapReduce input.

Because Hadoop Archives are exposed as a file system, MapReduce can use all of the logical input files in Hadoop Archives as input.