Hadoop Archives and MapReduce
To use Hadoop Archives with MapReduce, you must reference files slightly
differently than with the default file system. If you have a Hadoop Archive stored
in HDFS at /user/zoo/foo.har, you must specify the input directory as
har:///user/zoo/foo.har to use it as MapReduce input. Because Hadoop Archives are
exposed as a file system, MapReduce can use all of the logical input files in the
archive as input.
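
The path rewrite described above can be sketched in a few lines of Java. This is only an illustration: the class HarInputPath and the method toHarUri are hypothetical names, not part of Hadoop, and the sketch assumes the archive lives on the default file system, so the authority part of the URI is left empty.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: builds the har:// form of an archive path.
class HarInputPath {

    // Prefix an absolute HDFS path with the har scheme and an empty authority,
    // which is the form used when the archive is on the default file system.
    static URI toHarUri(String archivePath) {
        if (archivePath == null || !archivePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + archivePath);
        }
        return URI.create("har://" + archivePath);
    }

    public static void main(String[] args) {
        URI input = toHarUri("/user/zoo/foo.har");
        System.out.println(input); // prints har:///user/zoo/foo.har
    }
}
```

In a job driver, the resulting string would typically be wrapped in a Path and passed to FileInputFormat.addInputPath(job, new Path(input.toString())), in the same way as any other input directory.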