Creating a Hadoop archive

Hadoop Archives are created with the Hadoop archiving tool, which you invoke with the "hadoop archive" command. The tool runs as a MapReduce job, so the archive is built in parallel.

  1. Run the hadoop archive command, specifying the name of the archive to create, the parent directory (-p) relative to which the source paths are given, the source files or directories to archive, and the destination directory in which to create the archive.
    hadoop archive -archiveName name -p <parent> <src>* <dest>

    The archive name must have a .har extension.

  2. Optional: Use the -Dhar.partfile.size=<size in bytes> parameter to configure the part file size based on your requirements. For example:
    hadoop archive -Dhar.partfile.size=2147483648 -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo

    This example creates an archive using /user/hadoop as the parent directory, so the source paths dir1 and dir2 resolve to /user/hadoop/dir1 and /user/hadoop/dir2. Both directories are archived in /user/zoo/foo.har, and the part file size is set to 2 GB (2147483648 bytes).
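The argument rules in the steps above can be sketched as a small Bash helper that assembles the command for review instead of running it. This is a minimal sketch under stated assumptions: the function names make_har_cmd, resolve_sources, and har_path are hypothetical and not part of Hadoop, and the part file size is assumed to be given in whole gigabytes (1 GB = 1024^3 bytes, matching the 2147483648-byte example).

```bash
#!/usr/bin/env bash
# Minimal sketch; make_har_cmd, resolve_sources and har_path are
# hypothetical helpers, not part of Hadoop.

# Print (do not run) a "hadoop archive" command line.
# Usage: make_har_cmd NAME PARENT PART_SIZE_GB DEST [SRC...]
make_har_cmd() {
  local name="$1" parent="$2" size_gb="$3" dest="$4"
  shift 4
  # The archive name must have a .har extension.
  case "$name" in
    *.har) ;;
    *) echo "error: archive name '$name' must end in .har" >&2; return 1 ;;
  esac
  # Assumption: the size is a whole number of GB.
  case "$size_gb" in
    ''|*[!0-9]*) echo "error: part size '$size_gb' must be whole GB" >&2; return 1 ;;
  esac
  # har.partfile.size takes bytes; 10# forces base 10 (e.g. for "08").
  local size_bytes=$(( 10#$size_gb * 1024 * 1024 * 1024 ))
  local cmd=(hadoop archive "-Dhar.partfile.size=$size_bytes"
             -archiveName "$name" -p "$parent" "$@" "$dest")
  echo "${cmd[*]}"
}

# Print the full path each SRC resolves to, relative to PARENT.
# Usage: resolve_sources PARENT SRC...
resolve_sources() {
  local parent="${1%/}" src
  shift
  for src in "$@"; do
    echo "$parent/$src"
  done
}

# Print the full path of the archive that will be created.
# Usage: har_path DEST NAME
har_path() {
  echo "${1%/}/$2"
}
```

For example, make_har_cmd foo.har /user/hadoop 2 /user/zoo dir1 dir2 prints the same command as the foo.har example, resolve_sources /user/hadoop dir1 dir2 prints /user/hadoop/dir1 and /user/hadoop/dir2, and har_path /user/zoo foo.har prints /user/zoo/foo.har. Printing rather than executing keeps the sketch safe to try without a cluster.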