Creating a Hadoop archive
Hadoop Archives can be created using the Hadoop archiving tool. The archiving tool uses MapReduce to efficiently create Hadoop Archives in parallel. Use the "hadoop archive" command to invoke the Hadoop archiving tool.
Run the hadoop archive command, specifying the name of the archive to
create, the parent directory that the source paths are relative to, the source files to
archive, and the destination directory for the archive.
hadoop archive -archiveName name -p <parent> <src>* <dest>
The archive name must have a .har extension.
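The argument order above can be sketched as a small shell helper. This is only an illustration: har_archive_cmd is a hypothetical function, not part of Hadoop. It assumes archive names end in .har (as in foo.har) and that paths contain no spaces. It prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints a "hadoop archive"
# command line after checking that the archive name ends in .har.
# Usage: har_archive_cmd NAME PARENT DEST SRC...
har_archive_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: har_archive_cmd NAME PARENT DEST SRC..." >&2
    return 2
  fi
  name=$1
  parent=$2
  dest=$3
  shift 3
  case "$name" in
    ?*.har) ;;  # at least one character followed by .har
    *) echo "archive name must end in .har: $name" >&2; return 1 ;;
  esac
  # Sources are given relative to the parent; the destination goes last.
  echo "hadoop archive -archiveName $name -p $parent $* $dest"
}

# Prints: hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
har_archive_cmd foo.har /user/hadoop /user/zoo dir1 dir2
```

Printing rather than executing keeps the sketch safe to try on a machine without a Hadoop client; run the printed line yourself once it looks right.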
- Optional: Use the -Dhar.partfile.size=[***enter part file size***] parameter to set the part file size, in bytes, based on your requirements.
hadoop archive -Dhar.partfile.size=2147483648 -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
This example creates an archive using
/user/hadoop as the relative
archive directory. The directories
/user/hadoop/dir1 and /user/hadoop/dir2 are archived in the
/user/zoo/foo.har archive, with the part file size set to 2 GB (2147483648 bytes).
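As a rough sketch, the byte value passed to har.partfile.size can be computed rather than typed by hand, and the finished archive can be inspected through the har:// file system scheme. The helper name gib_to_bytes is hypothetical, and the listing step assumes an hdfs client on the PATH that can reach the cluster; it is skipped otherwise.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to bytes for -Dhar.partfile.size.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

part_size=$(gib_to_bytes 2)
echo "har.partfile.size=$part_size"   # prints har.partfile.size=2147483648

# List the archived files through the har:// scheme, if a client is available.
archive=har:///user/zoo/foo.har
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls -R "$archive" || echo "could not list $archive" >&2
else
  echo "hdfs client not found; skipping listing of $archive" >&2
fi
```

If the archive was created as in the example, the listing should show dir1 and dir2 beneath the archive path.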