Creating a Hadoop archive

Hadoop Archives can be created using the Hadoop archiving tool. The archiving tool uses MapReduce to efficiently create Hadoop Archives in parallel. Use the "hadoop archive" command to invoke the Hadoop archiving tool.

Run the hadoop archive command, specifying the name of the archive to create, the parent directory to which the source paths are relative, the source files or directories to archive, and the destination directory for the archive.
```
hadoop archive -archiveName name -p <parent> <src>* <dest>
```
The archive name must have a .har extension.
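
Because the archive name must end in .har, a small wrapper can reject a bad name before a MapReduce job is submitted. The following is a minimal sketch and not part of the Hadoop tooling: the function names `is_har_name` and `build_archive_cmd` are hypothetical, and the second function only prints the command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if the name is non-empty before ".har".
is_har_name() {
  case "$1" in
    ?*.har) return 0 ;;
    *)      return 1 ;;
  esac
}

# Hypothetical helper: prints (does not run) a hadoop archive command.
# Usage: build_archive_cmd <name> <parent> <dest> <src>...
build_archive_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: build_archive_cmd <name> <parent> <dest> <src>..." >&2
    return 2
  fi
  local name="$1" parent="$2" dest="$3"
  shift 3
  if ! is_har_name "$name"; then
    echo "archive name must end in .har: $name" >&2
    return 1
  fi
  # Remaining arguments are the source paths, relative to the parent.
  echo "hadoop archive -archiveName $name -p $parent $* $dest"
}
```

For example, `build_archive_cmd foo.har /user/hadoop /user/zoo dir1 dir2` prints the command, while passing `foo` as the name returns a non-zero status.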
Note:
- Archiving does not delete the source files. To reduce namespace usage after creating an archive, you must delete the source files manually.
- Although the hadoop archive command can be run from the host file system, the archive file is created in the HDFS file system from directories that exist in HDFS. If you reference a directory on the host file system and not HDFS, the system displays the following error:
  The resolved paths set is empty. Please check whether the srcPaths exist, where srcPaths = [</directory/path>]
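
The two points above can be combined in a script: check that each source exists in HDFS before archiving, and delete the sources only after the archive job succeeds. This is a sketch under assumptions, not a prescribed procedure. The function name `archive_then_delete` is hypothetical; it relies on the standard `hdfs dfs -test -e` and `hdfs dfs -rm -r` commands. Depending on your cluster's trash settings, `-rm -r` may move the files to the trash instead of deleting them immediately.

```shell
#!/usr/bin/env bash

# Hypothetical helper: archive sources under a parent directory, then
# remove them only if the archive job reported success.
# Usage: archive_then_delete <name.har> <parent> <dest> <src>...
archive_then_delete() {
  local name="$1" parent="$2" dest="$3"
  shift 3
  local src

  # Fail early if a source does not exist in HDFS.
  for src in "$@"; do
    if ! hdfs dfs -test -e "$parent/$src"; then
      echo "not found in HDFS: $parent/$src" >&2
      return 1
    fi
  done

  # Create the archive; stop here if the job fails.
  hadoop archive -archiveName "$name" -p "$parent" "$@" "$dest" || return 1

  # Archiving leaves the sources in place, so delete them explicitly.
  for src in "$@"; do
    hdfs dfs -rm -r "$parent/$src" || return 1
  done
}
```

You may prefer to inspect the archive contents before the delete step runs; the sketch deletes as soon as the job exits successfully.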
Optional: Use the -Dhar.partfile.size=[***enter part file size***] parameter to configure the part file size based on your requirements.

```
hadoop archive -Dhar.partfile.size=2147483648 -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo
```

This example creates an archive using /user/hadoop as the parent directory for the relative source paths. The directories /user/hadoop/dir1 and /user/hadoop/dir2 are archived in /user/zoo/foo.har, and the part file size is set to 2 GB (2147483648 bytes).
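
The value 2147483648 in the example is 2 GB expressed in bytes (2 × 1024 × 1024 × 1024). As a small sketch, the value can be derived in the shell instead of typed by hand. The helper name `gib_to_bytes` is hypothetical, and the script only prints the resulting command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert a whole number of GB (1024^3 bytes each)
# to bytes for use with -Dhar.partfile.size.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

part_size="$(gib_to_bytes 2)"

# Print the command for review instead of running it.
echo "hadoop archive -Dhar.partfile.size=${part_size} -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo"
```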