Scaling Namespaces and Optimizing Data Storage
Also available as:
PDF
loading table of contents...

Create a Hadoop archive

Use the hadoop archive command to invoke the Hadoop archiving tool.

Run the hadoop archive command by specifying the archive name to create, the parent directory relative to the archive location, the source files to archive, and the destination archive location.
hadoop archive -archiveName name -p <parent> <src>* <dest>

The archive name must have a .har extension

Note
Note
  • Archiving does not delete the source files. If you want to delete the input files after creating an archive to reduce namespace, you must manually delete the source files.
  • Although the hadoop archive command can be run from the host file system, the archive file is created in the HDFS file system from directories that exist in HDFS. If you reference a directory on the host file system and not HDFS, the system displays the following error:
    The resolved paths set is empty. Please check whether the srcPaths exist, where srcPaths
     = [</directory/path>] 
Consider the following example of archiving two files:
hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo

This example creates an archive using /user/hadoop as the relative archive directory. The directories /user/hadoop/dir1 and /user/hadoop/dir2 will be archived in the /user/zoo/foo.har archive.