HDFS Administration

Configuring Archival Storage

Use the following steps to configure archival storage:

  1. Shut down the DataNode, using the applicable commands in Controlling HDP Services Manually.

  2. Assign the ARCHIVE Storage Type to the DataNode.

    You can use the dfs.datanode.data.dir property in the /etc/hadoop/conf/hdfs-site.xml file to assign the ARCHIVE storage type to a DataNode.

    The dfs.datanode.data.dir property determines where on the local filesystem a DataNode should store its blocks.

    If you specify a comma-delimited list of directories, data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. You can specify that each directory resides on a different type of storage: DISK, SSD, ARCHIVE, or RAM_DISK.

    To specify a DataNode as DISK storage, insert [DISK] at the beginning of the local file system path. For example:

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[DISK]/grid/1/tmp/data_trunk</value>
    </property>

    To specify a DataNode as ARCHIVE storage, insert [ARCHIVE] at the beginning of the local file system path. For example:

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[ARCHIVE]/grid/1/tmp/data_trunk</value>
    </property>
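
As an illustration of how the storage-type tags in a dfs.datanode.data.dir value are read, the following shell sketch splits a comma-delimited value into storage type and path pairs. It is not part of HDFS: the function name parse_data_dirs and the /grid/2 and /grid/3 paths are hypothetical, and the sketch assumes that a directory without a tag is treated as DISK, the default storage type.

```shell
#!/bin/sh
# Split a dfs.datanode.data.dir value into "TYPE path" lines.
# Assumption: entries without a [TYPE] prefix count as DISK (the default type).
parse_data_dirs() (
    value=$1
    IFS=','          # split the value on commas (local to this subshell)
    set -f           # disable globbing while splitting
    for entry in $value; do
        case $entry in
            \[*\]*)
                type=${entry%%\]*}   # e.g. "[ARCHIVE"
                type=${type#\[}      # e.g. "ARCHIVE"
                path=${entry#*\]}    # everything after the closing bracket
                ;;
            *)
                type=DISK
                path=$entry
                ;;
        esac
        printf '%s %s\n' "$type" "$path"
    done
)

# Hypothetical value mixing tagged and untagged directories.
parse_data_dirs '[DISK]/grid/1/tmp/data_trunk,[ARCHIVE]/grid/2/tmp/data_trunk,/grid/3/data'
```

Each output line names one directory and the storage type the DataNode would report for it, which can help when reviewing a long value before restarting the DataNode.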

  3. Set or Get Storage Policies:

    To set a storage policy on a file or a directory:

    hdfs storagepolicies -setStoragePolicy -path <path> -policy <policyName>

    Arguments:

    Table 2.1. Setting Storage Policy

    Argument                Description
    -path <path>            The path to a directory or file.
    -policy <policyName>    The name of the storage policy.

    Example:

    hdfs storagepolicies -setStoragePolicy -path /cold1 -policy COLD

    To get the storage policy of a file or a directory:

    hdfs storagepolicies -getStoragePolicy -path <path>

    Argument:

    Table 2.2. Getting Storage Policy

    Argument        Description
    -path <path>    The path to a directory or file.

    Example:

    hdfs storagepolicies -getStoragePolicy -path /cold1
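
The set and get commands can be combined into a small helper that applies a policy and then reads it back. This is a minimal sketch rather than a supported tool: run, set_and_check_policy, and the DRY_RUN variable are hypothetical names, and the -path and -policy flags assume the command syntax of Apache Hadoop 2.7 and later. With DRY_RUN=1 (the default in the sketch) the commands are only printed, so the script can be reviewed before it touches the cluster.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set a storage policy on one path, then read it back to confirm.
# Assumes the flag-based syntax (-path, -policy) of Apache Hadoop 2.7+.
set_and_check_policy() (
    target=$1
    policy=$2
    run hdfs storagepolicies -setStoragePolicy -path "$target" -policy "$policy" &&
    run hdfs storagepolicies -getStoragePolicy -path "$target"
)

set_and_check_policy /cold1 COLD
```

Running the script with DRY_RUN=0 executes the two commands against the cluster. Setting a policy does not move existing blocks.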

  4. Start the DataNode, using the applicable commands in Controlling HDP Services Manually.

  5. Use Mover to Apply Storage Policies:

    When you update a storage policy setting on a file or directory, the new policy is not automatically enforced. You must use the HDFS mover data migration tool to actually move blocks as specified by the new storage policy.

    The mover data migration tool scans the specified files in HDFS and checks to see if the block placement satisfies the storage policy. For the blocks that violate the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirements.
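
To make "satisfies the storage policy" concrete, the sketch below models the replica placement of three built-in policies as described in the Apache Hadoop documentation (HOT: all replicas on DISK; WARM: one replica on DISK and the rest on ARCHIVE; COLD: all replicas on ARCHIVE). It is a simplified model and not the mover's actual logic: the function names are hypothetical, and fallback storage types are ignored.

```shell
#!/bin/sh
# Print the storage type expected for each of N replicas under a policy.
# Simplified model: only HOT, WARM, and COLD, with no fallback storage types.
expected_types() (
    policy=$1
    replicas=$2
    i=1
    while [ "$i" -le "$replicas" ]; do
        case $policy in
            HOT)  echo DISK ;;
            COLD) echo ARCHIVE ;;
            WARM) if [ "$i" -eq 1 ]; then echo DISK; else echo ARCHIVE; fi ;;
            *)    echo "unknown policy: $policy" >&2; exit 1 ;;
        esac
        i=$((i + 1))
    done
)

# Succeed if the actual replica types (space-separated) match the policy,
# ignoring the order of the replicas.
satisfies_policy() (
    want=$(expected_types "$1" "$2" | sort)
    have=$(printf '%s\n' "$3" | tr ' ' '\n' | sort)
    [ "$want" = "$have" ]
)

# A block under COLD with two replicas still on DISK violates the policy.
if satisfies_policy COLD 3 "DISK DISK ARCHIVE"; then
    echo "placement OK"
else
    echo "mover would migrate replicas"
fi
```

In this model, the block in the example has two replicas that the mover would need to relocate to ARCHIVE storage.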

    Command:

    hdfs mover [-p <files/dirs> | -f <local file>]

    Arguments:

    Table 2.3. HDFS Mover Arguments

    Argument           Description
    -p <files/dirs>    Specify a space-separated list of HDFS files/directories to migrate.
    -f <local file>    Specify a local file containing a list of HDFS files/directories to migrate.

    Note: When both the -p and -f options are omitted, the default path is the root directory.
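
The three ways of invoking the mover described above can be wrapped in small shell helpers. This is a sketch under stated assumptions: run, run_mover, run_mover_from_file, and DRY_RUN are hypothetical names, /cold2 and paths.txt are placeholder values, and the local file passed to -f is assumed to list one HDFS path per line. With DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Migrate the given HDFS paths. With no arguments, neither -p nor -f is
# passed, so the mover falls back to the root directory.
run_mover() {
    if [ "$#" -eq 0 ]; then
        run hdfs mover
    else
        run hdfs mover -p "$@"
    fi
}

# Migrate the HDFS paths listed in a local file
# (assumption: one HDFS path per line).
run_mover_from_file() {
    run hdfs mover -f "$1"
}

run_mover /cold1
```

Restricting the mover to the directories whose policy changed, rather than letting it default to the root directory, limits how much of the namespace it has to scan.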