Scaling Namespaces and Optimizing Data Storage

Configure archival storage

To configure archival storage for a DataNode, you must assign the ARCHIVE storage type to the DataNode, set storage policies, and move blocks that violate the storage policy to the appropriate storage type.

  1. Shut down the DataNode.
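    For example, on a cluster where the HDFS daemons are managed from the command line rather than through a cluster manager, you might stop the DataNode daemon as the HDFS service user (the exact command depends on your Hadoop version and management tooling):

    # Hadoop 3.x
    hdfs --daemon stop datanode
    # Hadoop 2.x
    hadoop-daemon.sh stop datanode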
  2. Assign the ARCHIVE Storage Type to the DataNode.
    You can use the dfs.datanode.data.dir property in the /etc/hadoop/conf/hdfs-site.xml file to assign the ARCHIVE storage type to a DataNode.

    The dfs.datanode.data.dir property determines where on the local filesystem a DataNode should store its blocks.

    If you specify a comma-delimited list of directories, data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. You can specify that each directory resides on a different type of storage: DISK, SSD, ARCHIVE, or RAM_DISK.

    To specify a DataNode storage directory as DISK storage, insert [DISK] at the beginning of the local file system path. For example:
    
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[DISK]/grid/1/tmp/data_trunk</value>
    </property>
    To specify a DataNode storage directory as ARCHIVE storage, insert [ARCHIVE] at the beginning of the local file system path. For example:
    
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[ARCHIVE]/grid/1/tmp/data_trunk</value>
    </property>
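    To assign different storage types to different directories on the same DataNode, list each directory with its own storage type prefix in the comma-delimited value. The mount points below are illustrative only; substitute the paths used in your cluster:

    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- illustrative mount points; substitute your own -->
      <value>[DISK]/grid/1/tmp/data_trunk,[ARCHIVE]/grid/2/tmp/data_trunk</value>
    </property>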
  3. Depending on your requirements, either set a storage policy or list the storage policy already applied to a specified file or directory.
    If you want to...         Use this command...
    Set a storage policy      hdfs storagepolicies -setStoragePolicy -path <path> -policy <policyname>
    List storage policies     hdfs storagepolicies -getStoragePolicy -path <path>
    Note
    When you update a storage policy setting on a file or directory, the new policy is not automatically enforced. You must use the HDFS mover data migration tool to actually move blocks as specified by the new storage policy.
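    For example, to keep all replicas under a hypothetical /cold/project-data directory on ARCHIVE storage, you could apply the built-in COLD policy and then confirm the setting (the directory name is illustrative only):

    # /cold/project-data is a hypothetical path
    hdfs storagepolicies -setStoragePolicy -path /cold/project-data -policy COLD
    hdfs storagepolicies -getStoragePolicy -path /cold/project-data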
  4. Start the DataNode.
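    For example, on a command-line-managed cluster, start the daemon with the counterpart of the stop command from step 1:

    # Hadoop 3.x
    hdfs --daemon start datanode
    # Hadoop 2.x
    hadoop-daemon.sh start datanode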
  5. Use the HDFS mover tool according to the specified storage policy.
    The HDFS mover data migration tool scans the specified files in HDFS and verifies whether the block placement satisfies the storage policy. For blocks that violate the storage policy, the tool moves the replicas to a different storage type to fulfill the storage policy requirements.
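    For example, after applying the COLD policy to the hypothetical /cold/project-data directory shown above, you could migrate its existing blocks to ARCHIVE storage with:

    # -p takes a space-separated list of HDFS files or directories to migrate
    hdfs mover -p /cold/project-data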