Scaling Namespaces and Optimizing Data Storage
Also available as:
PDF
loading table of contents...

Erasure coding CLI command

Use the hdfs ec command to set erasure coding policies on directories.


hdfs ec [generic options]
     [-setPolicy -path <path> [-policy <policyName>] [-replicate]]
     [-getPolicy -path <path>]
     [-unsetPolicy -path <path>]
     [-listPolicies]
     [-addPolicies -policyFile <file>]
     [-listCodecs]
     [-removePolicy -policy <policyName>]
     [-enablePolicy -policy <policyName>]
     [-disablePolicy -policy <policyName>]
     [-help [cmd ...]]

Options:

  • [-setPolicy [-p <policyName>] <path>]: Sets an EC policy on a directory at the specified path. The following EC policies are supported: RS-3-2-1024k (Reed-Solomon with 3 data blocks, 2 parity blocks and 1024 KB cell size), RS-6-3-1024k, RS-LEGACY-6-3-1024k, and XOR-2-1-1024k.

    <path>: A directory in HDFS. This is a mandatory parameter. Setting a policy only affects newly created files, and does not affect existing files.

    <policyName>: The EC policy to be used for files under the specified directory. This is an optional parameter, specified using the -p flag. If no policy is specified, the system default Erasure Coding policy is used. The default policy is RS-6-3-1024k.

  • -replicate: Forces a directory to use the default 3x replication scheme.
    Note
    Note
    You cannot specify -replicate and -policy <policyName> at the same time. Both the arguments are optional.
  • -getPolicy -path <path>: Gets details of the EC policy of a file or directory for the specified path.
  • [-unsetPolicy -path <path>]: Removes an EC policy already set by a setPolicy on a directory. This option does not work on a directory that inherits the EC policy from a parent directory. If you run this option on a directory that does not have an explicit policy set, no error is returned.
  • [-addPolicies -policyFile <file>]: Adds a list of EC policies. HDFS allows you to add 64 policies in total, with the policy ID in the range of 64 to 127. Adding policies fails if there are already 64 policies.
  • [-listCodecs]: Lists all the supported EC erasure coding codecs and coders in the system.
  • [-removePolicy -policy <policyName>]: Removes an EC policy.
  • [-enablePolicy -policy <policyName>]: Enables an EC policy.
  • [-disablePolicy -policy <policyName>]: Disables an EC policy.
  • [-help]: Displays help for a given command, or for all commands if none is specified.

Erasure coding background recovery work on the DataNodes can be tuned using the following configuration parameters in hdfs-site.xml.

  • dfs.datanode.ec.reconstruction.stripedread.timeout.millis: Timeout for striped reads. Default value is 5000 ms.

  • dfs.datanode.ec.reconstruction.threads: Number of threads used by the DataNode for the background recovery task. The default value is 8 threads.

  • dfs.datanode.ec.reconstruction.stripedread.buffer.size: Buffer size for reader service. Default value is 64 KB.

  • dfs.datanode.ec.reconstruction.xmits.weight: The relative weight of xmits used by the EC background recovery task when compared to replicated block recovery. The default value is 0.5.

    If the parameter is set to 0, the EC task always has one xmit. The xmits of an erasure coding recovery task are calculated as the maximum value between the number of read streams and the number of write streams. For example, if an EC recovery task needs to read from six nodes and write to two nodes, the xmit value is max(6, 2) * 0.5 = 3.