Erasure coding policies
To accommodate heterogeneous workloads, files and directories in an HDFS cluster can have different replication and EC policies.
Each policy is defined by the following two pieces of information:
- The EC schema: Includes the number of data and parity blocks in an EC group (for example, 6+3), as well as the codec algorithm (for example, Reed-Solomon).
- The size of a striping cell: Determines the granularity of striped reads and writes, including buffer sizes and encoding work.
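The effect of the cell size on striped I/O can be illustrated with a small offset calculation. This is a minimal sketch, not HDFS code: it assumes a simplified striped layout in which consecutive cells of `cell_size` bytes are placed round-robin across the data blocks of one group, and it ignores the boundaries at which a new block group begins. The `locate` function and `CellLocation` type are hypothetical names used only for illustration.

```python
"""Sketch: map a logical file offset to its place in a striped block group.

Assumption (simplified model): consecutive cells of `cell_size` bytes are
written round-robin across `data_units` data blocks; block-group boundaries
are ignored.
"""
from typing import NamedTuple


class CellLocation(NamedTuple):
    stripe: int          # which stripe (row of cells across the data blocks)
    data_block: int      # which data block, 0 .. data_units - 1, holds the cell
    offset_in_cell: int  # byte offset inside that cell


def locate(offset: int, data_units: int, cell_size: int) -> CellLocation:
    """Return where logical byte `offset` lands under the simplified model."""
    if offset < 0 or data_units <= 0 or cell_size <= 0:
        raise ValueError("offset must be >= 0; data_units and cell_size must be > 0")
    cell_index = offset // cell_size                 # global cell number in the file
    stripe, data_block = divmod(cell_index, data_units)
    return CellLocation(stripe, data_block, offset % cell_size)


if __name__ == "__main__":
    cell = 1024 * 1024                               # 1024 KB cell, as in RS-6-3-1024k
    # Byte (7 cells + 5) is in cell 7, i.e. stripe 1, data block 1, offset 5.
    print(locate(7 * cell + 5, data_units=6, cell_size=cell))
```

Under this model, one full stripe of a 6+3 policy with 1024 KB cells holds 6 MB of data, from which 3 MB of parity cells are computed; a smaller cell size would spread a read of the same length across more data blocks.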
HDP supports the Reed-Solomon erasure coding algorithm. The system default policy is Reed-Solomon with 6 data blocks, 3 parity blocks, and a 1024 KB cell size (RS-6-3-1024k).
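In addition, the following policies are supported:
- RS-3-2-1024k: Reed-Solomon with 3 data blocks, 2 parity blocks, and a 1024 KB cell size.
- RS-LEGACY-6-3-1024k: The legacy Reed-Solomon codec with 6 data blocks, 3 parity blocks, and a 1024 KB cell size.
- XOR-2-1-1024k: The XOR codec with 2 data blocks, 1 parity block, and a 1024 KB cell size.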
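The policy names encode the EC schema and cell size, so the trade-offs between the policies listed above can be read off directly. The following is a minimal sketch, assuming every name follows the `<CODEC>-<data>-<parity>-<cell>k` pattern seen here (the codec part may itself contain a hyphen, as in RS-LEGACY); `parse_policy` and `PolicyInfo` are hypothetical helpers, not part of any HDFS API. It reports how many lost blocks a group can tolerate and the extra storage relative to the data size.

```python
"""Sketch: derive layout facts from an EC policy name such as RS-6-3-1024k.

Assumption: names follow <CODEC>-<data>-<parity>-<cell>k, where CODEC is one or
more uppercase words joined by hyphens (for example, RS-LEGACY).
"""
import re
from dataclasses import dataclass

_NAME = re.compile(
    r"^(?P<codec>[A-Z]+(?:-[A-Z]+)*)-(?P<data>\d+)-(?P<parity>\d+)-(?P<cell>\d+)k$"
)


@dataclass(frozen=True)
class PolicyInfo:
    codec: str
    data_units: int
    parity_units: int
    cell_size_kb: int

    @property
    def tolerated_failures(self) -> int:
        # Reed-Solomon with k data and m parity blocks survives the loss of any
        # m blocks in the group; XOR with one parity block survives any one loss.
        return self.parity_units

    @property
    def storage_overhead(self) -> float:
        # Extra raw bytes stored per byte of data (3x replication would be 2.0).
        return self.parity_units / self.data_units


def parse_policy(name: str) -> PolicyInfo:
    """Split a policy name into codec, data units, parity units, and cell size."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized policy name: {name!r}")
    return PolicyInfo(m["codec"], int(m["data"]), int(m["parity"]), int(m["cell"]))


if __name__ == "__main__":
    for name in ["RS-6-3-1024k", "RS-3-2-1024k", "RS-LEGACY-6-3-1024k", "XOR-2-1-1024k"]:
        p = parse_policy(name)
        print(f"{name}: codec={p.codec}, tolerates {p.tolerated_failures} lost block(s), "
              f"overhead {p.storage_overhead:.0%}")
```

Read this way, RS-6-3-1024k and XOR-2-1-1024k both add 50% storage, but the 6+3 schema tolerates three lost blocks per group where 2+1 tolerates one; RS-3-2-1024k adds about 67% while using groups of only five blocks.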