Erasure Coding Overview

The Ozone EC feature provides data durability and fault-tolerance along with reduced storage space and ensures data durability similar to the Ratis THREE replication approach.

Supported Engines

EC supports the following data processing engines: Impala, Hive and so on.

Before you begin

When you use EC as the data durability scheme, recovery happens in the background and requires no direct input from a user. But you must be aware of the following highlights to use EC.

  • Note the limitations for EC.
  • Verify that the clusters run CDP 7.1.8 or higher.
  • Determine which EC policy you want to use: See Understanding Erasure Coding Policies for more information.
  • Determine if you want to use EC for existing data or new data. If you want to use EC for existing data, you need to replicate that data with distcp.
  • Verify that your cluster setup meets the rack and node requirements described in Best Practices for Rack and Node Setup for EC.

Limitations

  • Certain operations are not supported on erasure coded files due to substantial technical challenges.

  • Querying an erasure coded data might have an impact on the performance. If you notice performance degradation you can troubleshoot it by running the Query profile command. It will include erasure coding related information in the profile. This will help you to quickly identify the cause.