Using Ozone storage with Cloudera Data Warehouse Private Cloud

Apache Ozone is an object store available on the CDP Private Cloud Base cluster which enables you to optimize storage for big data workloads. You can query data residing on Ozone using Hive or Impala from Cloudera Data Warehouse (CDW) Data Service on Private Cloud.

Apache Ozone DataNodes support storage density up to 400 TB, unlike HDFS DataNodes which support storage density only up to 100 TB. Apart from the ability to scale to billions of objects or files of varying sizes, applications that use frameworks like Apache Spark, Impala, Apache YARN, and Apache Hive work natively on Ozone without any modifications.

Supported use cases

Ozone filesystem (OFS) is best suited for Hive and Impala in the following use cases:
  • To retain HDFS IO performance and other characteristics critical for big data use cases.
  • Recommended in an environment with dense nodes using up to 400 TB per node.
  • To scale linearly and handle a large number of files and data.
  • Recommended with Hadoop and S3 workloads.
  • Recommended with native API, fast IO scans, streaming reads, and writes.
  • Locality based on network topology (storage separate or together with compute).
  • Object-level rename in a bucket.

Advantages

OFS offers the following operational advantages:
  • Ability to share physical storage and nodes with HDFS.
  • Designed for easy Node-addition, deletion, and decommission for repair.
  • Has a security model similar to HDFS.
  • Supports Kerberos authentication.
  • Supports Data encryption at rest and in flight.
  • Supports Ranger Authorization.

Unsupported features

Even though Ozone supports Erasure Coding starting with CDP 7.1.8, Hive and Impala do not support Ozone Erasure Coding in CDW Private Cloud 1.5.0 release.