HDFS namespace mapping to Ozone
Understanding how HDFS directories map to Ozone volumes, buckets, and keys is an essential part of planning the migration. In addition to how you perform the mapping, the number of Ozone elements to create for the migrated data is an important consideration.
Ozone volumes and HDFS mapping
You can map Ozone volumes to HDFS directories in more than one way. One approach is to create a volume called ‘/user’ in Ozone and create buckets that map to the second-level HDFS directories. This approach ensures that existing HDFS paths in Hive or Spark applications can migrate to Ozone by merely adding the ofs://<service-id> prefix.
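As a minimal sketch of this approach, the following commands create the volume and a bucket with the Ozone shell; the bucket name analyst stands in for a second-level HDFS directory and is hypothetical, as is the service ID ozone1. An existing HDFS path such as hdfs://<nameservice>/user/analyst/data would then become ofs://ozone1/user/analyst/data:
ozone sh volume create /user
ozone sh bucket create /user/analyst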
Number of Ozone volumes, buckets, and keys for the migration
Depending on the scale of your data, consider creating Ozone elements in the following ranges:
- Volumes in the range of 1-20
- Buckets per volume in the range of 1-1000
- Keys under /volume/bucket in the range of one to a few hundred million
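To check how many volumes, buckets, and keys already exist before creating more, you can list them with the Ozone shell; the volume and bucket names below are hypothetical:
ozone sh volume list
ozone sh bucket list /user
ozone sh key list /user/analyst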
Using OFS to abstract volumes and buckets
The Ozone Filesystem (OFS) connector abstracts volumes and buckets away from the filesystem client and works with the Ozone back end to create them internally. For example, the following command creates the volume and the bucket, and then automatically creates the two directories, provided you have the required permissions:
hadoop fs -mkdir -p ofs://ozone1/spark-volume/spark-bucket/application1/instance1
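To confirm that OFS created the volume and bucket implicitly, you can inspect them with the Ozone shell or list the new path back through OFS; the service ID ozone1 matches the example above:
ozone sh volume info /spark-volume
ozone sh bucket info /spark-volume/spark-bucket
hadoop fs -ls ofs://ozone1/spark-volume/spark-bucket/application1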