Storage types
Learn about the supported storage types of the Cloudera Operational Database.
Cloudera Operational Database supports the following storage types when creating a
database. These storage types can be specified as part of the --storage-type
parameter.
Syntax: --storage-type [CLOUD_WITH_EPHEMERAL | CLOUD | HDFS]
You can deploy the Cloudera Operational Database using three scale types: HEAVY DUTY, LIGHT DUTY, and MICRO DUTY.
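As a rough illustration of how the storage type and scale type fit into a command line, the following Python sketch validates both values and assembles an argument list for the create-database command. The helper name build_create_database_args is hypothetical. The --environment-name and --database-name flags and the LIGHT and HEAVY scale values are assumptions that do not appear on this page (only MICRO is shown here), so check the CLI help before relying on them.

```python
import shlex

# Storage types listed in the syntax line on this page.
STORAGE_TYPES = ("CLOUD_WITH_EPHEMERAL", "CLOUD", "HDFS")
# Only MICRO appears on this page; LIGHT and HEAVY are assumed spellings.
SCALE_TYPES = ("MICRO", "LIGHT", "HEAVY")


def build_create_database_args(environment, database, storage_type, scale_type):
    """Return an argument list for a create-database call (illustrative only)."""
    if storage_type not in STORAGE_TYPES:
        raise ValueError(f"unsupported storage type: {storage_type!r}")
    if scale_type not in SCALE_TYPES:
        raise ValueError(f"unsupported scale type: {scale_type!r}")
    return [
        "cdp", "opdb", "create-database",
        "--environment-name", environment,  # assumed flag name
        "--database-name", database,        # assumed flag name
        "--storage-type", storage_type,
        "--scale-type", scale_type,
    ]


if __name__ == "__main__":
    # Placeholder environment and database names; nothing is executed here.
    args = build_create_database_args(
        "my-env", "my-db", "CLOUD_WITH_EPHEMERAL", "MICRO"
    )
    print(shlex.join(args))
```

Rejecting unknown values before the CLI is called keeps a typo in the storage type from surfacing only as a remote error.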
Performance characteristics
The Cloudera Operational Database uses different instance types on the worker nodes based on the selected scale and storage types. The following are the high-level descriptions of each storage type used by the Cloudera Operational Database.
- Cloud Storage with Caching
--storage-type CLOUD_WITH_EPHEMERAL
- This storage configuration utilizes ephemeral storage alongside the block storage provided by the underlying cloud provider. In this setup, the Cloudera Operational Database prioritizes storing as much data as possible in ephemeral storage to ensure faster access. If certain data is unavailable in ephemeral storage, it is retrieved from the slower block storage. The database achieves optimal performance when all data resides in ephemeral storage. For more information, see Ephemeral Storage.
- In this configuration, each worker node is equipped with 1.6 TB of ephemeral storage. For optimal performance, size the cluster so that all data fits within the available ephemeral storage.
- This configuration incurs an initial cost to warm up the ephemeral cache: at cluster startup, all data is cached into the ephemeral storage. Once the cache is fully populated, the cluster operates at peak efficiency.
- Cache warming also mitigates the impact of AWS S3 throttling, which limits the number of calls allowed per second to the cloud storage.
- Once the cache is fully warmed up, this configuration delivers twice the performance of HDFS while achieving a lower total cost of ownership (TCO).
- This configuration is particularly beneficial for use cases that require strong read performance, albeit at a slightly higher cost. A common scenario is using this configuration for a production cluster that handles both read and write workloads while still achieving a 40% cost reduction compared to an HDFS-based cluster.
- Cloud Storage
--storage-type CLOUD
- This storage type relies solely on the block storage provided by the underlying cloud provider. It is more suitable for scenarios where read performance is not a priority.
- Because this configuration has no ephemeral storage, all read requests are served directly from the slower block storage. As a result, it is typically slower than the other storage types.
- This configuration is particularly useful for use cases that involve heavy write workloads with minimal read workloads.
- Because this configuration eliminates ephemeral storage and relies on cloud storage alone, it achieves 25% cost savings compared to an HDFS-based cluster.
- This configuration is ideal for use as a disaster recovery (DR) cluster, handling writes from less critical applications.
- HDFS
--storage-type HDFS
- The Cloudera Operational Database leverages the Hadoop Distributed File System (HDFS) to store large volumes of data. HDFS offers scalable and reliable storage by utilizing clusters of commodity servers. For more information, see HDFS Overview.
- HDFS relies on costlier EBS-HDD storage and requires three times the actual storage space to accommodate data replication, making it more expensive from a total cost of ownership (TCO) perspective.
For more information, see the article Cloudera Operational Database Performance Benchmarking: Comparing HDFS and Cloud Storage.
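The capacity figures above translate into simple arithmetic. The Python sketch below estimates the minimum number of worker nodes needed for a dataset to fit entirely in ephemeral storage (1.6 TB per worker, as stated above) and the raw capacity that HDFS needs under three-way replication. The function names are illustrative, and 1 TB is assumed to mean 1000 GB. The calculation ignores headroom for growth, compaction, and file system overhead, so treat the results as lower bounds rather than a sizing recommendation.

```python
import math

# 1.6 TB of ephemeral storage per worker, taken as 1600 GB (decimal units assumed).
EPHEMERAL_GB_PER_WORKER = 1600
# HDFS "requires three times the actual storage space" for replication.
HDFS_REPLICATION_FACTOR = 3


def min_workers_for_full_cache(data_gb):
    """Smallest worker count whose combined ephemeral storage holds data_gb."""
    if data_gb < 0:
        raise ValueError("data size must not be negative")
    return max(1, math.ceil(data_gb / EPHEMERAL_GB_PER_WORKER))


def hdfs_raw_capacity_gb(data_gb):
    """Raw disk capacity HDFS needs to hold data_gb with replication."""
    if data_gb < 0:
        raise ValueError("data size must not be negative")
    return data_gb * HDFS_REPLICATION_FACTOR


if __name__ == "__main__":
    dataset_gb = 10_000  # a 10 TB dataset, chosen only as an example
    print(min_workers_for_full_cache(dataset_gb))  # 7 (10000 / 1600 = 6.25, rounded up)
    print(hdfs_raw_capacity_gb(dataset_gb))        # 30000
```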
Instance types
You can retrieve the list of supported Cloudera Operational Database instance types by using the list-supported-instance-types CLI command. This command provides details about the instance types supported for specific combinations of cloud platform, scale type, and storage type. Additionally, you can apply filters based on instance groups and architecture to narrow down the results.
The following is an example of the command.
cdp opdb list-supported-instance-types --cloud-platform AZURE --storage-type CLOUD_WITH_EPHEMERAL --scale-type MICRO --instance-group WORKER --architecture X86_64
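When the lookup needs to be scripted, the same command can be assembled and invoked from Python. The sketch below is only a thin wrapper around the flags shown in the example above. The function names are hypothetical, and it assumes the cdp executable is installed, configured, and available on the PATH. The output format is not described on this page, so the wrapper returns the raw text rather than parsing it.

```python
import subprocess


def instance_type_command(cloud_platform, storage_type, scale_type,
                          instance_group=None, architecture=None):
    """Build the argument list for list-supported-instance-types."""
    cmd = [
        "cdp", "opdb", "list-supported-instance-types",
        "--cloud-platform", cloud_platform,
        "--storage-type", storage_type,
        "--scale-type", scale_type,
    ]
    # The instance group and architecture filters are optional.
    if instance_group is not None:
        cmd += ["--instance-group", instance_group]
    if architecture is not None:
        cmd += ["--architecture", architecture]
    return cmd


def list_supported_instance_types(**filters):
    """Run the command and return its raw standard output as text."""
    result = subprocess.run(
        instance_type_command(**filters),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so no CLI setup is needed.
    print(" ".join(instance_type_command(
        "AZURE", "CLOUD_WITH_EPHEMERAL", "MICRO",
        instance_group="WORKER", architecture="X86_64",
    )))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems if a value ever contains spaces.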
Related information
- Enhancements to the create-database command
- A new CLI command to get the list of supported instance types