Cloudera Streaming Analytics cluster layout

Cloudera Streaming Analytics Light Duty and Heavy Duty cluster definitions differ in typical workload, topology, and cost. Heavy Duty adds dedicated worker nodes with SSD-backed storage for large Flink state; Light Duty does not include those workers. Choose the definition that matches your operational goals and application requirements.

Cloudera Streaming Analytics: Light Duty and Heavy Duty compared

Use the following summary to compare the two cluster definitions at a glance.

Table 1. Cloudera Streaming Analytics Light Duty and Heavy Duty
Consideration | Light Duty | Heavy Duty
Typical use | Development and testing; production for stateless Flink jobs or jobs with minimal state | Production for Flink jobs with large state that use RocksDB as the state backend
Topology | Flink, Cloudera SQL Stream Builder, HDFS, YARN, ZooKeeper, and Kafka are co-located on all instances. Does not include the separate worker node group described for Heavy Duty. | Same co-located services as Light Duty, plus additional worker nodes with SSD-backed volumes for Flink state and checkpoints (see the specifications below).
Cost drivers | Lower baseline footprint: no dedicated worker nodes with large SSD volumes. | Higher cost: the worker nodes add instances and a 1000 GB SSD volume each (see the worker specification below).
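
The selection rule in the table can be sketched as a small helper. This is only an illustration of the comparison above, not a Cloudera tool: the `Workload` class, its fields, and `choose_cluster_definition` are hypothetical names, and what counts as "large" state is left to your own judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a Flink workload (not a Cloudera API)."""
    production: bool   # False for development and testing
    large_state: bool  # True if the job keeps large state (for example in RocksDB)


def choose_cluster_definition(workload: Workload) -> str:
    """Return the cluster definition suggested by the comparison table."""
    # Heavy Duty targets production jobs with large state.
    if workload.production and workload.large_state:
        return "Heavy Duty"
    # Development, testing, and stateless or minimal-state production
    # jobs fall under Light Duty.
    return "Light Duty"
```

For example, `choose_cluster_definition(Workload(production=True, large_state=False))` returns `"Light Duty"`, which matches the "stateless Flink jobs or jobs with minimal state" case.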

Cluster layout diagram

The following figure illustrates the Cloudera Streaming Analytics cluster templates. The co-located services apply to both Light Duty and Heavy Duty. Only Heavy Duty includes the separate worker nodes with SSD storage; that worker tier is not part of the Light Duty layout.

Cloudera Streaming Analytics: Light Duty cluster layout

You can use a Cloudera Streaming Analytics: Light Duty cluster definition in development and testing scenarios. The Light Duty cluster definition can also be used in production for stateless Flink jobs or for Flink jobs with minimal state. The Light Duty template does not include the Heavy Duty worker nodes with SSD storage; only the co-located node group described in the following specifications applies. The Light Duty cluster has the following specifications:
  • Flink, Cloudera SQL Stream Builder, HDFS, YARN, ZooKeeper, and Kafka are co-located on all instances
  • Instance type for each node hosting Flink, Cloudera SQL Stream Builder, HDFS, YARN, ZooKeeper, and Kafka:
    • AWS: m5.2xlarge
    • Azure: Standard_D8_v3
    • GCP: e2-standard-8

For more information, see your cloud provider's documentation on instance types and storage.

Cloudera Streaming Analytics: Heavy Duty cluster layout

You can use a Cloudera Streaming Analytics: Heavy Duty cluster definition in production for Flink jobs with large state that use RocksDB as the state backend. The Heavy Duty cluster has the following specifications:
  • Flink, Cloudera SQL Stream Builder, HDFS, YARN, ZooKeeper, and Kafka are co-located on all instances
  • Instance type for each node hosting Flink, Cloudera SQL Stream Builder, HDFS, YARN, ZooKeeper, and Kafka:
    • AWS: m5.2xlarge
    • Azure: Standard_D8_v3
    • GCP: e2-standard-8
  • For worker nodes:
    • Storage type: SSD
    • Volume size: 1000 GB

For more information, see your cloud provider's documentation on instance types and storage.
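
As a rough sizing aid, the worker storage specification above can be turned into simple arithmetic. The sketch below assumes one 1000 GB SSD volume per worker node; the worker count, the estimated state size, and the 30% headroom default are hypothetical inputs, not values from the cluster definition. It also ignores how Flink distributes state across TaskManagers, so treat the result as a first estimate only.

```python
WORKER_VOLUME_GB = 1000  # volume size from the Heavy Duty worker specification


def total_worker_storage_gb(worker_count: int) -> int:
    """Total SSD capacity across workers, assuming one volume per worker."""
    if worker_count < 0:
        raise ValueError("worker_count must not be negative")
    return worker_count * WORKER_VOLUME_GB


def fits_on_workers(estimated_state_gb: float, worker_count: int,
                    headroom: float = 0.3) -> bool:
    """Check whether the estimated state fits, keeping a share of capacity free.

    headroom is a hypothetical safety margin (fraction of capacity left unused).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_gb = total_worker_storage_gb(worker_count) * (1 - headroom)
    return estimated_state_gb <= usable_gb
```

With three workers, `total_worker_storage_gb(3)` gives 3000 GB in total; under the assumed 30% headroom, an estimated 2000 GB of state fits, while 2500 GB does not.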