Streams Messaging cluster layout

The Data Hub service includes three default Streams Messaging cluster definitions. These are the Streams Messaging: Light Duty, Streams Messaging: Heavy Duty cluster, and Streams Messaging: High Availability definitions. Learn about the layout, capacity, and components of these definitions.

Streams Messaging: Light Duty cluster layout

Learn about the layout, capacity, and components of the Streams Messaging: Light Duty cluster definition.

You can use a Streams Messaging: Light Duty cluster definition in development, testing, or proof of concept scenarios. Light Duty clusters include the following nodes and components (services):

Non-default nodes and scaling

Broker and KRaft nodes are not provisioned by default. You have the option to manually set how many of these nodes are created when provisioning the cluster. After the cluster is provisioned, the number of Broker and KRaft nodes can be changed by scaling your cluster. For more information about scaling KRaft and Broker Nodes, see Scaling Streams Messaging Clusters.

KRaft and Zookeeper

By default, Kafka uses Zookeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned. However, ZooKeeper is not used by Kafka to store and manage metadata. If required, ZooKeeper can be removed from the cluster after the provisioning is finished. ZooKeeper can be removed in Cloudera Manager. For more information, see Deleting ZooKeeper from Streams Messaging clusters

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.

Broker and Core Broker volume per instance count

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.

Default instance and storage configuration

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: Light Duty cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 1. Streams Messaging: Light Duty default hardware configuration in Azure
Node Instance type Storage configuration
Master Standard_E8s_v3 100 GB Standard Locally-redundant SSD storage
Core Broker Standard_D8s_v3 1 TB Locally-redundant storage
Broker Standard_D8_v3 1 TB Locally-redundant storage
KRaft Standard_D8_v3 100 GB Locally-redundant storage
Table 2. Streams Messaging: Light Duty default hardware configuration in AWS
Node Instance type Storage configuration
Master r5.2xlarge 100 GB Magnetic
Core Broker m5.2xlarge 1 TB Throughput Optimized HDD
Broker m5.2xlarge 1 TB Throughput Optimized HDD
KRaft m5.2xlarge 100 GB Magnetic
Table 3. Streams Messaging: Light Duty default hardware configuration in GCP
Node Instance type Storage configuration
Master e2-highmem-8 100 GB Standard Persistent Disk (HDD)
Core Broker e2-standard-8 1 TB Standard persistent disks (HDD)
Broker e2-standard-8 1 TB Standard persistent disks (HDD)
KRaft e2-standard-8 100 GB Standard Persistent Disk (HDD)

Streams Messaging: Heavy Duty cluster layout

Learn about the layout, capacity, and components of the Streams Messaging: Heavy Duty cluster definition.

You can use the Streams Messaging: Heavy Duty cluster definition in production scenarios. Heavy Duty clusters include the following nodes and components (services):

Non-default nodes and scaling

SRM, Broker, Connect, and KRaft nodes are not provisioned by default. If you want to have any of these services provisioned, you must manually set the instance count of the appropriate host group to at least one during cluster provisioning. Otherwise, the host group and its nodes are not provisioned. After a cluster is provisioned, you also have the option to scale these nodes. For more information on scaling, see Scaling Streams Messaging Clusters.

KRaft and Zookeeper

By default, Kafka uses Zookeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned. However, ZooKeeper is not used by Kafka to store and manage metadata. If required, ZooKeeper can be removed from the cluster after the provisioning is finished. ZooKeeper can be removed in Cloudera Manager. For more information, see Deleting ZooKeeper from Streams Messaging clusters

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.

Broker and Core Broker volume per instance count

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.

Default instance and storage configuration

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: Heavy Duty cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 4. Streams Messaging: Heavy Duty default hardware configuration in Azure
Node Instance type Storage configuration
Master Standard_E8s_v3 100 GB Standard Locally-redundant SSD storage
Core Broker Standard_D8s_v3 1 TB Premium locally-redundant storage
Broker Standard_D8s_v3 1 TB Premium locally-redundant storage
Registry Standard_D8_v3 100 GB Locally-redundant storage
SMM Standard_D8_v3 100 GB Locally-redundant storage
SRM Standard_D8_v3 100 GB Locally-redundant storage
Connect Standard_D8_v3 100 GB Locally-redundant storage
KRaft Standard_D8_v3 100 GB Locally-redundant storage
Table 5. Streams Messaging: Heavy Duty default hardware configuration in AWS
Node Instance type Storage configuration
Master r5.2xlarge 100 GB Magnetic
Core Broker m5.2xlarge 1 TB General Purpose (GP3 SSD)
Broker m5.2xlarge 1 TB General Purpose (GP3 SSD)
Registry m5.2xlarge 100 GB Magnetic
SMM m5.2xlarge 100 GB Magnetic
SRM m5.2xlarge 100 GB Magnetic
Connect m5.2xlarge 100 GB Magnetic
KRaft m5.2xlarge 100 GB Magnetic
Table 6. Streams Messaging: Heavy Duty default hardware configuration in GCP
Node Instance type Storage configuration
Master e2-highmem-8 100 GB Standard persistent disks (HDD)
Core Broker e2-standard-8 1 TB Solid-state persistent disks (SSD)
Broker e2-standard-8 1 TB Solid-state persistent disks (SSD)
Registry e2-standard-8 100 GB Standard persistent disks (HDD)
SMM e2-standard-8 100 GB Standard persistent disks (HDD)
SRM e2-standard-8 100 GB Standard persistent disks (HDD)
Connect e2-standard-8 100 GB Standard persistent disks (HDD)
KRaft e2-standard-8 100 GB Standard persistent disks (HDD)

Streams Messaging: High Availability cluster layout

Learn about the layout, capacity, and components of the Streams Messaging: High Availability cluster definition.

You can use the Streams Messaging: High Availability cluster definition in production scenarios where having a highly available cluster spanning multiple Availability Zones (multi-AZ) is required. High Availability clusters include the following nodes and components (services):

Non-default nodes and scaling

SRM, Broker, Connect, and KRaft nodes are not provisioned by default. If you want to have any of these services provisioned, you must manually set the instance count of the appropriate host group to at least one during cluster provisioning. Otherwise, the host group and its nodes are not provisioned. After a cluster is provisioned, you also have the option to scale these nodes. For more information on scaling, see Scaling Streams Messaging Clusters.

Use multiple subnets for high availability

When using the Streams Messaging High Availability definition, ensure that you select multiple subnets when provisioning the cluster. Otherwise, your cluster will not be highly available.

KRaft and Zookeeper

By default, Kafka uses Zookeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned. However, ZooKeeper is not used by Kafka to store and manage metadata. If required, ZooKeeper can be removed from the cluster after the provisioning is finished. ZooKeeper can be removed in Cloudera Manager. For more information, see Deleting ZooKeeper from Streams Messaging clusters

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.

Broker and Core Broker volume per instance count

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.

Default instance and storage configuration

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: High Availability cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 7. Streams Messaging: High Availability default hardware configuration in Azure
Node Instance type Storage configuration
Master Standard_E16s_v3 100 GB Standard Locally-redundant SSD storage
Manager 1 Standard_D16_v3 100 GB Locally-redundant storage
Core Broker Standard_D8s_v3 1 TB Premium locally-redundant storage
Broker Standard_D8s_v3 1 TB Premium locally-redundant storage
Core ZooKeeper Standard_D8s_v3 100 GB Locally-redundant storage
SRM Standard_D8_v3 100 GB Locally-redundant storage
Connect Standard_D8_v3 100 GB Locally-redundant storage
KRaft Standard_D8_v3 100 GB Locally-redundant storage
Table 8. Streams Messaging: High Availability default hardware configuration in AWS
Node Instance type Storage configuration
Master r5.4xlarge 100 GB Magnetic
Manager 1 m5.4xlarge 100 GB Magnetic
Core Broker m5.2xlarge 1 TB General Purpose (GP3 SSD)
Broker m5.2xlarge 1 TB General Purpose (GP3 SSD)
Core ZooKeeper m5.2xlarge 100 GB Magnetic
SRM m5.2xlarge 100 GB Magnetic
Connect m5.2xlarge 100 GB Magnetic
KRaft m5.2xlarge 100 GB Magnetic
Table 9. Streams Messaging: High Availability default hardware configuration in GCP
Node Instance type Storage configuration
Master e2-highmem-16 100 GB Standard persistent disks (HDD)
Manager 1 e2-standard-16 100 GB Standard persistent disks (HDD)
Core Broker e2-standard-8 1 TB Solid-state persistent disks (SSD)
Broker e2-standard-8 1 TB Solid-state persistent disks (SSD)
Core ZooKeeper e2-standard-8 100 GB Standard persistent disks (HDD)
SRM e2-standard-8 100 GB Standard persistent disks (HDD)
Connect e2-standard-8 100 GB Standard persistent disks (HDD)
KRaft e2-standard-8 100 GB Standard persistent disks (HDD)
1 The Manager node is only available on clusters provisioned after October 30, 2023.