Planning your Streams Messaging deployment

Streams Messaging cluster layout

The Data Hub service includes three default Streams Messaging cluster definitions. These are the Streams Messaging: Light Duty, Streams Messaging: Heavy Duty, and Streams Messaging: High Availability definitions. Learn about the layout, capacity, and components of these definitions.

Learn about the layout, capacity, and components of the Streams Messaging: Light Duty cluster definition.

You can use a Streams Messaging: Light Duty cluster definition in development, testing, or proof of concept scenarios. Light Duty clusters include the following nodes and components (services):

Broker and KRaft nodes are not provisioned by default. You have the option to manually set how many of these nodes are created when provisioning the cluster. After the cluster is provisioned, the number of Broker and KRaft nodes can be changed by scaling your cluster. For more information about scaling KRaft and Broker Nodes, see Scaling Streams Messaging Clusters.

By default, Kafka uses ZooKeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned, but Kafka does not use them to store and manage metadata. If required, you can remove ZooKeeper from the cluster in Cloudera Manager after provisioning is finished. For more information, see Deleting ZooKeeper from Streams Messaging clusters.

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.
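The arithmetic behind the odd-number rule can be sketched as follows. This is an illustrative calculation based on general majority-quorum behavior, not Cloudera tooling. The function names are hypothetical, and the check only mirrors the rule stated above.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voters that can be lost while a majority is still reachable."""
    return voters - majority(voters)


def check_kraft_node_count(count: int) -> None:
    """Hypothetical pre-provisioning check mirroring the documented rule.

    A count of 0 means no KRaft nodes are provisioned (the default).
    Any other count must be odd.
    """
    if count < 0:
        raise ValueError(f"KRaft node count cannot be negative, got {count}")
    if count != 0 and count % 2 == 0:
        raise ValueError(f"KRaft node count must be odd, got {count}")


if __name__ == "__main__":
    # Print node count, required majority, and tolerated failures.
    for n in (1, 3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

A single node tolerates no failures, which is why three is the recommended minimum. A fourth node raises the required majority from two to three without increasing the number of failures the quorum can tolerate.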

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.
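As a planning aid, the decision described above can be written as a small check. This is a minimal sketch, not part of any Cloudera API. The function name and its parameters are hypothetical, and the returned string is only the title of the follow-up task named in this section.

```python
def post_provisioning_tasks(core_broker_volumes: int, broker_volumes: int) -> list[str]:
    """Return follow-up tasks implied by the planned volume counts.

    Both arguments are the planned number of attached volumes per instance
    for the respective node type (hypothetical inputs for illustration).
    """
    if core_broker_volumes == broker_volumes:
        # Identical counts match the default layout, so no extra step is needed.
        return []
    # Differing counts require the data directory configuration task.
    return ["Configure data directories for clusters with custom disk configurations"]


if __name__ == "__main__":
    print(post_provisioning_tasks(1, 1))
    print(post_provisioning_tasks(1, 2))
```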

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: Light Duty cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 1. Streams Messaging: Light Duty default hardware configuration in Azure
Node | Instance type | Storage configuration
Master | Standard_E8s_v3 | 100 GB Standard Locally-redundant SSD storage
Core Broker | Standard_D8s_v3 | 1 TB Locally-redundant storage
Broker | Standard_D8_v3 | 1 TB Locally-redundant storage
KRaft | Standard_D8_v3 | 100 GB Locally-redundant storage

Learn about the layout, capacity, and components of the Streams Messaging: Heavy Duty cluster definition.

You can use the Streams Messaging: Heavy Duty cluster definition in production scenarios. Heavy Duty clusters include the following nodes and components (services):

SRM, Broker, Connect, and KRaft nodes are not provisioned by default. If you want to have any of these services provisioned, you must manually set the instance count of the appropriate host group to at least one during cluster provisioning. Otherwise, the host group and its nodes are not provisioned. After a cluster is provisioned, you also have the option to scale these nodes. For more information on scaling, see Scaling Streams Messaging Clusters.
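The effect of the instance count on optional host groups can be illustrated with a short sketch. The group labels below are taken from this section and might not match the exact host group identifiers shown during provisioning. The function name is hypothetical.

```python
# Host groups that this section describes as not provisioned by default.
OPTIONAL_HOST_GROUPS = ("SRM", "Broker", "Connect", "KRaft")


def provisioned_optional_groups(instance_counts: dict[str, int]) -> list[str]:
    """Return the optional host groups that would be provisioned.

    A group is provisioned only if its instance count is at least one.
    Groups missing from the input are treated as having a count of 0.
    """
    return [
        group
        for group in OPTIONAL_HOST_GROUPS
        if instance_counts.get(group, 0) >= 1
    ]


if __name__ == "__main__":
    print(provisioned_optional_groups({"KRaft": 3, "Connect": 1, "SRM": 0}))
```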

By default, Kafka uses ZooKeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned, but Kafka does not use them to store and manage metadata. If required, you can remove ZooKeeper from the cluster in Cloudera Manager after provisioning is finished. For more information, see Deleting ZooKeeper from Streams Messaging clusters.

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: Heavy Duty cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 4. Streams Messaging: Heavy Duty default hardware configuration in Azure
Node | Instance type | Storage configuration
Master | Standard_E8s_v3 | 100 GB Standard Locally-redundant SSD storage
Core Broker | Standard_D8s_v3 | 1 TB Premium locally-redundant storage
Broker | Standard_D8s_v3 | 1 TB Premium locally-redundant storage
Registry | Standard_D8_v3 | 100 GB Locally-redundant storage
SMM | Standard_D8_v3 | 100 GB Locally-redundant storage
SRM | Standard_D8_v3 | 100 GB Locally-redundant storage
Connect | Standard_D8_v3 | 100 GB Locally-redundant storage
KRaft | Standard_D8_v3 | 100 GB Locally-redundant storage

Learn about the layout, capacity, and components of the Streams Messaging: High Availability cluster definition.

You can use the Streams Messaging: High Availability cluster definition in production scenarios where having a highly available cluster spanning multiple Availability Zones (multi-AZ) is required. High Availability clusters include the following nodes and components (services):

SRM, Broker, Connect, and KRaft nodes are not provisioned by default. If you want to have any of these services provisioned, you must manually set the instance count of the appropriate host group to at least one during cluster provisioning. Otherwise, the host group and its nodes are not provisioned. After a cluster is provisioned, you also have the option to scale these nodes. For more information on scaling, see Scaling Streams Messaging Clusters.

When using the Streams Messaging: High Availability definition, ensure that you select multiple subnets when provisioning the cluster. Otherwise, your cluster will not be highly available.

By default, Kafka uses ZooKeeper as its metadata store. If you provision KRaft nodes in the cluster, Kafka runs in KRaft mode and uses KRaft as its metadata store. In this case, ZooKeeper instances are still provisioned, but Kafka does not use them to store and manage metadata. If required, you can remove ZooKeeper from the cluster in Cloudera Manager after provisioning is finished. For more information, see Deleting ZooKeeper from Streams Messaging clusters.

You can only provision an odd number of KRaft nodes. This is required so that KRaft can hold a majority election for leadership. While it is possible to run Kafka in KRaft mode with a single KRaft node, Cloudera recommends that you provision a minimum of three to avoid having a single point of failure. Cluster provisioning fails if an even number of KRaft nodes are provisioned.

By default, the volume per instance count for Broker and Core Broker nodes is identical. If you customize your cluster during provisioning, Cloudera recommends that Attached Volume per Instances is set to the same value for both node types. Alternatively, if you want to provision a cluster where the number of volumes is not identical, ensure that you complete Configure data directories for clusters with custom disk configurations after the cluster is provisioned. Otherwise, Kafka does not utilize all available volumes. Additionally, scaling the cluster might not be possible.

The following table collects the default instance type and storage configuration of the various nodes deployed with the Streams Messaging: High Availability cluster. For more information about the cloud provider-specific instance and storage types, see the Related Information section.

Table 7. Streams Messaging: High Availability default hardware configuration in Azure
Node | Instance type | Storage configuration
Master | Standard_E16s_v3 | 100 GB Standard Locally-redundant SSD storage
Manager [1] | Standard_D16_v3 | 100 GB Locally-redundant storage
Core Broker | Standard_D8s_v3 | 1 TB Premium locally-redundant storage
Broker | Standard_D8s_v3 | 1 TB Premium locally-redundant storage
Core ZooKeeper | Standard_D8s_v3 | 100 GB Locally-redundant storage
SRM | Standard_D8_v3 | 100 GB Locally-redundant storage
Connect | Standard_D8_v3 | 100 GB Locally-redundant storage
KRaft | Standard_D8_v3 | 100 GB Locally-redundant storage
[1] The Manager node is only available on clusters provisioned after October 30, 2023.