Configuring data directories for clusters with custom disk configurations
Data Hub clusters provisioned with the default Streams Messaging cluster definitions
that have non-identical volume numbers configured for the Broker and Core_brokers host groups
require additional configuration after cluster provisioning. If configuration is not done, the
Kafka service deployed on these clusters does not utilize all available volumes. Additionally,
scaling the cluster might not be possible.
Clusters provisioned with version 7.2.12 or later of the Streams Messaging cluster
definitions have two host groups that contain Kafka brokers (broker type groups). These are
as follows:
Core_brokers: This host group is provisioned by default, it
contains a core set of brokers, and is not scalable.
Broker: This host group is not provisioned by default (instance
count is 0), but is scalable.
By default both broker type groups have the same disk configuration. The disk configuration
can be changed during cluster provisioning in Advanced
Configuration > Hardware and Storage. If
you customize the Attached Volume per Instance property and the
volume numbers set for the Broker and Core_broker
groups are not identical, manually configuring the data directories
(log.dirs) that Kafka uses is required after the cluster is
provisioned.
If configuration is not done, Kafka does not utilize all volumes that are available. For
example, assume you set Attached Volume per Instances to 2 for the
Broker host group and 3 for the Core_brokers
host group. In a case like this, Kafka by default utilizes two out of the three volumes.
That is, it will utilize the lower number of volumes out of the two host group
configurations. Additionally, scaling the cluster might also not be possible.
Access the Cloudera Manager instance managing the cluster.
Select the Kafka service.
Configure the Data Directories property for the default
Kafka Broker role group.
Go to Configuration.
Find and configure the Data Directories
property.
The Data Directories property for the
default role group must reflect the disk configuration (number of volumes) of the
Broker host group. The property is configured by adding a
unique data directory for each of the volumes. Each unique directory must be added as
a separate entry. New entries can be added by clicking the +
(add) icon next to the property. If configured correctly, the number of entries in the
property should be identical with the number of volumes configured for the
Broker host group.
The data directories you specify must be
/hadoopfs/fs[***NUMBER***]/kafka. For
example, in case of two volumes, the data directories you need to specify are as
follows:
/hadoopfs/fs1/kafka
/hadoopfs/fs2/kafka
Click Save Changes.
Go to Instances.
Configure role level overrides for the Data Directories
property.
The following substeps must be repeated for each Kafka Broker role
instance that is part of the Core_brokers host group. Ensure that the
override you configure is identical for all Kafka Broker role instances.
Select a Broker role instance that is deployed in
the Core_brokers host group.
Go to Configuration.
Find and configure the Data Directories
property.
In this step you are configuring an override for the property.
The value you set here is only be used by the role instance you have selected in
substep 5.a.
Configure the
override so that it reflects the disk configuration (number of volumes) of the
Core_brokers host group. The property override is configured
by adding a unique data directory for each of the volumes. Each unique directory
must be added as a separate entry. New entries can be added by clicking the
+ (add) icon next to the property. If configured correctly,
the number of entries in the property should be identical with the number of volumes
configured for the Core_brokers host group.
The data
directories you specify must be
/hadoopfs/fs[***NUMBER***]/kafka. For
example, in case of three volumes, the data directories you need to specify are as
follows:
After both the default role group configuration and the overrides are specified, navigate to
the service level configuration page of Kafka and find the Data
Directories property. The property is displayed with the full configuration
including the overrides. For example, if Core_brokers had three
volumes and Broker had two volumes, the correct configuration would
be similar to the following example:
Kafka data directories are configured. The Kafka service now uses all volumes
available for the cluster.