Configuring data directories for clusters with custom disk configurations

Data Hub clusters provisioned with the default Streams Messaging cluster definitions require additional configuration after provisioning if the Broker and Core_brokers host groups are configured with a different number of volumes. Without this configuration, the Kafka service deployed on these clusters does not use all available volumes, and scaling the cluster might not be possible.

Clusters provisioned with version 7.2.12 or later of the Streams Messaging cluster definitions have two host groups that contain Kafka brokers (broker type groups). These are as follows:
  • Core_brokers: This host group is provisioned by default. It contains a core set of brokers and is not scalable.
  • Broker: This host group is not provisioned by default (instance count is 0), but is scalable.

By default, both broker type groups have the same disk configuration. The disk configuration can be changed during cluster provisioning in Advanced Configuration > Hardware and Storage. If you customize the Attached Volume per Instance property and the volume numbers set for the Broker and Core_brokers host groups are not identical, you must manually configure the data directories (log.dirs) that Kafka uses after the cluster is provisioned.

If this configuration is not done, Kafka does not use all available volumes. For example, assume you set Attached Volume per Instance to 2 for the Broker host group and 3 for the Core_brokers host group. In this case, Kafka by default uses only two of the three volumes on the Core_brokers hosts. That is, it uses the lower of the two volume counts. Additionally, scaling the cluster might not be possible.
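Before you begin, you can check how the Kafka Broker default role group is currently configured by reading the Data Directories (log.dirs) value through the Cloudera Manager REST API. The following Python sketch is illustrative only: the API version, host, credentials, cluster name, and the kafka-KAFKA_BROKER-BASE role config group name are assumptions that you must replace with the values used by your deployment.

  # Illustrative sketch: read the current Data Directories (log.dirs) value of the
  # Kafka Broker default role config group through the Cloudera Manager REST API.
  # The host, credentials, cluster name, and role config group name below are
  # placeholders -- verify them against your own cluster.
  import requests

  CM_BASE = "https://cm.example.com:7183/api/v41"   # hypothetical Cloudera Manager host
  CLUSTER = "my-streams-messaging-cluster"          # hypothetical cluster name
  SERVICE = "kafka"
  ROLE_CONFIG_GROUP = "kafka-KAFKA_BROKER-BASE"     # typical default role group name; confirm on your cluster

  session = requests.Session()
  session.auth = ("admin", "admin-password")        # use real workload credentials

  url = f"{CM_BASE}/clusters/{CLUSTER}/services/{SERVICE}/roleConfigGroups/{ROLE_CONFIG_GROUP}/config"
  resp = session.get(url, params={"view": "full"})
  resp.raise_for_status()

  for item in resp.json().get("items", []):
      if item.get("name") == "log.dirs":
          # An empty value means the property still falls back to its default.
          print("Data Directories:", item.get("value") or item.get("default"))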

  1. Access the Cloudera Manager instance managing the cluster.
  2. Select the Kafka service.
  3. Configure the Data Directories property for the default Kafka Broker role group.
    1. Go to Configuration.
    2. Find and configure the Data Directories property.
      The Data Directories property for the default role group must reflect the disk configuration (number of volumes) of the Broker host group. The property is configured by adding a unique data directory for each of the volumes. Each unique directory must be added as a separate entry. New entries can be added by clicking the + (add) icon next to the property. If configured correctly, the number of entries in the property should be identical to the number of volumes configured for the Broker host group. A scripted alternative that covers both this step and the role-level overrides in step 5 is sketched after this procedure.
      The data directories you specify must be /hadoopfs/fs[***NUMBER***]/kafka. For example, in the case of two volumes, the data directories you need to specify are as follows:
      /hadoopfs/fs1/kafka
      /hadoopfs/fs2/kafka
      
    3. Click Save Changes.
  4. Go to Instances.
  5. Configure role level overrides for the Data Directories property.
    The following substeps must be repeated for each Kafka Broker role instance that is part of the Core_brokers host group. Ensure that the override you configure is identical for all Kafka Broker role instances.
    1. Select a Broker role instance that is deployed in the Core_brokers host group.
    2. Go to Configuration.
    3. Find and configure the Data Directories property.
      In this step you are configuring an override for the property. The value you set here is only used by the role instance you selected in substep 5.1.

      Configure the override so that it reflects the disk configuration (number of volumes) of the Core_brokers host group. The property override is configured by adding a unique data directory for each of the volumes. Each unique directory must be added as a separate entry. New entries can be added by clicking the + (add) icon next to the property. If configured correctly, the number of entries in the property should be identical to the number of volumes configured for the Core_brokers host group.

      The data directories you specify must be /hadoopfs/fs[***NUMBER***]/kafka. For example, in the case of three volumes, the data directories you need to specify are as follows:
      /hadoopfs/fs1/kafka
      /hadoopfs/fs2/kafka
      /hadoopfs/fs3/kafka
      
    4. Click Save Changes.
  6. Review and verify your configuration.
    After both the default role group configuration and the overrides are specified, navigate to the service-level configuration page of Kafka and find the Data Directories property. The property is displayed with the full configuration, including the overrides. For example, if Core_brokers has three volumes and Broker has two volumes, the correct configuration is similar to the following:
      Default role group (Broker host group):
      /hadoopfs/fs1/kafka
      /hadoopfs/fs2/kafka
      Override for each Kafka Broker role instance in the Core_brokers host group:
      /hadoopfs/fs1/kafka
      /hadoopfs/fs2/kafka
      /hadoopfs/fs3/kafka
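If you prefer to script this configuration rather than set it in the Cloudera Manager UI, the same default role group value and role-level overrides can be applied through the Cloudera Manager REST API. The following Python sketch is illustrative only: the API version, host, credentials, cluster name, role config group name, role names, and the comma-separated value format for log.dirs are assumptions that you must verify against your own deployment.

  # Illustrative sketch: apply the Data Directories (log.dirs) configuration through
  # the Cloudera Manager REST API instead of the UI. Host, credentials, cluster name,
  # role config group name, role names, and the comma-separated value format are
  # assumptions -- verify them against your own deployment before running this.
  import requests

  CM_BASE = "https://cm.example.com:7183/api/v41"   # hypothetical Cloudera Manager host
  CLUSTER = "my-streams-messaging-cluster"          # hypothetical cluster name
  SERVICE = "kafka"

  # Two volumes on the Broker host group, three on the Core_brokers host group.
  BROKER_DIRS = "/hadoopfs/fs1/kafka,/hadoopfs/fs2/kafka"
  CORE_BROKER_DIRS = "/hadoopfs/fs1/kafka,/hadoopfs/fs2/kafka,/hadoopfs/fs3/kafka"
  CORE_BROKER_ROLES = ["kafka-KAFKA_BROKER-example1"]  # hypothetical role names of the Core_brokers instances

  session = requests.Session()
  session.auth = ("admin", "admin-password")          # use real workload credentials

  # Default role group: must reflect the Broker host group (step 3).
  group_url = f"{CM_BASE}/clusters/{CLUSTER}/services/{SERVICE}/roleConfigGroups/kafka-KAFKA_BROKER-BASE/config"
  resp = session.put(group_url, json={"items": [{"name": "log.dirs", "value": BROKER_DIRS}]})
  resp.raise_for_status()

  # Role-level overrides: one identical override per Kafka Broker role instance in Core_brokers (step 5).
  for role in CORE_BROKER_ROLES:
      role_url = f"{CM_BASE}/clusters/{CLUSTER}/services/{SERVICE}/roles/{role}/config"
      resp = session.put(role_url, json={"items": [{"name": "log.dirs", "value": CORE_BROKER_DIRS}]})
      resp.raise_for_status()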
Kafka data directories are configured. The Kafka service now uses all volumes available for the cluster.

Continue with Giving access to your cluster.