CDH 6 includes Apache Kafka as part of the core package. The documentation includes improved content on how to set up, install, and administer your Kafka ecosystem. For more information, see the Cloudera Enterprise 6.0.x Apache Kafka Guide. We look forward to your feedback on both the existing and new documentation.

JBOD Setup and Migration

Consider the following before using JBOD support in Kafka:
  • Manual operation and administration: Monitoring offline directories and JBOD-related metrics is done through Cloudera Manager. However, identifying failed disks and rebalancing partitions between disks are done manually.
  • Manual load balancing between disks: Unlike with RAID-10, JBOD does not automatically distribute data across disks. The process is fully manual.

Providing robust JBOD support in Kafka required changes to the Kafka protocol. When upgrading to a new version of Kafka, make sure that you follow the recommended rolling upgrade process.

For more information, see Upgrading the CDH Cluster.

For more information about the JBOD-related Kafka protocol changes, see KIP-112 and KIP-113.

Setup

To set up JBOD in your Kafka environment, perform the following steps:

  1. Mount the required number of disks on your system.
  2. In Cloudera Manager, set up log directories for all Kafka brokers.
    1. Go to the Kafka service, select Instances and select the broker.
    2. Go to Configuration and find the Data Directories property.
    3. Modify the path of the log directories so that they correspond with the newly mounted disks.
    4. Enter a Reason for change, and then click Save Changes to commit the changes.
  3. Go to the Kafka service and select Configuration.
  4. Find and configure the following properties depending on your system and use case.
    • Number of I/O Threads
    • Number of Replica Fetchers
    • Minimum Number of Replicas in ISR

    Additionally, you must configure the number of network threads, num.network.threads. However, in Cloudera Manager 5.x.x, this property can only be configured through the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties. For more information about configuration using safety valves, see Custom Configuration.

  5. Set the replication factor to at least 3.
  6. Restart the service.
    1. Return to the Home page by clicking the Cloudera Manager logo.
    2. Go to the Kafka service and select Actions > Rolling Restart.
    3. Check the Restart roles with stale configurations only checkbox and click Rolling restart.
    4. Click Close when the restart has finished.
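
Before you point the Data Directories property at the new paths, it can help to confirm that each log directory really sits on its own mounted disk. The following Python sketch is one way to do that, assuming each disk is mounted as a separate filesystem. The function name shared_devices and the example paths are hypothetical and are not part of Kafka or Cloudera Manager. The check compares filesystem device IDs only, so it cannot detect two filesystems that share one physical disk.

```python
import os


def shared_devices(log_dirs, device_of=None):
    """Group log directories by the filesystem device that holds them.

    Returns only the groups with more than one directory, that is,
    directories that share a device. device_of is a hypothetical hook
    that maps a path to a device ID; by default it uses os.stat().st_dev.
    """
    if device_of is None:
        def device_of(path):
            return os.stat(path).st_dev

    groups = {}
    for path in log_dirs:
        groups.setdefault(device_of(path), []).append(path)

    # Keep only devices that back two or more log directories.
    return {dev: paths for dev, paths in groups.items() if len(paths) > 1}
```

An empty result means that every directory in the list is on a different device. For example, shared_devices(["/data1/kafka", "/data2/kafka"]) returns {} when /data1 and /data2 are separate mounts.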

Migration

Migrating data from one disk to another is achieved with the kafka-reassign-partitions tool. The following instructions focus on migrating existing Kafka partitions to JBOD configured disks. For a full tool description, see kafka-reassign-partitions.

Prerequisites

  • Set up JBOD in your Kafka environment. For more information, see Setup.
  • Collect the log directory paths on the JBOD disks where you want to migrate existing data.
  • Collect the broker IDs of the brokers you want to migrate data to.
  • Collect the names of the topics you want to migrate partitions from.

Steps

To migrate data to JBOD configured disks, perform the following steps:

  1. Create a topics-to-move JSON file that specifies the topics you want to reassign. Use the following format:
    {"topics":  [{"topic": "mytopic1"},
                 {"topic": "mytopic2"}],
     "version":1
    }
  2. Generate the content for the reassignment configuration JSON with the following command:
    kafka-reassign-partitions --zookeeper hostname:port --topics-to-move-json-file topics-to-move.json --broker-list broker1,broker2 --generate

    Running the command lists the distribution of partition replicas on your current brokers followed by a proposed partition reassignment configuration.

    Example output:

    Current partition replica assignment
    {"version":1,
     "partitions":
       [{"topic":"mytopic2","partition":1,"replicas":[2,3],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},
        {"topic":"mytopic2","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":2,"replicas":[3,1],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":1,"replicas":[2,3],"log_dirs":["any","any"]}]
    }
    
    Proposed partition reassignment configuration
    
    {"version":1,
     "partitions":
       [{"topic":"mytopic1","partition":0,"replicas":[4,5],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":2,"replicas":[4,5],"log_dirs":["any","any"]},
        {"topic":"mytopic2","partition":1,"replicas":[4,5],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":1,"replicas":[5,4],"log_dirs":["any","any"]},
        {"topic":"mytopic2","partition":0,"replicas":[5,4],"log_dirs":["any","any"]}]
    }

    In this example, the tool proposed a configuration that reassigns existing partitions on brokers 1, 2, and 3 to brokers 4 and 5.

  3. Copy and paste the proposed partition reassignment configuration into an empty JSON file.
  4. Modify the suggested reassignment configuration.

    When migrating data, you have two choices. You can move partitions to a different log directory on the same broker, or move them to a different log directory on another broker.

    1. To reassign partitions between log directories on the same broker, change the appropriate any entry to an absolute path. For example:
      {"topic":"mytopic1","partition":0,"replicas":[4,5],"log_dirs":["/JBOD-disk/directory1","any"]}
    2. To reassign partitions between log directories across different brokers, change the broker ID specified in replicas, and change the appropriate any entry to an absolute path. For example:
      {"topic":"mytopic1","partition":0,"replicas":[6,5],"log_dirs":["/JBOD-disk/directory1","any"]}
  5. Save the file.
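  6. Start the redistribution process with the following command:
    kafka-reassign-partitions --zookeeper hostname:port --reassignment-json-file reassignment-configuration.json --bootstrap-server hostname:port --execute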

    The tool prints a list containing the original replica assignment and a message that reassignment has started. Example output:

    Current partition replica assignment
    
    {"version":1,
     "partitions":
       [{"topic":"mytopic2","partition":1,"replicas":[2,3],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},
        {"topic":"mytopic2","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":2,"replicas":[3,1],"log_dirs":["any","any"]},
        {"topic":"mytopic1","partition":1,"replicas":[2,3],"log_dirs":["any","any"]}]
    }
    
    Save this to use as the --reassignment-json-file option during rollback
    Successfully started reassignment of partitions.
  7. Verify the status of the reassignment with the following command:
    kafka-reassign-partitions --zookeeper hostname:port --reassignment-json-file reassignment-configuration.json --bootstrap-server hostname:port --verify
    The tool prints the reassignment status of all partitions. Example output:
    Status of partition reassignment: 
    Reassignment of partition mytopic2-1 completed successfully
    Reassignment of partition mytopic1-0 completed successfully
    Reassignment of partition mytopic2-0 completed successfully
    Reassignment of partition mytopic1-2 completed successfully
    Reassignment of partition mytopic1-1 completed successfully
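
The JSON handling in the steps above can also be scripted instead of edited by hand. The following Python sketch is an illustration under stated assumptions, not part of the kafka-reassign-partitions tool. All function names are hypothetical. It assumes that each log_dirs entry applies to the broker at the same position in replicas, as in the examples above, and that the --verify output uses the wording shown in the example output, which may differ in other Kafka versions.

```python
import json


def build_topics_to_move(topics):
    """Build the document passed to --topics-to-move-json-file."""
    return {"topics": [{"topic": name} for name in topics], "version": 1}


def set_replica(plan, topic, partition, broker_id, log_dir, new_broker_id=None):
    """Edit one replica of one partition in a proposed reassignment plan.

    The log_dirs entry is located by the position of broker_id in the
    replicas list. Pass new_broker_id to also move the replica to
    another broker.
    """
    for entry in plan["partitions"]:
        if entry["topic"] == topic and entry["partition"] == partition:
            index = entry["replicas"].index(broker_id)  # ValueError if absent
            if new_broker_id is not None:
                entry["replicas"][index] = new_broker_id
            entry["log_dirs"][index] = log_dir
            return entry
    raise KeyError(f"{topic}-{partition} is not in the plan")


def unfinished_partitions(verify_output):
    """Return partitions whose --verify line does not report success."""
    prefix = "Reassignment of partition "
    pending = []
    for line in verify_output.splitlines():
        line = line.strip()
        if line.startswith(prefix) and not line.endswith("completed successfully"):
            # The partition name is the first token after the prefix.
            pending.append(line[len(prefix):].split()[0])
    return pending


def save_json(document, path):
    """Write a document to a file that the tool can read."""
    with open(path, "w") as handle:
        json.dump(document, handle)
```

For example, calling set_replica(plan, "mytopic1", 0, 4, "/JBOD-disk/directory1") on the proposed plan reproduces the same-broker edit from step 4, and adding new_broker_id=6 reproduces the cross-broker edit. An empty list from unfinished_partitions means that every partition listed in the output completed successfully.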