Migrating Kafka from ZooKeeper to KRaft

Learn about migrating an existing ZooKeeper-based Kafka cluster to KRaft. Migration is performed in Cloudera Manager using Kafka service actions and requires minimal user interaction. It is done in a rolling fashion and requires no downtime.

Apache Kafka Raft (KRaft) is a consensus protocol used for metadata management that was developed as a replacement for Apache ZooKeeper. Using KRaft for managing Kafka metadata instead of ZooKeeper offers various benefits including a simplified architecture and a reduced operational footprint.

Migration at a glance

Migration is completed in two steps, start and finalize, with the option to revert at any time before finalization. You manage a migration (start, finalize, or revert) using Kafka service actions available in Cloudera Manager.

Most configuration changes needed for migration are handled by Cloudera Manager when you run migration actions. Before starting migration, you are only required to deploy KRaft Controller service roles and configure required policies in Ranger. All other changes are automated by the migration actions.

The migration actions in Cloudera Manager are as follows:
  • Migrate Kafka to KRaft – Starts migration of Kafka brokers from ZooKeeper to KRaft for the specified Kafka service. Migrates the cluster up to a state where reverting is still possible.

    When this action finishes, brokers will run in KRaft mode and will be disconnected from ZooKeeper. KRaft controllers will still be connected to ZooKeeper and will continue to write metadata to ZooKeeper, but are ready to disconnect (dual-write mode).

  • Finalize KRaft Migration – Finalizes migration by disconnecting KRaft controllers from ZooKeeper. Reverting to ZooKeeper is not possible once this action starts.

  • Revert KRaft Migration – Reverts migration of Kafka brokers from ZooKeeper to KRaft for the specified Kafka service using rolling restarts.

  • Revert KRaft Migration (force restart) – Reverts migration of Kafka brokers from ZooKeeper to KRaft for the specified Kafka service with normal restarts (non-rolling). Expect service downtime when using this action.

You can only start these actions if your cluster meets the required prerequisites. If prerequisites are not met, the actions will be disabled.
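
The relationship between the four actions can be read as a small state machine. The following Python sketch is illustrative only and is not part of Cloudera Manager: the Mode enum, the TRANSITIONS table, and both helper functions are hypothetical names, and the model deliberately ignores prerequisites and intermediate states.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    ZOOKEEPER = "zookeeper"    # before migration, or after a revert
    DUAL_WRITE = "dual-write"  # after Migrate Kafka to KRaft completes
    KRAFT = "kraft"            # after Finalize KRaft Migration completes


# (current mode, action) -> resulting mode. Any pair not listed is not allowed.
TRANSITIONS = {
    (Mode.ZOOKEEPER, "Migrate Kafka to KRaft"): Mode.DUAL_WRITE,
    (Mode.DUAL_WRITE, "Finalize KRaft Migration"): Mode.KRAFT,
    (Mode.DUAL_WRITE, "Revert KRaft Migration"): Mode.ZOOKEEPER,
    (Mode.DUAL_WRITE, "Revert KRaft Migration (force restart)"): Mode.ZOOKEEPER,
}


def apply_action(mode: Mode, action: str) -> Mode:
    """Return the mode reached by running `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(mode, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not available in {mode.value} mode") from None


def available_actions(mode: Mode) -> List[str]:
    """List the actions that can be started from `mode`, in alphabetical order."""
    return sorted(action for (m, action) in TRANSITIONS if m is mode)
```

In this model, Mode.KRAFT has no outgoing transitions, which mirrors the rule that a finalized migration cannot be reverted.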

Cluster health during migration

During migration, and until migration is finalized with the Finalize KRaft Migration action, the Kafka service will be in concerning (yellow) health. This is because the Migrate Kafka to KRaft action stops, starts, and updates the Kafka service roles during the migration process.

The Kafka service will remain in concerning health even after the Migrate Kafka to KRaft action completes successfully. This is because the KRaft Controller roles stay in concerning health until migration is finalized. The KRaft Controllers display one of two health test states that indicate the migration progress:

  • KRaft migration state is: Pre-Migration – Controllers are in migration mode and waiting for brokers to join the migration. This is a transitional state during migration.

  • KRaft migration state is: Migration – Migration is in progress and controllers are running in dual-write mode (writing to both ZooKeeper and KRaft). The KRaft Controllers will remain in this state until you run the Finalize KRaft Migration action.
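
If you collect health test text in your own monitoring scripts, the two states can be told apart with a simple pattern match. This is a minimal sketch that assumes the health test text contains the exact strings shown above; how you obtain that text is outside the scope of the example, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the two health test states listed above.
_STATE_RE = re.compile(r"KRaft migration state is:\s*(Pre-Migration|Migration)\b")


def migration_state(health_text: str) -> Optional[str]:
    """Return "Pre-Migration" or "Migration", or None if neither state is present."""
    match = _STATE_RE.search(health_text)
    return match.group(1) if match else None


def in_dual_write_mode(health_text: str) -> bool:
    """True when controllers report the Migration (dual-write) state."""
    return migration_state(health_text) == "Migration"
```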

Migrating a Kafka cluster to KRaft

You migrate an existing ZooKeeper-based Kafka cluster to KRaft by creating KRaft controllers, configuring required Ranger policies, and using the Migrate Kafka to KRaft Kafka service action in Cloudera Manager.

  • Migrating an existing ZooKeeper-based Kafka cluster to KRaft is only available if you are on Cloudera Manager 7.13.2 or later and Cloudera Runtime 7.3.2.

    Migration is only possible with this combination of versions. This is because:
    • Earlier Cloudera Manager versions do not include the necessary Kafka service actions.

    • Cloudera Runtime 7.3.2 is the only version where migration is possible. Neither previous nor future major, minor, or maintenance versions support migration.

  • Ensure that all Kafka Broker and ZooKeeper Server roles are running. In addition, ensure that the Kafka and ZooKeeper services do not have stale configurations.

    If you have stale configurations, resolve staleness by restarting the service or reverting configuration changes.

  • Ensure that both the Kafka and ZooKeeper services are in a healthy (green) state.

    Migration is blocked if the Kafka service is in concerning (yellow) or bad (red) health.

  • The Kafka service must run with inter-broker protocol version 3.9.

    Verify the version by checking the value of the Kafka Inter-Broker Protocol Version property in Cloudera Manager > Kafka service > Configuration. The value of the property must be 3.9 or empty. An empty value means that version is set to the default, which is 3.9 in Cloudera Runtime 7.3.2.
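
The prerequisites above can be summarized as a checklist. The following Python sketch is a hypothetical helper, not a Cloudera tool: the ClusterSnapshot fields are assumptions that you would fill in by hand or from your own tooling, and Cloudera Manager performs its own checks before enabling the migration actions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of the cluster state relevant to migration."""
    cm_version: Tuple[int, int, int]       # for example (7, 13, 2)
    runtime_version: Tuple[int, int, int]  # for example (7, 3, 2)
    all_roles_running: bool                # Kafka Broker and ZooKeeper Server roles
    has_stale_config: bool                 # Kafka or ZooKeeper service staleness
    kafka_health: str                      # "green", "yellow", or "red"
    zookeeper_health: str
    inter_broker_protocol: str = ""        # empty means the default (3.9 in 7.3.2)


def unmet_prerequisites(s: ClusterSnapshot) -> List[str]:
    """Return a description of every prerequisite the snapshot does not meet."""
    problems = []
    if s.cm_version < (7, 13, 2):
        problems.append("Cloudera Manager must be 7.13.2 or later")
    if s.runtime_version != (7, 3, 2):
        problems.append("Cloudera Runtime must be exactly 7.3.2")
    if not s.all_roles_running:
        problems.append("all Kafka Broker and ZooKeeper Server roles must be running")
    if s.has_stale_config:
        problems.append("Kafka and ZooKeeper must not have stale configurations")
    for name, health in (("Kafka", s.kafka_health), ("ZooKeeper", s.zookeeper_health)):
        if health != "green":
            problems.append(f"{name} service health is {health}, expected green")
    if s.inter_broker_protocol.strip() not in ("", "3.9"):
        problems.append("inter-broker protocol version must be 3.9 or empty")
    return problems
```

An empty list means that, according to the values you supplied, nothing in the checklist blocks migration.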

  1. Add the kraft user to required Ranger policies and restrict access to the __cluster_metadata topic.
    1. In the Ranger Admin Web UI, select the Kafka resource-based service (default cm_kafka).
    2. Add the kraft user to all policies that include the kafka user.
      The kraft user must have the same permission in all policies as the kafka user.
      At minimum, you must add the kraft user to the following default policies:
      • all - consumergroup

      • all - topic

      • all - transactionalid

      • all - cluster

      • all - delegationtoken

      • connect internal - topic

    3. Create a new policy that restricts access to the __cluster_metadata topic with the following permissions:
      • kraft user – All permissions

      • kafka user – Describe (describe), Describe Configs (describe_configs), and Consume (consume).

      Policy example in JSON:
      {
        "isEnabled": true,
        "service": "cm_kafka",
        "name": "kraft internal - topic",
        "policyType": 0,
        "policyPriority": 0,
        "description": "Policy for kraft internal - topic",
        "isAuditEnabled": true,
        "resources": {
          "topic": {
            "values": [
              "__cluster_metadata"
            ],
            "isExcludes": false,
            "isRecursive": false
          }
        },
        "policyItems": [
          {
            "accesses": [
              {
                "type": "create",
                "isAllowed": true
              },
              {
                "type": "delete",
                "isAllowed": true
              },
              {
                "type": "configure",
                "isAllowed": true
              },
              {
                "type": "alter",
                "isAllowed": true
              },
              {
                "type": "alter_configs",
                "isAllowed": true
              },
              {
                "type": "describe",
                "isAllowed": true
              },
              {
                "type": "describe_configs",
                "isAllowed": true
              },
              {
                "type": "consume",
                "isAllowed": true
              },
              {
                "type": "publish",
                "isAllowed": true
              }
            ],
            "users": [
              "kraft"
            ],
            "delegateAdmin": false
          },
          {
            "accesses": [
              {
                "type": "describe",
                "isAllowed": true
              },
              {
                "type": "describe_configs",
                "isAllowed": true
              },
              {
                "type": "consume",
                "isAllowed": true
              }
            ],
            "users": [
              "kafka"
            ],
            "delegateAdmin": false
          }
        ],
        "serviceType": "kafka",
        "isDenyAllElse": true
      }
      
  2. Add KRaft Controller service role instances to your cluster.
    1. In Cloudera Manager, select the Kafka service.
    2. Go to Configuration.
    3. Click Actions > Add Role Instances.
    4. Click Select hosts under KRaft Controller.
    5. Select at least three cluster hosts and click OK.
    6. Click Continue.
    7. On the Review Changes page, configure the service role based on your cluster and requirements.
    8. Click Finish.
      Adding new roles causes configuration staleness in the Kafka service.
  3. Restart the Kafka Broker and Kafka Connect service roles.
  4. Click Actions > Migrate Kafka to KRaft.
    This action migrates your Kafka cluster to KRaft.
  5. Wait until the action finishes.
ZooKeeper to KRaft migration for Kafka succeeded. However, migration is not finalized. Brokers are running in KRaft mode and are disconnected from ZooKeeper. KRaft controllers are still connected to ZooKeeper and continue to write metadata to ZooKeeper (dual-write mode).

The Kafka service will be in a concerning (yellow) health state. This is because the KRaft Controller roles stay in concerning health, showing the KRaft migration state is: Migration health test state, until the migration is finalized. This is normal and expected behavior. The health test state only indicates that migration is not yet finalized.

Finalize or revert the migration.
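
The JSON policy example in step 1 repeats the same access entry many times, so it can be convenient to generate it. The following Python sketch builds an equivalent policy document from two lists of access types. The function and constant names are hypothetical, the default service name cm_kafka is taken from the example, and the sketch only produces JSON; it does not submit anything to Ranger.

```python
import json

# Access types granted to the kraft user in the example policy.
ALL_ACCESSES = [
    "create", "delete", "configure", "alter", "alter_configs",
    "describe", "describe_configs", "consume", "publish",
]
# Access types granted to the kafka user in the example policy.
KAFKA_ACCESSES = ["describe", "describe_configs", "consume"]


def policy_item(user, access_types):
    """Build one policyItems entry granting `access_types` to `user`."""
    return {
        "accesses": [{"type": t, "isAllowed": True} for t in access_types],
        "users": [user],
        "delegateAdmin": False,
    }


def build_metadata_policy(service="cm_kafka"):
    """Build the policy that restricts access to the __cluster_metadata topic."""
    return {
        "isEnabled": True,
        "service": service,
        "name": "kraft internal - topic",
        "policyType": 0,
        "policyPriority": 0,
        "description": "Policy for kraft internal - topic",
        "isAuditEnabled": True,
        "resources": {
            "topic": {
                "values": ["__cluster_metadata"],
                "isExcludes": False,
                "isRecursive": False,
            }
        },
        "policyItems": [
            policy_item("kraft", ALL_ACCESSES),
            policy_item("kafka", KAFKA_ACCESSES),
        ],
        "serviceType": "kafka",
        "isDenyAllElse": True,
    }


if __name__ == "__main__":
    print(json.dumps(build_metadata_policy(), indent=2))
```

If your Ranger resource-based service for Kafka is not named cm_kafka, pass its name as the service argument.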

Finalizing a KRaft migration

You finalize ZooKeeper to KRaft migration with the Finalize KRaft Migration Kafka service action in Cloudera Manager.

  • Ensure that the Migrate Kafka to KRaft action has successfully finished.

    If your KRaft Controller roles are in a concerning health state with the KRaft migration state is: Migration health test state, you can safely proceed with finalization. This health test state indicates that the service is ready for finalization.

  • Ensure that all applications or services that connect to Kafka operate normally and can produce or consume messages.
  1. In Cloudera Manager, select the Kafka service.
  2. Click Actions > Finalize KRaft Migration.
  3. Wait until the action finishes.
KRaft migration is finalized. It is not possible to revert this operation. KRaft roles are restarted and disconnected from ZooKeeper. The Kafka service is now running in KRaft mode.

Reverting a Kafka cluster to ZooKeeper

You revert migration with the Revert KRaft Migration or Revert KRaft Migration (force restart) Kafka service actions in Cloudera Manager.

Reverting to ZooKeeper is only possible if the migration to KRaft is not finalized. A migration is not finalized if the KRaft Controller roles are in a concerning health state with the KRaft migration state is: Migration health test state.

  1. In Cloudera Manager, select the Kafka service.
  2. Click the Revert KRaft Migration or Revert KRaft Migration (force restart) action.
    Both of the actions revert a cluster to using ZooKeeper and undo the changes made by the Migrate Kafka to KRaft action.
    The difference between the two actions is the type of restart used. The Revert KRaft Migration action uses rolling restarts. The Revert KRaft Migration (force restart) action uses normal (non-rolling) restarts.

    Cloudera recommends that you use Revert KRaft Migration. If the action fails or rolling restarts are not required, use Revert KRaft Migration (force restart).

  3. Wait until the action finishes.
Migration is successfully rolled back. The Kafka cluster uses ZooKeeper for metadata management.