Migrating Streaming Workloads to Cloudera Private Cloud

Migrate Kafka Using Streams Replication Manager

Learn about the different options you have when migrating Kafka from HDF to Cloudera Private Cloud Base using Streams Replication Manager.

Kafka data is migrated from HDF to Cloudera Private Cloud Base using Streams Replication Manager. Streams Replication Manager can replicate data in various ways. How the data is replicated, and in this case migrated, is determined by the replication policy that is in use.

There are three replication policies that you can use when migrating data: the DefaultReplicationPolicy, the IdentityReplicationPolicy, and the MigratingReplicationPolicy. The following sections give an overview of each policy and recommend which policy to use in different scenarios. Review them and choose the policy that is best suited to your requirements.
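
As a quick orientation, the recommendations detailed in the sections below can be condensed into a small decision rule. The following is a minimal sketch, not a Cloudera tool: the function names `parse_runtime_version` and `choose_replication_policy` are hypothetical, and the sketch assumes the Cloudera Runtime version is given as a plain dotted string such as `7.1.8`.

```python
def parse_runtime_version(version: str) -> tuple:
    """Turn a dotted version string such as '7.1.8' into (7, 1, 8).

    Assumption: only the first three numeric components matter here.
    """
    return tuple(int(part) for part in version.split(".")[:3])


def choose_replication_policy(runtime_version: str, renaming_acceptable: bool) -> str:
    """Return the name of the policy suggested by the guidance in this document."""
    if renaming_acceptable:
        # Default and Cloudera-recommended policy; remote topics get a prefix.
        return "DefaultReplicationPolicy"
    if parse_runtime_version(runtime_version) >= (7, 1, 8):
        # Topic names are kept; applies to Cloudera Runtime 7.1.8 or higher.
        return "IdentityReplicationPolicy"
    # Topic names are kept; applies to Cloudera Runtime 7.1.7 or lower.
    return "MigratingReplicationPolicy"


if __name__ == "__main__":
    for version, renaming_ok in [("7.1.9", True), ("7.1.8", False), ("7.1.7", False)]:
        print(version, renaming_ok, choose_replication_policy(version, renaming_ok))
```

The rule only captures the version and renaming criteria. The additional notes for each policy, such as monitoring support and post-migration use, still need to be reviewed before you decide.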

The DefaultReplicationPolicy is the default and Cloudera-recommended replication policy. This policy prefixes the names of remote (replicated) topics with the name (alias) of the source cluster. For example, the topic1 topic from the us-west source cluster creates the us-west.topic1 remote topic on the target cluster. Use this policy if renaming topics during the migration is acceptable for your deployment.

Additional notes:
  • Remote topics will have different names in the target cluster. As a result, you must reconfigure existing Kafka clients to use the remote topic names.
  • If you decide to, you can repurpose the Streams Replication Manager service you set up for migration and continue using it for replication.
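
To plan the client reconfiguration mentioned above, it can help to list the old and new topic names side by side. The following is a minimal sketch that only models the naming rule described in this section. It does not call Streams Replication Manager, the function names are hypothetical, and the `.` separator is taken from the us-west.topic1 example.

```python
def default_policy_remote_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Model the remote topic name: source cluster alias, separator, topic name."""
    return f"{source_alias}{separator}{topic}"


def client_topic_mapping(source_alias: str, topics: list) -> dict:
    """Map each source topic to the remote name that clients must switch to."""
    return {topic: default_policy_remote_name(source_alias, topic) for topic in topics}


if __name__ == "__main__":
    # Hypothetical topic list for a source cluster with the alias us-west.
    for old, new in client_topic_mapping("us-west", ["topic1", "orders"]).items():
        print(f"{old} -> {new}")
```

A mapping like this can serve as a checklist when updating the topic names configured in existing producers and consumers.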

The IdentityReplicationPolicy does not change the names of remote topics. When this policy is in use, topics retain the same name on both source and target clusters. For example, the topic1 topic from the us-west source cluster creates the topic1 remote topic on the target cluster. Use this policy if you are on Cloudera Runtime 7.1.8 or higher and do not want remote topics to get renamed during migration.

Additional notes:
  • In Cloudera Runtime 7.1.8, replication monitoring with this policy is not supported. This means that you cannot validate or monitor replications during the migration process. Replication monitoring with this policy is, however, supported in Cloudera Runtime 7.1.9 or higher.
  • If you decide to, you can repurpose the Streams Replication Manager service you set up for migration and continue using it for replication.
  • If you want to continue using Streams Replication Manager after migration, review the limitations of this policy in the Streams Replication Manager Known Issues of the appropriate Cloudera Runtime version. Different limitations might apply depending on the Cloudera Runtime version.
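
The version thresholds in this section can be checked before you commit to the policy. The following is a minimal sketch under the assumption that the Cloudera Runtime version is available as a dotted string. The function name and dictionary keys are hypothetical and are not part of any Cloudera API.

```python
def identity_policy_support(runtime_version: str) -> dict:
    """Summarize what this section states about the IdentityReplicationPolicy."""
    version = tuple(int(part) for part in runtime_version.split(".")[:3])
    return {
        # The policy is suggested for Cloudera Runtime 7.1.8 or higher.
        "meets_version_requirement": version >= (7, 1, 8),
        # Replication monitoring with this policy needs 7.1.9 or higher.
        "replication_monitoring": version >= (7, 1, 9),
    }


if __name__ == "__main__":
    for version in ("7.1.7", "7.1.8", "7.1.9"):
        print(version, identity_policy_support(version))
```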

The MigratingReplicationPolicy is a custom replication policy. Cloudera provides its code, but unlike the IdentityReplicationPolicy and the DefaultReplicationPolicy, it is not shipped with Streams Replication Manager. As a result, you must implement, compile, and package it as a JAR yourself.

This policy behaves similarly to the IdentityReplicationPolicy and does not rename replicated topics on target clusters. However, unlike the IdentityReplicationPolicy, this policy is only supported in data migration scenarios. Use this policy if you are using Cloudera Runtime 7.1.7 or lower and you do not want replicated topics to get renamed.

Additional notes:
  • If you are using Cloudera Runtime 7.1.8 or later, Cloudera recommends that you use the IdentityReplicationPolicy instead.
  • In addition to implementing, compiling, and packaging the policy, you must also carry out advanced configuration steps to use it.
  • Replication monitoring with this policy is not supported. This means that you will not be able to validate or monitor replications during the migration process.
  • This replication policy is only supported with a unidirectional data replication setup where replication happens from a single source cluster to a single target cluster. Configuring additional hops or bidirectional replication is not supported and can lead to severe replication issues.
  • Using a Streams Replication Manager service configured with this policy for any scenario other than data migration is not supported. Once migration is complete, you must reconfigure the Streams Replication Manager instance you set up to use the IdentityReplicationPolicy or the DefaultReplicationPolicy. Alternatively, you can delete Streams Replication Manager from the cluster.
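
The topology restriction for the MigratingReplicationPolicy can be expressed as a simple check on the planned replication flows. The following is a minimal sketch, assuming each flow is written as a hypothetical `(source_alias, target_alias)` pair. It does not read any Streams Replication Manager configuration.

```python
def is_supported_migration_topology(replication_flows) -> bool:
    """Return True only for one flow from a single source to a single target."""
    flows = set(replication_flows)
    if len(flows) != 1:
        # No flow at all, additional hops, or bidirectional replication.
        return False
    source, target = next(iter(flows))
    # A cluster replicating to itself is not a migration.
    return source != target


if __name__ == "__main__":
    # Hypothetical cluster aliases "hdf" and "cdp".
    print(is_supported_migration_topology([("hdf", "cdp")]))
    print(is_supported_migration_topology([("hdf", "cdp"), ("cdp", "hdf")]))
```

Any plan with a second flow, whether a further hop or a flow in the reverse direction, fails the check. This mirrors the restriction stated in the notes above.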