Prerequisites for starting the replication job
Ensure the necessary requirements and configurations are in place before starting the replication job.
You can start the replication job by submitting the shaded JAR to a Flink cluster using the standard flink run command. All configuration must be passed as --key value command-line arguments.
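As a minimal sketch of the --key value convention, the following script assembles a flink run command line and prints it instead of submitting anything. The JAR_PATH variable and the job_arg and build_submit_cmd helpers are hypothetical names introduced here for illustration. Only job.createTable is taken from this page; a real submission needs the other settings that your deployment requires.

```shell
#!/bin/sh
# Sketch only: prints a flink run command line, does not submit a job.
set -eu

# Placeholder: substitute the real version of the shaded JAR.
JAR_PATH="${JAR_PATH:-kudu-replication-<version>.jar}"

# Emit one job setting in "--key value" form.
job_arg() {
  printf '%s %s ' "--$1" "$2"
}

# Assemble the command line: flink run <jar> followed by the job settings.
build_submit_cmd() {
  printf '%s ' flink run "$JAR_PATH"
  # job.createTable is the only key named on this page; add others as needed.
  job_arg job.createTable true
  printf '\n'
}

build_submit_cmd
```

Run as written, the script prints the command for review. Copy the printed line into a shell on a host with access to the Flink cluster once the remaining settings are filled in.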
Before starting the replication job, ensure that the following requirements are met:
- A running Apache Flink cluster in session or per-job mode.
- A source Kudu cluster that contains the table you intend to replicate.
- A sink Kudu cluster where the destination table is already created, or where job.createTable=true is set to enable the job to create the table automatically.
- A shared filesystem, such as HDFS or S3, that is accessible by all Flink TaskManagers for storing Flink checkpoints.
- The replication job JAR file (kudu-replication-<version>.jar). This is a shaded JAR file that includes all Kudu dependencies. The Flink cluster provides the required Flink APIs at runtime.
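Some of these requirements can be checked from the machine that submits the job. The sketch below covers only the local ones: whether the flink command is on the PATH and whether the JAR file is readable. The JAR_PATH variable and the have_cmd, have_file, and preflight helpers are hypothetical names introduced for illustration. The script does not verify the Kudu clusters, the Flink cluster state, or the shared filesystem, because those checks depend on your environment.

```shell
#!/bin/sh
# Sketch only: local preflight checks before submitting the replication job.
set -eu

# Placeholder: substitute the real version of the shaded JAR.
JAR_PATH="${JAR_PATH:-kudu-replication-<version>.jar}"

# Report whether a command is available on the PATH.
have_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok       command: $1"
  else
    echo "MISSING  command: $1"
    return 1
  fi
}

# Report whether a file exists and is readable.
have_file() {
  if [ -r "$1" ]; then
    echo "ok       file: $1"
  else
    echo "MISSING  file: $1"
    return 1
  fi
}

# Run every check so that all problems are listed in one pass.
preflight() {
  status=0
  have_cmd flink || status=1
  have_file "$JAR_PATH" || status=1
  return "$status"
}

preflight || echo "Resolve the MISSING items before submitting the job."
```

The checks report every item instead of stopping at the first failure, so a single run shows everything that still needs attention.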
