Kudu replication configuration reference
Use the following parameters to configure the job, reader, and writer settings for Kudu replication.
Job Parameters
The following table describes the parameters that control the overall replication behavior:
| Parameter | Required | Default | Description |
|---|---|---|---|
| job.sourceMasterAddresses | Yes | N/A | The comma-separated list of source Kudu master addresses, such as host1:7051,host2:7051. |
| job.sinkMasterAddresses | Yes | N/A | The comma-separated list of sink Kudu master addresses. |
| job.tableName | Yes | N/A | The name of the source Kudu table to replicate. |
| job.checkpointsDirectory | Yes | N/A | The filesystem path where Flink stores checkpoint data, such as an HDFS path. |
| job.discoveryIntervalSeconds | No | 600 | The interval in seconds at which the job polls the source table for new changes by using a diff scan. |
| job.checkpointingIntervalMillis | No | 60000 | The interval in milliseconds at which Flink takes checkpoints. This value must be strictly less than job.discoveryIntervalSeconds * 1000. |
| job.createTable | No | false | If set to true, the job automatically creates the sink table if it does not exist. |
| job.tableSuffix | No | "" | The suffix appended to the sink table name. This is useful for testing replication to a table with a different name on the same cluster. |
| job.restoreOwner | No | false | If set to true and job.createTable=true, the job copies the table owner from the source to the sink table. |
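
The required parameters and the checkpoint interval constraint from the table above can be checked before the job is submitted. The following is a minimal sketch, not part of the replication job itself: the function names validate_job_params and sink_table_name are hypothetical, the parameters are assumed to be available as a plain dictionary, and the sink table name is assumed to be the source table name followed by job.tableSuffix.

```python
# Hypothetical pre-submission check; none of these helpers ship with the job.
REQUIRED_JOB_PARAMS = (
    "job.sourceMasterAddresses",
    "job.sinkMasterAddresses",
    "job.tableName",
    "job.checkpointsDirectory",
)


def validate_job_params(params):
    """Return a list of problems found in the job.* parameters (empty if none)."""
    errors = []
    for key in REQUIRED_JOB_PARAMS:
        if not params.get(key):
            errors.append(f"missing required parameter: {key}")

    # Defaults taken from the table: 600 seconds and 60000 milliseconds.
    discovery_seconds = int(params.get("job.discoveryIntervalSeconds", 600))
    checkpoint_millis = int(params.get("job.checkpointingIntervalMillis", 60000))

    # The checkpoint interval must be strictly less than the discovery interval.
    if checkpoint_millis >= discovery_seconds * 1000:
        errors.append(
            "job.checkpointingIntervalMillis must be strictly less than "
            "job.discoveryIntervalSeconds * 1000"
        )
    return errors


def sink_table_name(params):
    """Assumed naming rule: source table name plus the optional suffix."""
    return params["job.tableName"] + params.get("job.tableSuffix", "")
```

With the defaults, the check passes because 60000 is less than 600 * 1000. Lowering job.discoveryIntervalSeconds to 60 without also lowering the checkpoint interval would be reported as a problem.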
Reader Parameters
The following table describes the parameters that control how the job reads data from the source Kudu cluster:
| Parameter | Required | Default | Description |
|---|---|---|---|
| reader.batchSizeBytes | No | 20971520 | The maximum number of bytes fetched in a single scan batch (default is 20 MiB). |
| reader.splitSizeBytes | No | Kudu default | The target size in bytes for each scan split when parallelizing input. |
| reader.scanRequestTimeout | No | 30000 ms | The timeout in milliseconds for individual scan RPCs. |
| reader.prefetching | No | false | Whether to enable prefetching of data blocks from the scanner. |
| reader.keepAlivePeriodMs | No | 15000 ms | The period in milliseconds after which an idle scanner sends a keep-alive message to the server. |
| reader.replicaSelection | No | CLOSEST_REPLICA | The replica selection strategy. Valid values: CLOSEST_REPLICA or LEADER_ONLY. |
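
To illustrate how the reader defaults combine with overrides, here is a small sketch that merges user-supplied values over the defaults listed above and rejects an unknown replica selection strategy. The function name resolve_reader_params is hypothetical, and reader.splitSizeBytes is left out of the defaults because its value comes from Kudu rather than from this table.

```python
# Defaults copied from the reader table; 20 MiB expressed in bytes.
READER_DEFAULTS = {
    "reader.batchSizeBytes": 20 * 1024 * 1024,
    "reader.scanRequestTimeout": 30000,
    "reader.prefetching": False,
    "reader.keepAlivePeriodMs": 15000,
    "reader.replicaSelection": "CLOSEST_REPLICA",
}

VALID_REPLICA_SELECTIONS = ("CLOSEST_REPLICA", "LEADER_ONLY")


def resolve_reader_params(overrides):
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    resolved = {**READER_DEFAULTS, **overrides}
    selection = resolved["reader.replicaSelection"]
    if selection not in VALID_REPLICA_SELECTIONS:
        raise ValueError(f"invalid reader.replicaSelection: {selection}")
    return resolved
```

For example, resolve_reader_params({"reader.replicaSelection": "LEADER_ONLY"}) keeps the 20971520-byte batch size and changes only the replica selection.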
Writer Parameters
The following table describes the parameters that control how the job writes data to the sink Kudu cluster:
| Parameter | Required | Default | Description |
|---|---|---|---|
| writer.flushMode | No | AUTO_FLUSH_BACKGROUND | The Kudu session flush mode. Valid values: AUTO_FLUSH_BACKGROUND, AUTO_FLUSH_SYNC, or MANUAL_FLUSH. |
| writer.operationTimeout | No | 30000 ms | The timeout in milliseconds for individual write operations. |
| writer.maxBufferSize | No | 1000 | The maximum number of operations buffered in the Kudu write session. |
| writer.flushInterval | No | 1000 ms | The interval in milliseconds at which the job automatically flushes buffered operations. |
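
The writer settings can be modeled as a small value object that carries the documented defaults and rejects an unknown flush mode. This is a sketch only: the WriterSettings class and its field names are hypothetical, and the requirement that the numeric values be positive is an assumption rather than a documented rule.

```python
from dataclasses import dataclass

VALID_FLUSH_MODES = ("AUTO_FLUSH_BACKGROUND", "AUTO_FLUSH_SYNC", "MANUAL_FLUSH")


@dataclass(frozen=True)
class WriterSettings:
    """Hypothetical holder for writer.* values, using the table's defaults."""
    flush_mode: str = "AUTO_FLUSH_BACKGROUND"
    operation_timeout_ms: int = 30000
    max_buffer_size: int = 1000
    flush_interval_ms: int = 1000

    def __post_init__(self):
        if self.flush_mode not in VALID_FLUSH_MODES:
            raise ValueError(f"invalid writer.flushMode: {self.flush_mode}")
        # Assumption: non-positive timeouts, sizes, and intervals are not meaningful.
        for name in ("operation_timeout_ms", "max_buffer_size", "flush_interval_ms"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    def as_params(self):
        """Return the settings keyed by the documented parameter names."""
        return {
            "writer.flushMode": self.flush_mode,
            "writer.operationTimeout": self.operation_timeout_ms,
            "writer.maxBufferSize": self.max_buffer_size,
            "writer.flushInterval": self.flush_interval_ms,
        }
```

Calling WriterSettings(flush_mode="MANUAL_FLUSH").as_params() yields a dictionary in which only writer.flushMode differs from the defaults.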
