Kudu replication configuration reference

Use the following parameters to configure the job, reader, and writer settings for Kudu replication.

Job Parameters

The following table describes the parameters that control the overall replication behavior:

| Parameter | Required | Default | Description |
|---|---|---|---|
| job.sourceMasterAddresses | Yes | N/A | The comma-separated list of source Kudu master addresses, such as host1:7051,host2:7051. |
| job.sinkMasterAddresses | Yes | N/A | The comma-separated list of sink Kudu master addresses. |
| job.tableName | Yes | N/A | The name of the source Kudu table to replicate. |
| job.checkpointsDirectory | Yes | N/A | The filesystem path where Flink stores checkpoint data, such as an HDFS path. |
| job.discoveryIntervalSeconds | No | 600 | The interval, in seconds, at which the job polls for new changes by using a diff scan. |
| job.checkpointingIntervalMillis | No | 60000 | The interval, in milliseconds, at which Flink takes checkpoints. This value must be strictly less than job.discoveryIntervalSeconds * 1000. |
| job.createTable | No | false | If set to true, the job automatically creates the sink table if it does not exist. |
| job.tableSuffix | No | "" | The suffix appended to the sink table name. This is useful for testing replication to a table with a different name on the same cluster. |
| job.restoreOwner | No | false | If set to true and job.createTable is also true, the job copies the table owner from the source table to the sink table. |
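
The constraints in the table above can be checked before the job is submitted. The following is a minimal sketch in Python and is not part of the replication job itself. The function names (with_defaults, check_job_params, sink_table_name) and the use of a plain dictionary are hypothetical. The sketch also assumes that the sink table name is the source table name with job.tableSuffix appended directly, which is how the description reads.

```python
# Hypothetical pre-flight check. Only the job.* keys and their defaults
# come from the parameter table; every function name here is illustrative.
REQUIRED = (
    "job.sourceMasterAddresses",
    "job.sinkMasterAddresses",
    "job.tableName",
    "job.checkpointsDirectory",
)

DEFAULTS = {
    "job.discoveryIntervalSeconds": 600,
    "job.checkpointingIntervalMillis": 60000,
    "job.createTable": False,
    "job.tableSuffix": "",
    "job.restoreOwner": False,
}


def with_defaults(params):
    """Overlay user-supplied values on the documented defaults."""
    return {**DEFAULTS, **params}


def check_job_params(params):
    """Return a list of problems; an empty list means none were found."""
    merged = with_defaults(params)
    problems = [f"{key} is required" for key in REQUIRED if not merged.get(key)]

    # The checkpoint interval must be strictly less than the discovery interval.
    discovery_ms = merged["job.discoveryIntervalSeconds"] * 1000
    checkpoint_ms = merged["job.checkpointingIntervalMillis"]
    if checkpoint_ms >= discovery_ms:
        problems.append(
            f"job.checkpointingIntervalMillis ({checkpoint_ms}) must be strictly "
            f"less than job.discoveryIntervalSeconds * 1000 ({discovery_ms})"
        )

    # The owner is copied only when the job also creates the sink table.
    if merged["job.restoreOwner"] and not merged["job.createTable"]:
        problems.append("job.restoreOwner has no effect unless job.createTable is true")
    return problems


def sink_table_name(params):
    # Assumption: the suffix is appended directly to the source table name.
    merged = with_defaults(params)
    return merged["job.tableName"] + merged["job.tableSuffix"]
```

With the defaults, the checkpoint interval (60000 ms) is well below the discovery interval (600 * 1000 = 600000 ms), so the check passes. Lowering job.discoveryIntervalSeconds to 60 without also lowering job.checkpointingIntervalMillis makes the two values equal, which violates the strict inequality.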

Reader Parameters

The following table describes the parameters that control how the job reads data from the source Kudu cluster:

| Parameter | Required | Default | Description |
|---|---|---|---|
| reader.batchSizeBytes | No | 20971520 | The maximum number of bytes fetched in a single scan batch (default is 20 MiB). |
| reader.splitSizeBytes | No | Kudu default | The target size in bytes for each scan split when parallelizing input. |
| reader.scanRequestTimeout | No | 30000 ms | The timeout in milliseconds for individual scan RPCs. |
| reader.prefetching | No | false | Whether to enable prefetching of data blocks from the scanner. |
| reader.keepAlivePeriodMs | No | 15000 ms | The period in milliseconds after which an idle scanner sends a keep-alive message to the server. |
| reader.replicaSelection | No | CLOSEST_REPLICA | The replica selection strategy. Valid values: CLOSEST_REPLICA or LEADER_ONLY. |
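
As a quick check of the reader defaults, the sketch below (Python, illustrative only) confirms that 20971520 bytes equals 20 MiB and validates reader.replicaSelection against the two documented values. The helper name check_reader_params is hypothetical, and the rule that sizes and timeouts must be positive is an assumption made for the example; the table does not state it.

```python
MIB = 1024 * 1024

# Defaults from the reader table. reader.splitSizeBytes is omitted because
# its default is left to Kudu.
READER_DEFAULTS = {
    "reader.batchSizeBytes": 20 * MIB,   # 20971520 bytes
    "reader.scanRequestTimeout": 30000,  # milliseconds
    "reader.prefetching": False,
    "reader.keepAlivePeriodMs": 15000,   # milliseconds
    "reader.replicaSelection": "CLOSEST_REPLICA",
}

VALID_REPLICA_SELECTION = ("CLOSEST_REPLICA", "LEADER_ONLY")


def check_reader_params(params):
    """Return a list of problems; an empty list means none were found."""
    merged = {**READER_DEFAULTS, **params}
    problems = []

    selection = merged["reader.replicaSelection"]
    if selection not in VALID_REPLICA_SELECTION:
        problems.append(
            f"reader.replicaSelection must be one of {VALID_REPLICA_SELECTION}, "
            f"got {selection!r}"
        )

    # Assumption (not stated in the table): sizes and timeouts must be positive.
    for key in (
        "reader.batchSizeBytes",
        "reader.scanRequestTimeout",
        "reader.keepAlivePeriodMs",
    ):
        if merged[key] <= 0:
            problems.append(f"{key} must be positive, got {merged[key]}")
    return problems
```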

Writer Parameters

The following table describes the parameters that control how the job writes data to the sink Kudu cluster:

| Parameter | Required | Default | Description |
|---|---|---|---|
| writer.flushMode | No | AUTO_FLUSH_BACKGROUND | The Kudu session flush mode. Valid values: AUTO_FLUSH_BACKGROUND, AUTO_FLUSH_SYNC, or MANUAL_FLUSH. |
| writer.operationTimeout | No | 30000 ms | The timeout in milliseconds for individual write operations. |
| writer.maxBufferSize | No | 1000 | The maximum number of operations buffered in the Kudu write session. |
| writer.flushInterval | No | 1000 ms | The interval in milliseconds at which the job automatically flushes buffered operations. |
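
Settings often arrive as strings, for example from a properties file or a command line. The following Python sketch shows one way to convert writer settings to typed values and fall back to the documented defaults. It is illustrative only: the names WRITER_SPEC, parse_writer_params, and flush_mode_is_valid are hypothetical, and the sketch assumes that flush mode names are case-sensitive, which the table does not state.

```python
# key: (documented default, converter applied to a user-supplied string)
WRITER_SPEC = {
    "writer.flushMode": ("AUTO_FLUSH_BACKGROUND", str),
    "writer.operationTimeout": (30000, int),  # milliseconds
    "writer.maxBufferSize": (1000, int),      # buffered operations
    "writer.flushInterval": (1000, int),      # milliseconds
}

VALID_FLUSH_MODES = ("AUTO_FLUSH_BACKGROUND", "AUTO_FLUSH_SYNC", "MANUAL_FLUSH")


def parse_writer_params(raw):
    """Convert string settings to typed values, filling in defaults."""
    parsed = {}
    for key, (default, convert) in WRITER_SPEC.items():
        parsed[key] = convert(raw[key]) if key in raw else default
    return parsed


def flush_mode_is_valid(mode):
    # Assumption: mode names are matched exactly, including case.
    return mode in VALID_FLUSH_MODES
```

For example, parse_writer_params({"writer.operationTimeout": "45000"}) yields an integer timeout of 45000 and keeps AUTO_FLUSH_BACKGROUND as the flush mode.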