Iceberg is a high-performance format for huge analytic tables.
The PutIcebergCDC processor applies CDC (Change Data Capture) operations to Iceberg tables using a Hive Iceberg catalog.
Note: The processor requires Iceberg format version 2 tables, because the solution depends on the availability of equality delete files.
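Equality delete files let a writer remove rows without rewriting data files. An equality delete records the column values of the rows to remove, and readers drop any row with matching values that was written at a lower sequence number. The Java sketch below is a simplified in-memory model of that idea. It assumes a single key column ("id"), one commit per operation, and that an update is written as an equality delete on the old key plus an insert of the new row. The class and method names are hypothetical; this is not the processor's or Iceberg's implementation.

```java
import java.util.*;

// Toy in-memory model of an Iceberg v2 table: data rows plus equality deletes.
// Hypothetical illustration only, not the PutIcebergCDC or Iceberg implementation.
public class EqualityDeleteSketch {

    // A data row tagged with the sequence number of the commit that wrote it.
    record DataRow(long seq, Map<String, Object> values) {}

    // An equality delete removes rows with a LOWER sequence number whose key columns match.
    record EqualityDelete(long seq, Map<String, Object> key) {}

    private final List<DataRow> dataRows = new ArrayList<>();
    private final List<EqualityDelete> deletes = new ArrayList<>();
    private final List<String> keyColumns;
    private long seq = 0;

    EqualityDeleteSketch(List<String> keyColumns) {
        this.keyColumns = keyColumns;
    }

    // Projects a row onto the key columns used for equality comparison.
    private Map<String, Object> keyOf(Map<String, Object> row) {
        Map<String, Object> key = new HashMap<>();
        for (String col : keyColumns) {
            key.put(col, row.get(col));
        }
        return key;
    }

    // Insert: append a data row in a new commit.
    void insert(Map<String, Object> after) {
        seq++;
        dataRows.add(new DataRow(seq, after));
    }

    // Update (assumed strategy): equality delete on the old key plus insert of the new row,
    // both in the same commit, so the new row is not hidden by its own delete.
    void update(Map<String, Object> before, Map<String, Object> after) {
        seq++;
        deletes.add(new EqualityDelete(seq, keyOf(before)));
        dataRows.add(new DataRow(seq, after));
    }

    // Delete: only an equality delete on the old key.
    void delete(Map<String, Object> before) {
        seq++;
        deletes.add(new EqualityDelete(seq, keyOf(before)));
    }

    // Readers apply deletes at scan time: a row is live unless a later delete matches it.
    List<Map<String, Object>> scan() {
        List<Map<String, Object>> live = new ArrayList<>();
        for (DataRow row : dataRows) {
            boolean deleted = false;
            for (EqualityDelete d : deletes) {
                if (d.seq() > row.seq() && d.key().equals(keyOf(row.values()))) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                live.add(row.values());
            }
        }
        return live;
    }

    public static void main(String[] args) {
        EqualityDeleteSketch table = new EqualityDeleteSketch(List.of("id"));
        Map<String, Object> v1 = Map.of("id", 1, "email", "annek@noanswer.org");
        Map<String, Object> v2 = Map.of("id", 1, "email", "anne@example.org"); // hypothetical new value
        table.insert(v1);
        table.update(v1, v2);
        System.out.println(table.scan().size());              // 1
        System.out.println(table.scan().get(0).get("email")); // anne@example.org
    }
}
```

The strictly-lower sequence number check is what allows a delete and its replacement row to land in the same commit without the new row being removed.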
The PutIcebergCDC processor accepts CDC records as input.
The "Operation RecordPath" property specifies where the operation type is found in each record. The "Before Data RecordPath" and "After Data RecordPath" properties specify where the state of the record before and after the operation is found.
Example Debezium record:
{
  "schema": {...},
  "payload": {
    "before": null,
    "after": {
      "id": 1,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {...},
    "op": "c",
    "ts_ms": 1559033904863
  }
}
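To show how the three RecordPath properties relate to the Debezium record above, the following Java sketch resolves simple slash-separated paths against the record parsed into nested maps. The property values "/payload/op", "/payload/before" and "/payload/after" are assumptions for this example, and the resolve helper is hypothetical: it handles only plain child-field steps and is not NiFi's RecordPath implementation.

```java
import java.util.*;

// Hypothetical helper: resolves paths such as "/payload/op" against nested Maps.
// Not NiFi's RecordPath engine; only plain child-field steps are supported.
public class RecordPathSketch {

    static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String step : path.substring(1).split("/")) {
            if (!(current instanceof Map<?, ?> map)) {
                return null; // the path goes deeper than the record structure
            }
            current = map.get(step);
        }
        return current;
    }

    public static void main(String[] args) {
        // The Debezium example above, with "schema" and "source" omitted.
        Map<String, Object> after = new HashMap<>();
        after.put("id", 1);
        after.put("first_name", "Anne");
        after.put("last_name", "Kretchmar");
        after.put("email", "annek@noanswer.org");

        Map<String, Object> payload = new HashMap<>();
        payload.put("before", null); // HashMap allows null values, unlike Map.of
        payload.put("after", after);
        payload.put("op", "c");
        payload.put("ts_ms", 1559033904863L);

        Map<String, Object> record = Map.of("payload", payload);

        // Assumed property values for a Debezium record.
        String operationPath = "/payload/op";
        String beforePath = "/payload/before";
        String afterPath = "/payload/after";

        System.out.println(resolve(record, operationPath)); // c
        System.out.println(resolve(record, beforePath));    // null
        System.out.println(((Map<?, ?>) resolve(record, afterPath)).get("email")); // annek@noanswer.org
    }
}
```

For a GoldenGate record, the corresponding paths would presumably be "/op_type", "/before" and "/after".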
For more information about Debezium records, please check the following documentation:
Example GoldenGate record:
{
  "table": "ORCL.ESHOP.CUSTOMER_ORDER_ITEM",
  "op_type": "I",
  "op_ts": "2019-05-31 04:24:34.000327",
  "current_ts": "2019-05-31 04:24:39.650000",
  "pos": "00000000020000004074",
  "primary_keys": [
    "ID"
  ],
  "tokens": {
    "txid": "9.32.6726",
    "csn": "13906131"
  },
  "before": null,
  "after": {
    "ID": 11,
    "ID_CUSTOMER_ORDER": 11,
    "DESCRIPTION": "Cars 3",
    "QUANTITY": 2
  }
}
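The two formats encode the operation differently: Debezium uses "c", "u" and "d" (plus "r" for snapshot reads) in "op", while GoldenGate uses "I", "U" and "D" in "op_type". The Java sketch below normalizes both into a single operation type. The enum and method names are hypothetical, and treating a Debezium snapshot read as an insert is an assumption rather than documented processor behaviour.

```java
import java.util.Locale;

// Hypothetical normalization of Debezium and GoldenGate operation codes.
// Not the PutIcebergCDC implementation; "r" -> INSERT is an assumption.
public class CdcOperationSketch {

    enum Operation { INSERT, UPDATE, DELETE }

    static Operation fromDebezium(String op) {
        return switch (op) {
            case "c", "r" -> Operation.INSERT; // create, snapshot read (assumed insert)
            case "u" -> Operation.UPDATE;
            case "d" -> Operation.DELETE;
            default -> throw new IllegalArgumentException("Unsupported Debezium op: " + op);
        };
    }

    static Operation fromGoldenGate(String opType) {
        return switch (opType.toUpperCase(Locale.ROOT)) {
            case "I" -> Operation.INSERT;
            case "U" -> Operation.UPDATE;
            case "D" -> Operation.DELETE;
            default -> throw new IllegalArgumentException("Unsupported GoldenGate op_type: " + opType);
        };
    }

    public static void main(String[] args) {
        System.out.println(fromDebezium("c"));   // INSERT (the Debezium example)
        System.out.println(fromGoldenGate("I")); // INSERT (the GoldenGate example)
    }
}
```

Unrecognized codes, such as truncate operations, are rejected here rather than silently ignored.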
For more information about GoldenGate records, please check the following documentation:
Iceberg supports multiple concurrent writes using optimistic concurrency. The processor retries failed commits using exponential backoff with jitter and a scale factor of 2, and provides the following properties to configure this behaviour: