Uses Debezium to retrieve Change Data Capture (CDC) events from a PostgreSQL database. Each FlowFile contains the events accumulated since the last run. If no new events are captured, no FlowFile is created. All events are ordered by the time at which the operation occurred. PostgreSQL must be set up for CDC events to be available. Please refer to the Debezium documentation at https://debezium.io/documentation/reference/1.9/connectors/postgresql.html#setting-up-postgresql for further details. IMPORTANT: The version of Debezium used by this Processor may restrict which versions of PostgreSQL it is compatible with. Please refer to the Debezium documentation at https://debezium.io/releases/1.9/ for further details.
debezium, database, change, cdc, postgresql
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Database History Cache Service | db-history-cache-service | | Controller Service API: DistributedMapCacheClient. Implementations: SimpleRedisDistributedMapCacheClientService, HBase_2_ClientMapCacheService, HazelcastMapCacheClient, DistributedMapCacheClientService, RedisDistributedMapCacheClientService, CouchbaseMapCacheClient, CassandraDistributedMapCache, HBase_1_1_2_ClientMapCacheService | Cache service to store database history used by Debezium. IMPORTANT: Debezium expects the content of this cache to remain consistent. For this reason, only implementations that ensure the same content across all nodes should be used to make sure a Primary Node change doesn't cause any issues. The cache must also be persistent in order to support NiFi and cache provider restarts. |
Host | db-host | | | Host name of the database server. Supports Expression Language: true (will be evaluated using variable registry only) |
Port | db-port | 5432 | | Port of the database server. Supports Expression Language: true (will be evaluated using variable registry only) |
Username | db-username | | | Username to access the database server. Supports Expression Language: true (will be evaluated using variable registry only) |
Password | db-password | | | Password to access the database server. Sensitive Property: true. Supports Expression Language: true (will be evaluated using variable registry only) |
Output Record Format | output-record-format | Whole | | The format of the record to write into FlowFiles. |
Database Name | db-name | | | Name of the database to connect to. Supports Expression Language: true (will be evaluated using variable registry only) |
Schema Include List | db-schema-include-list | | | A comma-separated list of regular expressions that match schema names to be monitored. Must not be used with 'Schema Exclude List'. Supports Expression Language: true (will be evaluated using variable registry only) |
Schema Exclude List | db-schema-exclude-list | | | A comma-separated list of regular expressions that match schema names to be excluded from monitoring. Must not be used with 'Schema Include List'. Supports Expression Language: true (will be evaluated using variable registry only) |
Table Include List | db-table-include-list | | | A comma-separated list of regular expressions that match the fully-qualified names of tables to be monitored. Fully-qualified names for tables are of the form <schema_name>.<tableName>. Must not be used with 'Table Exclude List', and superseded by database inclusions/exclusions. Supports Expression Language: true (will be evaluated using variable registry only) |
Table Exclude List | db-table-exclude-list | | | A comma-separated list of regular expressions that match the fully-qualified names of tables to be excluded from monitoring. Fully-qualified names for tables are of the form <schema_name>.<tableName>. Must not be used with 'Table Include List', and superseded by database inclusions/exclusions. Supports Expression Language: true (will be evaluated using variable registry only) |
Column Include List | db-column-include-list | | | A comma-separated list of regular expressions that match the fully-qualified names of columns to include in change event record values. Fully-qualified names for columns are of the form <schema_name>.<table_name>.<column_name>. Supports Expression Language: true (will be evaluated using variable registry only) |
Column Exclude List | db-column-exclude-list | | | A comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. Fully-qualified names for columns are of the form <schema_name>.<table_name>.<column_name>. Supports Expression Language: true (will be evaluated using variable registry only) |
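The include and exclude lists above are comma-separated regular expressions applied to fully-qualified names. The following Python sketch illustrates how such a filter could be evaluated. It is not the processor's or Debezium's implementation: the helper names `compile_list` and `is_monitored` are hypothetical, and the sketch assumes that each expression must match the whole name and that matching is case-insensitive.

```python
import re


def compile_list(value):
    """Split a comma-separated property value into compiled regexes.

    Note: a naive split on "," breaks expressions that themselves
    contain a comma (for example "a{1,2}"); this sketch ignores that case.
    """
    return [re.compile(p.strip(), re.IGNORECASE)
            for p in value.split(",") if p.strip()]


def is_monitored(name, include="", exclude=""):
    """Return True if a fully-qualified name passes the filter."""
    if include and exclude:
        # Mirrors the "Must not be used with ..." rule in the table.
        raise ValueError("include and exclude lists must not be used together")
    if include:
        # Assumption: the expression has to match the entire name.
        return any(p.fullmatch(name) for p in compile_list(include))
    if exclude:
        return not any(p.fullmatch(name) for p in compile_list(exclude))
    return True  # no filter configured: everything is monitored


tables = r"public\.orders, inventory\..*"
print(is_monitored("public.orders", include=tables))  # True
print(is_monitored("audit.log", include=tables))      # False
```

Under the full-match assumption, `public\.orders` does not select `public.orders_archive`; a trailing `.*` would be needed to include it.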
Supports Sensitive Dynamic Properties: No
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
Additional Debezium config name | The value for the additional Debezium config name | Additional Debezium configuration can be provided. IMPORTANT: Debezium JSON output format may be configured. Please refer to the Debezium documentation at https://debezium.io/documentation/reference/1.9/connectors. Supports Expression Language: true (will be evaluated using variable registry only) |
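To make the relationship between the fixed properties and dynamic properties concrete, the sketch below combines both into a single Debezium-style configuration map. The target keys (`database.hostname`, `table.include.list`, `slot.name`, and so on) are Debezium PostgreSQL connector option names, but the mapping table, the function `build_debezium_config`, and the precedence rule are assumptions made for illustration; how the processor actually translates its properties is not described here.

```python
# Hypothetical mapping from this processor's API names to Debezium
# PostgreSQL connector options. The processor's real mapping may differ.
PROPERTY_TO_DEBEZIUM = {
    "db-host": "database.hostname",
    "db-port": "database.port",
    "db-username": "database.user",
    "db-password": "database.password",
    "db-name": "database.dbname",
    "db-schema-include-list": "schema.include.list",
    "db-schema-exclude-list": "schema.exclude.list",
    "db-table-include-list": "table.include.list",
    "db-table-exclude-list": "table.exclude.list",
    "db-column-include-list": "column.include.list",
    "db-column-exclude-list": "column.exclude.list",
}


def build_debezium_config(properties, dynamic_properties):
    """Merge fixed and dynamic properties into one flat config dict."""
    config = {}
    for api_name, value in properties.items():
        key = PROPERTY_TO_DEBEZIUM.get(api_name)
        if key is not None and value not in (None, ""):
            config[key] = str(value)
    # Assumption: dynamic properties are passed through under their own
    # names. In this sketch they overwrite a mapped key on collision;
    # the processor's actual precedence is not stated in this document.
    config.update(dynamic_properties)
    return config


config = build_debezium_config(
    {"db-host": "localhost", "db-port": 5432, "db-name": "shop",
     "db-table-include-list": r"public\.orders"},
    {"slot.name": "nifi_slot", "plugin.name": "pgoutput"},
)
for key in sorted(config):
    print(f"{key}={config[key]}")
```

In this example, `slot.name` and `plugin.name` stand in for the kind of connector option that has no dedicated property in the table and would therefore be supplied as a dynamic property.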
Name | Description |
---|---|
success | Successfully created FlowFile with Debezium change events. |
Scope | Description |
---|---|
CLUSTER | Event offset data and database schema history need to be kept and stored between runs. |
Resource | Description |
---|---|
MEMORY | A high volume of database changes may lead to high memory consumption, because change events are collected asynchronously and held back until processed. Reducing the Scheduling Period may help to avoid this. |