CaptureChangeDebeziumPostgreSQL

Description:

Uses Debezium to retrieve Change Data Capture (CDC) events from a PostgreSQL database. Each FlowFile contains the events accumulated since the last run. If no new events are captured, no FlowFile is created. All events are ordered by the time at which the operation occurred. PostgreSQL must be set up for CDC events to be available. Refer to the Debezium documentation at https://debezium.io/documentation/reference/1.9/connectors/postgresql.html#setting-up-postgresql for further details. IMPORTANT: The version of Debezium used by this Processor may restrict which versions of PostgreSQL it is compatible with. Refer to the Debezium documentation at https://debezium.io/releases/1.9/ for further details.

Tags:

debezium, database, change, cdc, postgresql

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Database History Cache Service | db-history-cache-service | | Controller Service API: DistributedMapCacheClient. Implementations: CouchbaseMapCacheClient, HBase_2_ClientMapCacheService, HazelcastMapCacheClient, DistributedMapCacheClientService, SimpleRedisDistributedMapCacheClientService, RedisDistributedMapCacheClientService, CassandraDistributedMapCache | Cache service to store database history used by Debezium. IMPORTANT: Debezium expects the content of this cache to remain consistent. For this reason, only implementations that ensure the same content across all nodes should be used, so that a Primary Node change doesn't cause any issues. The HazelcastMapCacheClient paired with EmbeddedHazelcastCacheManager with 'Hazelcast Clustering Strategy' set to 'All Nodes' can be used as an out-of-the-box solution.
Host | db-host | | | Host name of the database server. Supports Expression Language: true (will be evaluated using Environment variables only)
Port | db-port | 5432 | | Port of the database server. Supports Expression Language: true (will be evaluated using Environment variables only)
Username | db-username | | | Username to access the database server. Supports Expression Language: true (will be evaluated using Environment variables only)
Password | db-password | | | Password to access the database server. Sensitive Property: true. Supports Expression Language: true (will be evaluated using Environment variables only)
Output Record Format | output-record-format | Whole | Whole (the entire record provided by Debezium per event is written to the FlowFile); Payload Extracted (the 'payload' section of the record provided by Debezium per event is extracted and written to the FlowFile); Payload Wrapped (only the 'payload' section of the record provided by Debezium per event is retained; other sections are removed before it is written to the FlowFile) | The format of the record to write into FlowFiles.
Database Name | db-name | | | Name of the database to connect to. Supports Expression Language: true (will be evaluated using Environment variables only)
Schema Include List | db-schema-include-list | | | A comma-separated list of regular expressions that match schema names to be monitored. Must not be used with 'Schema Exclude List'. Supports Expression Language: true (will be evaluated using Environment variables only)
Schema Exclude List | db-schema-exclude-list | | | A comma-separated list of regular expressions that match schema names to be excluded from monitoring. Must not be used with 'Schema Include List'. Supports Expression Language: true (will be evaluated using Environment variables only)
Table Include List | db-table-include-list | | | A comma-separated list of regular expressions that match the fully-qualified names of tables to be monitored. Fully-qualified names for tables are of the form <schema_name>.<table_name>. Must not be used with 'Table Exclude List', and superseded by database inclusions/exclusions. Supports Expression Language: true (will be evaluated using Environment variables only)
Table Exclude List | db-table-exclude-list | | | A comma-separated list of regular expressions that match the fully-qualified names of tables to be excluded from monitoring. Fully-qualified names for tables are of the form <schema_name>.<table_name>. Must not be used with 'Table Include List', and superseded by database inclusions/exclusions. Supports Expression Language: true (will be evaluated using Environment variables only)
Column Include List | db-column-include-list | | | A comma-separated list of regular expressions that match the fully-qualified names of columns to include in change event record values. Fully-qualified names for columns are of the form <schema_name>.<table_name>.<column_name>. Supports Expression Language: true (will be evaluated using Environment variables only)
Column Exclude List | db-column-exclude-list | | | A comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. Fully-qualified names for columns are of the form <schema_name>.<table_name>.<column_name>. Supports Expression Language: true (will be evaluated using Environment variables only)
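
The include and exclude lists above are comma-separated regular expressions matched against fully-qualified names. The following is a minimal Python sketch of that filtering idea, not the processor's or Debezium's actual code. The function names are hypothetical, and it assumes each expression must match the whole name, which should be checked against the Debezium documentation for the version in use. Splitting on commas also means an expression that itself contains a comma (for example a {1,3} quantifier) would not work in this sketch.

```python
import re


def parse_list(value):
    """Split a comma-separated property value into compiled regular expressions."""
    return [re.compile(part.strip()) for part in value.split(",") if part.strip()]


def is_monitored(name, include=None, exclude=None):
    """Decide whether a fully-qualified name passes the include/exclude lists.

    Assumption: each expression must match the entire name (re.fullmatch).
    """
    if include and exclude:
        # Mirrors the "Must not be used with ..." restriction in the table.
        raise ValueError("include and exclude lists must not be used together")
    if include:
        return any(p.fullmatch(name) for p in parse_list(include))
    if exclude:
        return not any(p.fullmatch(name) for p in parse_list(exclude))
    return True  # No list configured: everything is monitored.


if __name__ == "__main__":
    print(is_monitored("public.orders", include=r"public\..*, inventory\.items"))
    print(is_monitored("audit.log", include=r"public\..*, inventory\.items"))
    print(is_monitored("audit.log", exclude=r"audit\..*"))
```

Under these assumptions, a Table Include List value such as public\..*, inventory\.items would select every table in the public schema plus the single table inventory.items.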

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
Additional Debezium config name | The value for the additional Debezium config name | Additional Debezium configuration can be provided. IMPORTANT: The Debezium JSON output format may be configured. Refer to the Debezium documentation at https://debezium.io/documentation/reference/1.9/connectors. Supports Expression Language: true (will be evaluated using Environment variables only)
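
To illustrate how dynamic properties might sit alongside the fixed ones, here is a small Python sketch. The keys are Debezium PostgreSQL connector setting names, but the mapping from this processor's properties to those keys, the example values, the helper name merge_debezium_config, and the rule that dynamic entries win on a name clash are all assumptions made for illustration. The source does not document how the processor combines them.

```python
def merge_debezium_config(fixed, dynamic):
    """Overlay dynamic properties on the fixed ones and return a new dict.

    Assumption: a dynamic property overrides a fixed one with the same name.
    """
    merged = dict(fixed)
    merged.update(dynamic)
    return merged


# Hypothetical values standing in for Host, Port, Username, Database Name
# and Table Include List.
fixed = {
    "database.hostname": "db.example.com",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.dbname": "inventory",
    "table.include.list": r"public\.orders",
}

# Dynamic properties: name is the Debezium config name, value is its value.
dynamic = {
    "plugin.name": "pgoutput",
    "slot.name": "nifi_cdc",
    "snapshot.mode": "initial",
}

if __name__ == "__main__":
    for key, value in sorted(merge_debezium_config(fixed, dynamic).items()):
        print(f"{key}={value}")
```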

Relationships:

Name | Description
success | Successfully created FlowFile with Debezium change events.
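
The content of each event in the FlowFile depends on the Output Record Format property. The sketch below is one reading of the three allowable values, applied to a simplified sample event with 'schema' and 'payload' sections. The function name format_record and the sample event are hypothetical, and the exact output the processor produces may differ.

```python
import json


def format_record(record, output_format):
    """Shape a single Debezium event according to the chosen output format."""
    if output_format == "Whole":
        return record  # Entire record, unchanged.
    if output_format == "Payload Extracted":
        return record["payload"]  # Only the contents of the 'payload' section.
    if output_format == "Payload Wrapped":
        return {"payload": record["payload"]}  # 'payload' kept, other sections dropped.
    raise ValueError(f"unknown output format: {output_format}")


# Simplified, made-up event for an insert into a table with one column.
sample_event = {
    "schema": {"type": "struct", "name": "example.Envelope"},
    "payload": {"op": "c", "before": None, "after": {"id": 1}},
}

if __name__ == "__main__":
    for fmt in ("Whole", "Payload Extracted", "Payload Wrapped"):
        print(fmt, "->", json.dumps(format_record(sample_event, fmt)))
```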

Reads Attributes:

None specified.

Writes Attributes:

None specified.

State management:

Scope | Description
CLUSTER | Event offset data and database schema history need to be kept and stored between runs.

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship.

System Resource Considerations:

Resource | Description
MEMORY | A high volume of database changes may lead to high memory consumption, as the change events are asynchronously collected and held back until processed. Reducing the Scheduling Period may help to avoid this.
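
The effect of the Scheduling Period on memory can be pictured with a toy model in Python: events pile up in a buffer between runs, and each run drains the buffer into one FlowFile, or creates none if the buffer is empty. The class and function names are hypothetical and the model is a deliberate simplification, not the processor's implementation.

```python
from collections import deque


class EventBuffer:
    """Holds change events between runs and tracks the largest backlog seen."""

    def __init__(self):
        self._events = deque()
        self.peak = 0

    def collect(self, event):
        self._events.append(event)
        self.peak = max(self.peak, len(self._events))

    def drain(self):
        """Return all buffered events in arrival order, or None if there are none."""
        batch = list(self._events)
        self._events.clear()
        return batch or None


def simulate(total_events, events_per_run):
    """Return (peak buffered events, FlowFiles created) for a steady event stream."""
    buffer = EventBuffer()
    flowfiles = 0
    for i in range(total_events):
        buffer.collect(i)
        if (i + 1) % events_per_run == 0:  # A scheduled run happens here.
            if buffer.drain() is not None:
                flowfiles += 1
    if buffer.drain() is not None:  # Final run picks up any leftovers.
        flowfiles += 1
    return buffer.peak, flowfiles


if __name__ == "__main__":
    print(simulate(1000, 100))  # Longer period: larger backlog, fewer FlowFiles.
    print(simulate(1000, 10))   # Shorter period: smaller backlog, more FlowFiles.
```

In this model, running ten times as often cuts the peak backlog by a factor of ten, at the cost of producing ten times as many FlowFiles.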