CaptureChangeMySQL

Description:

Retrieves Change Data Capture (CDC) events from a MySQL database. CDC Events include INSERT, UPDATE, DELETE operations. Events are output as either a group of a specified number of events (the default is 1 so each event becomes its own flow file) or grouped as a full transaction (BEGIN to COMMIT). All events are ordered by the time at which the operation occurred. NOTE: If the processor is stopped before the specified number of events have been written to a flow file, the partial flow file will be output in order to maintain the consistency of the event stream.

Tags:

sql, jdbc, cdc, mysql, transaction, event

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display NameAPI NameDefault ValueAllowable ValuesDescription
MySQL Nodescapture-change-mysql-hostsA list of hostname (and optional port) entries corresponding to nodes in a MySQL cluster. The entries should be comma separated using a colon (if the port is to be specified) such as host1:port,host2:port,.... For example mysql.myhost.com:3306. The port need not be specified, when omitted the default MySQL port value of 3306 will be used. This processor will attempt to connect to the hosts in the list in order. If one node goes down and failover is enabled for the cluster, then the processor will connect to the active node (assuming its node entry is specified in this property).
Supports Expression Language: true (will be evaluated using Environment variables only)
MySQL Driver Class Namecapture-change-mysql-driver-classcom.mysql.jdbc.DriverThe class name of the MySQL database driver class
Supports Expression Language: true (will be evaluated using Environment variables only)
MySQL Driver Location(s)capture-change-mysql-driver-locationsComma-separated list of files/folders and/or URLs containing the MySQL driver JAR and its dependencies (if any). For example '/var/tmp/mysql-connector-java-5.1.38-bin.jar'

This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: directory, file, URL.

Supports Expression Language: true (will be evaluated using Environment variables only)
Usernamecapture-change-mysql-usernameUsername to access the MySQL cluster
Supports Expression Language: true (will be evaluated using Environment variables only)
Passwordcapture-change-mysql-passwordPassword to access the MySQL cluster
Sensitive Property: true
Supports Expression Language: true (will be evaluated using Environment variables only)
Event Processing Strategyevents-per-flowfile-strategyMax Events Per FlowFile
  • Max Events Per FlowFile This strategy causes at most the number of events specified in the 'Number of Events Per FlowFile' property to be written per FlowFile. If the processor is stopped before the specified number of events has been written (or the event queue becomes empty), the fewer number of events will still be written as a FlowFile before stopping.
  • One Transaction Per FlowFile This strategy causes each event from a transaction (from BEGIN to COMMIT) to be written to a FlowFile
Specifies the strategy to use when writing events to FlowFile(s), such as 'Max Events Per FlowFile'
Events Per FlowFilenumber-of-events-per-flowfile1Specifies how many events should be written to a single FlowFile. If the processor is stopped before the specified number of events has been written,the events will still be written as a FlowFile before stopping.
Supports Expression Language: true (will be evaluated using Environment variables only)

This Property is only considered if the [Event Processing Strategy] Property has a value of "Max Events Per FlowFile".
Server IDcapture-change-mysql-server-idThe client connecting to the MySQL replication group is actually a simplified replica (server), and the Server ID value must be unique across the whole replication group (i.e. different from any other Server ID being used by any primary or replica). Thus, each instance of CaptureChangeMySQL must have a Server ID unique across the replication group. If the Server ID is not specified, it defaults to 65535.
Supports Expression Language: true (will be evaluated using Environment variables only)
Database/Schema Name Patterncapture-change-mysql-db-name-patternA regular expression (regex) for matching databases (or schemas, depending on your RDBMS' terminology) against the list of CDC events. The regex must match the database name as it is stored in the RDBMS. If the property is not set, the database name will not be used to filter the CDC events. NOTE: DDL events, even if they affect different databases, are associated with the database used by the session to execute the DDL. This means if a connection is made to one database, but the DDL is issued against another, then the connected database will be the one matched against the specified pattern.
Table Name Patterncapture-change-mysql-name-patternA regular expression (regex) for matching CDC events affecting matching tables. The regex must match the table name as it is stored in the database. If the property is not set, no events will be filtered based on table name.
Max Wait Timecapture-change-mysql-max-wait-time30 secondsThe maximum amount of time allowed for a connection to be established, zero means there is effectively no limit.
Supports Expression Language: true (will be evaluated using Environment variables only)
Distributed Map Cache Client - unusedcapture-change-mysql-dist-map-cache-clientController Service API:
DistributedMapCacheClient
Implementations: HBase_2_ClientMapCacheService
CouchbaseMapCacheClient
DistributedMapCacheClientService
SimpleRedisDistributedMapCacheClientService
RedisDistributedMapCacheClientService
CassandraDistributedMapCache
HazelcastMapCacheClient
This is a legacy property that is no longer used to store table information, the processor will handle the table information (column names, types, etc.)
Retrieve All Recordscapture-change-mysql-retrieve-all-recordstrue
  • true
  • false
Specifies whether to get all available CDC events, regardless of the current binlog filename and/or position. If binlog filename and position values are present in the processor's State, this property's value is ignored. This allows for 4 different configurations: 1) If binlog data is available in processor State, that is used to determine the start location and the value of Retrieve All Records is ignored. 2) If no binlog data is in processor State, then Retrieve All Records set to true means start at the beginning of the binlog history. 3) If no binlog data is in processor State and Initial Binlog Filename/Position are not set, then Retrieve All Records set to false means start at the end of the binlog history. 4) If no binlog data is in processor State and Initial Binlog Filename/Position are set, then Retrieve All Records set to false means start at the specified initial binlog file/position. To reset the behavior, clear the processor state (refer to the State Management section of the processor's documentation).
Include Begin/Commit Eventscapture-change-mysql-include-begin-commitfalse
  • true
  • false
Specifies whether to emit events corresponding to a BEGIN or COMMIT event in the binary log. Set to true if the BEGIN/COMMIT events are necessary in the downstream flow, otherwise set to false, which suppresses generation of these events and can increase flow performance.
Include DDL Eventscapture-change-mysql-include-ddl-eventsfalse
  • true
  • false
Specifies whether to emit events corresponding to Data Definition Language (DDL) events such as ALTER TABLE, TRUNCATE TABLE, e.g. in the binary log. Set to true if the DDL events are desired/necessary in the downstream flow, otherwise set to false, which suppresses generation of these events and can increase flow performance.
State Update Intervalcapture-change-mysql-state-update-interval0 secondsDEPRECATED. This property is no longer used and exists solely for backward compatibility purposes. Indicates how often to update the processor's state with binlog file/position values. A value of zero means that state will only be updated when the processor is stopped or shutdown. If at some point the processor state does not contain the desired binlog values, the last flow file emitted will contain the last observed values, and the processor can be returned to that state by using the Initial Binlog File, Initial Binlog Position, and Initial Sequence ID properties.
Supports Expression Language: true (will be evaluated using Environment variables only)
Initial Sequence IDcapture-change-mysql-init-seq-idSpecifies an initial sequence identifier to use if this processor's State does not have a current sequence identifier. If a sequence identifier is present in the processor's State, this property is ignored. Sequence identifiers are monotonically increasing integers that record the order of flow files generated by the processor. They can be used with the EnforceOrder processor to guarantee ordered delivery of CDC events.
Supports Expression Language: true (will be evaluated using Environment variables only)
Initial Binlog Filenamecapture-change-mysql-init-binlog-filenameSpecifies an initial binlog filename to use if this processor's State does not have a current binlog filename. If a filename is present in the processor's State or "Use GTID" property is set to false, this property is ignored. This can be used along with Initial Binlog Position to "skip ahead" if previous events are not desired. Note that NiFi Expression Language is supported, but this property is evaluated when the processor is configured, so FlowFile attributes may not be used. Expression Language is supported to enable the use of the environment properties.
Supports Expression Language: true (will be evaluated using Environment variables only)
Initial Binlog Positioncapture-change-mysql-init-binlog-positionSpecifies an initial offset into a binlog (specified by Initial Binlog Filename) to use if this processor's State does not have a current binlog filename. If a filename is present in the processor's State or "Use GTID" property is false, this property is ignored. This can be used along with Initial Binlog Filename to "skip ahead" if previous events are not desired. Note that NiFi Expression Language is supported, but this property is evaluated when the processor is configured, so FlowFile attributes may not be used. Expression Language is supported to enable the use of the environment properties.
Supports Expression Language: true (will be evaluated using Environment variables only)
Use Binlog GTIDcapture-change-mysql-use-gtidfalse
  • true
  • false
Specifies whether to use Global Transaction ID (GTID) for binlog tracking. If set to true, processor's state of binlog file name and position is ignored. The main benefit of using GTID is to have much reliable failover than using binlog filename/position.
Initial Binlog GTIDcapture-change-mysql-init-gtidSpecifies an initial GTID to use if this processor's State does not have a current GTID. If a GTID is present in the processor's State or "Use GTID" property is set to false, this property is ignored. This can be used to "skip ahead" if previous events are not desired. Note that NiFi Expression Language is supported, but this property is evaluated when the processor is configured, so FlowFile attributes may not be used. Expression Language is supported to enable the use of the environment properties.
Supports Expression Language: true (will be evaluated using Environment variables only)
SSL ModeSSL ModeDISABLED
  • DISABLED Connect without TLS
  • PREFERRED Connect with TLS when server support enabled, otherwise connect without TLS
  • REQUIRED Connect with TLS or fail when server support not enabled
  • VERIFY_IDENTITY Connect with TLS or fail when server support not enabled. Verify server hostname matches presented X.509 certificate names or fail when not matched
SSL Mode used when SSL Context Service configured supporting certificate verification options
SSL Context ServiceSSL Context ServiceController Service API:
SSLContextService
Implementations: StandardRestrictedSSLContextService
StandardSSLContextService
SSL Context Service supporting encrypted socket communication

This Property is only considered if the [SSL Mode] Property is set to one of the following values: [PREFERRED], [VERIFY_IDENTITY], [REQUIRED]

Relationships:

NameDescription
successSuccessfully created FlowFile from SQL query result set.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
cdc.sequence.idA sequence identifier (i.e. strictly increasing integer value) specifying the order of the CDC event flow file relative to the other event flow file(s).
cdc.event.typeA string indicating the type of CDC event that occurred, including (but not limited to) 'begin', 'insert', 'update', 'delete', 'ddl' and 'commit'.
mime.typeThe processor outputs flow file content in JSON format, and sets the mime.type attribute to application/json

State management:

ScopeDescription
CLUSTERInformation such as a 'pointer' to the current CDC event in the database is stored by this processor, such that it can continue from the same location if restarted.

Restricted:

Required PermissionExplanation
reference remote resourcesDatabase Driver Location can reference resources over HTTP

Input requirement:

This component does not allow an incoming relationship.

System Resource Considerations:

None specified.