ConsumeKafka2CDP

Description:

Consumes messages from Apache Kafka specifically built against the Kafka 2.5.0.7.1.7.1000-141 Consumer API. The complementary NiFi processor for sending messages is PublishKafka2CDP.

Tags:

Kafka, Get, Ingest, Ingress, Topic, PubSub, Consume, 2.5.0.7.1.7.1000-141

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Kafka Brokers | bootstrap.servers | localhost:9092 | Comma-separated list of Kafka Brokers in the format host:port
Supports Expression Language: true (will be evaluated using Environment variables only)
Topic Name(s) | topic | The name of the Kafka Topic(s) to pull from. More than one can be supplied if comma separated.
Supports Expression Language: true (will be evaluated using Environment variables only)
Topic Name Format | topic_type | names
  • names: Topic is a full topic name or a comma-separated list of names
  • pattern: Topic is a regex using the Java Pattern syntax
Specifies whether the Topic(s) provided are a comma-separated list of names or a single regular expression
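As a rough illustration of the two Topic Name Format modes, the sketch below resolves a Topic Name(s) value against a list of topics. The function name `resolve_topics` and the sample topic names are hypothetical, Python's `re` syntax stands in for the Java Pattern syntax the processor uses, and whole-name matching is an assumption.

```python
import re

def resolve_topics(topic_value, topic_type, available_topics):
    """Return the topics selected by a Topic Name(s) value (illustrative only)."""
    if topic_type == "names":
        # A full topic name or a comma-separated list of names.
        return [name.strip() for name in topic_value.split(",") if name.strip()]
    if topic_type == "pattern":
        # A single regular expression; the whole topic name must match (assumption).
        regex = re.compile(topic_value)
        return [t for t in available_topics if regex.fullmatch(t)]
    raise ValueError(f"unknown Topic Name Format: {topic_type}")

available = ["orders", "orders-retry", "payments"]
print(resolve_topics("orders, payments", "names", available))
print(resolve_topics("orders.*", "pattern", available))
```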
Group ID | group.id | A Group ID is used to identify consumers that are within the same consumer group. Corresponds to Kafka's 'group.id' property.
Supports Expression Language: true (will be evaluated using Environment variables only)
Commit Offsets | Commit Offsets | true
  • true
  • false
Specifies whether this Processor should commit the offsets to Kafka after receiving messages. Typically, this value should be set to true so that messages that are received are not duplicated. However, in certain scenarios, it is preferable not to commit the offsets here, so that the data can be processed and later acknowledged by PublishKafkaRecord in order to provide Exactly Once semantics. See the Processor's Usage / Additional Details for more information.
Max Uncommitted Time | max-uncommit-offset-wait | 1 secs | Specifies the maximum amount of time allowed to pass before offsets must be committed. This value impacts how often offsets will be committed. Committing offsets less often increases throughput but also increases the window of potential data duplication in the event of a rebalance or JVM restart between commits. This value is also related to maximum poll records and the use of a message demarcator. When a message demarcator is used, far more messages can remain uncommitted than when it is not, because there is much less to keep track of in memory.

This Property is only considered if the [Commit Offsets] Property has a value of "true".
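The relationship between Commit Offsets and Max Uncommitted Time can be pictured with a small timing check. This is only a sketch under assumptions: the function name `commit_is_due` and its parameters are hypothetical, and the processor's actual commit logic may differ.

```python
def commit_is_due(last_commit, now, max_uncommitted_secs, commit_offsets=True):
    """Return True when offsets should be committed (illustrative only).

    last_commit and now are timestamps in seconds; all names are hypothetical.
    """
    if not commit_offsets:
        # Max Uncommitted Time is only considered when Commit Offsets is true.
        return False
    return (now - last_commit) >= max_uncommitted_secs
```

A larger `max_uncommitted_secs` means fewer commits, which matches the trade-off described above: higher throughput, but a wider window for duplicates after a rebalance or restart.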
Honor Transactions | honor-transactions | true
  • true
  • false
Specifies whether NiFi should honor transactional guarantees when communicating with Kafka. If false, the Processor will use an "isolation level" of read_uncommitted. This means that messages will be received as soon as they are written to Kafka, but they will be pulled even if the producer cancels the transaction. If this value is true, NiFi will not receive any messages for which the producer's transaction was canceled, but this can result in some latency since the consumer must wait for the producer to finish its entire transaction instead of pulling messages as they become available.
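The effect of Honor Transactions can be sketched as follows. The mapping of true to `read_committed` is an assumption based on the two values Kafka's `isolation.level` consumer setting accepts. The function names and the `aborted` flag on each message are hypothetical and exist only for illustration.

```python
def isolation_level(honor_transactions):
    """Map the Honor Transactions value to a Kafka 'isolation.level' (assumed mapping)."""
    return "read_committed" if honor_transactions else "read_uncommitted"

def visible_messages(messages, honor_transactions):
    """Drop messages from canceled transactions only when transactions are honored."""
    if not honor_transactions:
        # read_uncommitted: everything written to Kafka is pulled.
        return list(messages)
    # read_committed: messages from canceled transactions are never received.
    return [m for m in messages if not m["aborted"]]
```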
Message Demarcator | message-demarcator | Since KafkaConsumer receives messages in batches, you have the option to output FlowFiles that contain all Kafka messages in a single batch for a given topic and partition. This property allows you to provide a string (interpreted as UTF-8) to use for demarcating multiple Kafka messages. This is an optional property; if it is not provided, each Kafka message received will result in a single FlowFile. To enter a special character such as 'new line', use CTRL+Enter or Shift+Enter, depending on the OS.
Supports Expression Language: true (will be evaluated using Environment variables only)
Separate By Key | separate-by-key | false
  • true
  • false
If true, and the <Message Demarcator> property is set, two messages will only be added to the same FlowFile if both of the Kafka Messages have identical keys.
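The interaction of Message Demarcator and Separate By Key can be sketched as a grouping step. This is a minimal illustration, not the processor's implementation: the function `bundle_messages` and the message dictionaries are hypothetical.

```python
def bundle_messages(messages, demarcator=None, separate_by_key=False):
    """Group messages into FlowFile-like byte payloads (illustrative only).

    Each message is a dict with 'topic', 'partition', 'key' and 'value' (bytes).
    """
    if demarcator is None:
        # No demarcator: one output per Kafka message.
        return [m["value"] for m in messages]

    groups = {}
    for m in messages:
        group_key = (m["topic"], m["partition"])
        if separate_by_key:
            # Only messages with identical keys may share an output.
            group_key += (m["key"],)
        groups.setdefault(group_key, []).append(m["value"])

    sep = demarcator.encode("utf-8")  # the demarcator is interpreted as UTF-8
    return [sep.join(values) for values in groups.values()]
```

With a newline demarcator, messages from the same topic and partition are joined into one payload; enabling `separate_by_key` splits that payload further by message key.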
Security Protocol | security.protocol | PLAINTEXT
  • PLAINTEXT
  • SSL
  • SASL_PLAINTEXT
  • SASL_SSL
Security protocol used to communicate with brokers. Corresponds to Kafka Client security.protocol property
SASL Mechanism | sasl.mechanism | GSSAPI
  • GSSAPI: General Security Services API for Kerberos authentication
  • PLAIN: Plain username and password authentication
  • SCRAM-SHA-256: Salted Challenge Response Authentication Mechanism using SHA-256 with username and password
  • SCRAM-SHA-512: Salted Challenge Response Authentication Mechanism using SHA-512 with username and password
  • AWS_MSK_IAM: Allows the use of AWS IAM for authentication and authorization against Amazon MSK clusters that have AWS IAM enabled as an authentication mechanism. The IAM credentials will be found using the AWS Default Credentials Provider Chain.
SASL mechanism used for authentication. Corresponds to Kafka Client sasl.mechanism property
Kerberos User Service | kerberos-user-service
Controller Service API: SelfContainedKerberosUserService
Implementations: KerberosTicketCacheUserService, KerberosKeytabUserService
Service supporting user authentication with Kerberos
Kerberos Service Name | sasl.kerberos.service.name | The service name that matches the primary name of the Kafka server configured in the broker JAAS configuration
Supports Expression Language: true (will be evaluated using Environment variables only)
Username | sasl.username | Username provided with configured password when using PLAIN or SCRAM SASL Mechanisms
Supports Expression Language: true (will be evaluated using Environment variables only)

This Property is only considered if the [SASL Mechanism] Property is set to one of the following values: [PLAIN], [SCRAM-SHA-512], [SCRAM-SHA-256]
Password | sasl.password | Password provided with configured username when using PLAIN or SCRAM SASL Mechanisms
Sensitive Property: true
Supports Expression Language: true (will be evaluated using Environment variables only)

This Property is only considered if the [SASL Mechanism] Property is set to one of the following values: [PLAIN], [SCRAM-SHA-512], [SCRAM-SHA-256]
Token Authentication | sasl.token.auth | false
  • true
  • false
Enables or disables Token authentication when using SCRAM SASL Mechanisms

This Property is only considered if the [SASL Mechanism] Property is set to one of the following values: [SCRAM-SHA-512], [SCRAM-SHA-256]
AWS Profile Name | aws.profile.name | The Amazon Web Services Profile to select when multiple profiles are available.
Supports Expression Language: true (will be evaluated using Environment variables only)

This Property is only considered if the [SASL Mechanism] Property has a value of "AWS_MSK_IAM".
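The property dependencies on SASL Mechanism listed above can be summarized in a small lookup. The function name `considered_properties` is hypothetical; the rules themselves are taken from the "only considered if" notes in this table.

```python
SCRAM_MECHANISMS = {"SCRAM-SHA-256", "SCRAM-SHA-512"}

def considered_properties(sasl_mechanism):
    """Return the dependent properties that are considered for a SASL Mechanism."""
    considered = []
    if sasl_mechanism == "PLAIN" or sasl_mechanism in SCRAM_MECHANISMS:
        considered += ["Username", "Password"]
    if sasl_mechanism in SCRAM_MECHANISMS:
        considered.append("Token Authentication")
    if sasl_mechanism == "AWS_MSK_IAM":
        considered.append("AWS Profile Name")
    return considered

for mechanism in ("GSSAPI", "PLAIN", "SCRAM-SHA-512", "AWS_MSK_IAM"):
    print(mechanism, considered_properties(mechanism))
```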
SSL Context Service | ssl.context.service
Controller Service API: SSLContextService
Implementations: StandardRestrictedSSLContextService, StandardSSLContextService
Service supporting SSL communication with Kafka brokers
Key Attribute Encoding | key-attribute-encoding | UTF-8 Encoded
  • UTF-8 Encoded: The key is interpreted as a UTF-8 Encoded string.
  • Hex Encoded: The key is interpreted as arbitrary binary data and is encoded using hexadecimal characters with uppercase letters
  • Do Not Add Key as Attribute: The key will not be added as an Attribute
FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of the attribute should be encoded.
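The three Key Attribute Encoding options can be illustrated with a short helper. The function `encode_key` and the option labels `"utf-8"`, `"hex"` and `"do-not-add"` are hypothetical stand-ins for the allowable values listed above.

```python
def encode_key(key, encoding):
    """Return the 'kafka.key' attribute value for a message key (illustrative only)."""
    if key is None or encoding == "do-not-add":
        # No key, or the key is not added as an attribute at all.
        return None
    if encoding == "utf-8":
        return key.decode("utf-8")
    if encoding == "hex":
        # Arbitrary binary data rendered as uppercase hexadecimal characters.
        return key.hex().upper()
    raise ValueError(f"unknown Key Attribute Encoding: {encoding}")
```

Hex encoding is the safer choice when keys are not guaranteed to be valid UTF-8, since decoding arbitrary bytes as UTF-8 can fail.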
Offset Reset | auto.offset.reset | latest
  • earliest: Automatically reset the offset to the earliest offset
  • latest: Automatically reset the offset to the latest offset
  • none: Throw exception to the consumer if no previous offset is found for the consumer's group
Allows you to manage the condition when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted). Corresponds to Kafka's 'auto.offset.reset' property.
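The three Offset Reset choices can be sketched as a decision on where consumption starts. This is a simplified model under assumptions: the function `starting_offset` and its parameters are hypothetical, and `latest` here stands for the next offset to be written to the partition.

```python
def starting_offset(committed, earliest, latest, reset_policy):
    """Pick the offset to start from (illustrative only).

    committed is None when the group has no stored offset for the partition.
    """
    if committed is not None and earliest <= committed <= latest:
        # A valid stored offset exists, so the reset policy is not consulted.
        return committed
    if reset_policy == "earliest":
        return earliest
    if reset_policy == "latest":
        return latest
    if reset_policy == "none":
        raise LookupError("no valid offset found for the consumer's group")
    raise ValueError(f"unknown Offset Reset value: {reset_policy}")
```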
Message Header Encoding | message-header-encoding | UTF-8 | Any message header that is found on a Kafka message will be added to the outbound FlowFile as an attribute. This property indicates the Character Encoding to use for deserializing the headers.
Headers to Add as Attributes (Regex) | header-name-regex | A Regular Expression that is matched against all message headers. Any message header whose name matches the regex will be added to the FlowFile as an Attribute. If not specified, no Header values will be added as FlowFile attributes. If two messages have a different value for the same header and that header is selected by the provided regex, then those two messages must be added to different FlowFiles. As a result, users should be cautious about using a regex like ".*" if messages are expected to have header values that are unique per message, such as an identifier or timestamp, because it will prevent NiFi from bundling the messages together efficiently.
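The bundling caveat for header regexes can be seen in a small sketch. The function names and sample headers are hypothetical, Python's `re` stands in for the Java regex engine, and matching the whole header name is an assumption.

```python
import re

def header_attributes(headers, name_regex, encoding="utf-8"):
    """Select the headers whose names match the regex and decode their values."""
    if not name_regex:
        # No regex configured: no header values become attributes.
        return {}
    pattern = re.compile(name_regex)
    return {
        name: value.decode(encoding)
        for name, value in headers.items()
        if pattern.fullmatch(name)  # whole-name match is an assumption
    }

def can_share_flowfile(headers_a, headers_b, name_regex):
    """Two messages can be bundled only if their selected header values agree."""
    return header_attributes(headers_a, name_regex) == header_attributes(headers_b, name_regex)
```

With headers such as `source` (shared across messages) and `trace-id` (unique per message), a regex of `source` still allows bundling, while `.*` also selects `trace-id` and forces every message into its own FlowFile.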
Max Poll Records | max.poll.records | 10000 | Specifies the maximum number of records Kafka should return in a single poll.
Communications Timeout | Communications Timeout | 60 secs | Specifies the timeout that the consumer should use when communicating with the Kafka Broker
interceptor.classes | interceptor.classes | com.hortonworks.smm.kafka.monitoring.interceptors.MonitoringConsumerInterceptor | Specifies the value for the 'interceptor.classes' Kafka configuration property.
Supports Expression Language: true (will be evaluated using Environment variables only)

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
The name of a Kafka configuration property. | The value of a given Kafka configuration property. | These properties will be added to the Kafka configuration after loading any provided configuration properties. If a dynamic property represents a property that was already set, its value will be ignored and a WARN message will be logged. For the list of available Kafka properties, please refer to: http://kafka.apache.org/documentation.html#configuration.
Supports Expression Language: true (will be evaluated using Environment variables only)
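The precedence rule for dynamic properties can be sketched as a merge in which already-set properties win. The function `apply_dynamic_properties` and the logger name are hypothetical; the sample property names are ordinary Kafka consumer settings used only as an example.

```python
import logging

logger = logging.getLogger("consume_kafka_sketch")

def apply_dynamic_properties(configured, dynamic):
    """Add dynamic properties to a Kafka configuration without overriding set values."""
    merged = dict(configured)
    for name, value in dynamic.items():
        if name in merged:
            # The property was already set: ignore the dynamic value and warn.
            logger.warning("Ignoring dynamic property '%s': already set", name)
            continue
        merged[name] = value
    return merged

configured = {"group.id": "g1", "max.poll.records": "10000"}
dynamic = {"group.id": "other", "fetch.min.bytes": "1024"}
print(apply_dynamic_properties(configured, dynamic))
```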

Relationships:

Name | Description
success | FlowFiles received from Kafka. Depending on the demarcation strategy, this is one FlowFile per message or a bundle of messages grouped by topic and partition.

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
kafka.count | The number of messages written, if more than one
kafka.key | The key of the message, if present and if the FlowFile contains a single message. How the key is encoded depends on the value of the 'Key Attribute Encoding' property.
kafka.offset | The offset of the message in the partition of the topic.
kafka.timestamp | The timestamp of the message in the partition of the topic.
kafka.partition | The partition of the topic the message or message bundle is from
kafka.topic | The topic the message or message bundle is from
kafka.tombstone | Set to true if the consumed message is a tombstone message
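How some of these attributes depend on whether a FlowFile holds one message or a bundle can be sketched as follows. This covers only `kafka.topic`, `kafka.partition`, `kafka.count` and `kafka.key`; the function name and the message dictionaries are hypothetical, and the remaining attributes are omitted.

```python
def flowfile_attributes(topic, partition, messages):
    """Build a subset of the written attributes for one FlowFile (illustrative only).

    Each message is a dict with an optional, already-encoded 'key' string.
    """
    attrs = {"kafka.topic": topic, "kafka.partition": str(partition)}
    if len(messages) > 1:
        # kafka.count is written only when more than one message is bundled.
        attrs["kafka.count"] = str(len(messages))
    elif len(messages) == 1 and messages[0].get("key") is not None:
        # kafka.key is written only for a single message that has a key.
        attrs["kafka.key"] = messages[0]["key"]
    return attrs
```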

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship.

System Resource Considerations:

None specified.