This Processor publishes the contents of a FlowFile to a Topic in Apache Kafka using the KafkaProducer API available with Kafka 2.6. The contents of the incoming FlowFile will be read using the configured Record Reader. Each record will then be serialized using the configured Record Writer, and this serialized form will be the content of a Kafka message. This message is optionally assigned a key by using the <Kafka Key> Property.
The Security Protocol property allows the user to specify the protocol for communicating with the Kafka broker. The following sections describe each of the protocols in further detail.
The PLAINTEXT option provides an unsecured connection to the broker, with no client authentication and no encryption. In order to use this option, the broker must be configured with a listener of the form:
PLAINTEXT://host.name:port
The SSL option provides an encrypted connection to the broker, with optional client authentication. In order to use this option, the broker must be configured with a listener of the form:
SSL://host.name:port
In addition, the processor must have an SSL Context Service selected.
If the broker specifies ssl.client.auth=none, or does not specify ssl.client.auth, then the client will not be required to present a certificate. In this case, the SSL Context Service selected may specify only a truststore containing the public key of the certificate authority used to sign the broker's key.
If the broker specifies ssl.client.auth=required then the client will be required to present a certificate. In this case, the SSL Context Service must also specify a keystore containing a client key, in addition to a truststore as described above.
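For reference, a broker listener that supports this scenario might be configured with entries along the following lines in its server.properties. This is only a sketch; the host name, port, passwords, and keystore/truststore paths are placeholders, and the exact settings depend on the environment:
# Example broker-side TLS settings (server.properties)
listeners=SSL://host.name:9093
ssl.keystore.location=/path/to/broker.keystore.jks
ssl.keystore.password=broker-keystore-password
ssl.key.password=broker-key-password
ssl.truststore.location=/path/to/broker.truststore.jks
ssl.truststore.password=broker-truststore-password
ssl.client.auth=required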
The SASL_PLAINTEXT option uses SASL with a PLAINTEXT transport layer to authenticate to the broker. In order to use this option, the broker must be configured with a listener of the form:
SASL_PLAINTEXT://host.name:port
In addition, the Kerberos Service Name must be specified in the processor.
If the SASL mechanism is GSSAPI, then the client must provide a JAAS configuration to authenticate. The JAAS configuration can be provided by specifying the java.security.auth.login.config system property in NiFi's bootstrap.conf, such as:
java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
An example of the JAAS config file would be the following:
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/nifi.keytab"
    serviceName="kafka"
    principal="nifi@YOURREALM.COM";
};
NOTE: The serviceName in the JAAS file must match the Kerberos Service Name in the processor.
Alternatively, the JAAS configuration when using GSSAPI can be provided by specifying the Kerberos Principal and Kerberos Keytab directly in the processor properties. This will dynamically create a JAAS configuration like above, and will take precedence over the java.security.auth.login.config system property.
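For example, the GSSAPI configuration shown above could be expressed directly through the processor properties as follows; the values are placeholders that mirror the JAAS example:
Property Name | Property Value |
---|---|
Kerberos Service Name | kafka |
Kerberos Principal | nifi@YOURREALM.COM |
Kerberos Keytab | /path/to/nifi.keytab |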
If the SASL mechanism is PLAIN, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka's PlainLoginModule. An example of the JAAS config file would be the following:
KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="nifi"
    password="nifi-password";
};
The JAAS configuration can be provided in either of the following ways:
java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
sasl.jaas.config : org.apache.kafka.common.security.plain.PlainLoginModule required username="nifi" password="nifi-password";
NOTE: The dynamic properties of this processor are not secured and as a result the password entered when utilizing sasl.jaas.config will be stored in the flow.json.gz file in plain-text, and will be saved to NiFi Registry if using versioned flows.
NOTE: It is not recommended to use a SASL mechanism of PLAIN with SASL_PLAINTEXT, as it would transmit the username and password unencrypted.
NOTE: The Kerberos Service Name is not required for the SASL mechanism PLAIN. However, the processor will warn that this property must be set to a non-empty string, so any placeholder value, such as "null", can be entered.
NOTE: Using the PlainLoginModule will cause it to be registered in the JVM's static list of Providers, making it visible to components in other NARs that may access the providers. There is currently a known issue where Kafka processors using the PlainLoginModule will cause HDFS processors with Kerberos to no longer work.
If the SASL mechanism is SCRAM, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka's ScramLoginModule. Ensure that you add the user-defined property 'sasl.mechanism' and set it to 'SCRAM-SHA-256' or 'SCRAM-SHA-512', depending on the Kafka broker configuration. An example of the JAAS config file would be the following:
KafkaClient {
    org.apache.kafka.common.security.scram.ScramLoginModule required
    username="nifi"
    password="nifi-password";
};
The JAAS configuration can be provided in either of the following ways:
java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
sasl.jaas.config : org.apache.kafka.common.security.scram.ScramLoginModule required username="nifi" password="nifi-password";
NOTE: The dynamic properties of this processor are not secured and as a result the password entered when utilizing sasl.jaas.config will be stored in the flow.json.gz file in plain-text, and will be saved to NiFi Registry if using versioned flows.
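Putting this together, a SCRAM setup that relies on dynamic properties rather than a JAAS file might look like the following sketch; whether SCRAM-SHA-256 or SCRAM-SHA-512 is appropriate depends on the broker configuration, and the credentials are placeholders:
sasl.mechanism : SCRAM-SHA-256
sasl.jaas.config : org.apache.kafka.common.security.scram.ScramLoginModule required username="nifi" password="nifi-password";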
The SASL_SSL option uses SASL with an SSL/TLS transport layer to authenticate to the broker. In order to use this option, the broker must be configured with a listener of the form:
SASL_SSL://host.name:port
See the SASL_PLAINTEXT section for a description of how to provide the proper JAAS configuration depending on the SASL mechanism (GSSAPI, PLAIN, or SCRAM).
See the SSL section for a description of how to configure the SSL Context Service based on the ssl.client.auth property.
This processor includes optional properties that control how a Kafka Record's key and headers are determined:
'Publish Strategy' controls the mode used to convert the FlowFile record into a Kafka record.
If Publish Strategy is set to 'Use Wrapper', two additional processor configuration properties are made available: 'Record Key Writer' and 'Record Metadata Strategy'.
The 'Record Key Writer' property determines the Record Writer that should be used to serialize the Kafka record's key. This may be used to emit the key as JSON, Avro, XML, or some other data format. If this property is not set and the key of the NiFi Record is itself a Record, the FlowFile will be routed to the 'failure' relationship. If this property is not set and the key is a Byte Array or a String, the Kafka record's key will be set to that value directly (Strings are encoded as UTF-8).
The 'Record Metadata Strategy' specifies whether the Kafka topic and partition should come from the configured 'Topic Name' property and the 'Partition' / 'Partitioner class' properties, or from the Record's optional metadata field. If the value is set to 'Metadata From Record', the incoming FlowFile record is expected to have a field named 'metadata', which in turn is expected to be a Record with a 'topic' and a 'partition' field. If these fields are missing or invalid, the processor's 'Topic Name' and 'Partition' / 'Partitioner class' properties are used as a fallback.
Using the metadata field to convey the topic and partition has two advantages. First, it pairs well with the ConsumeKafkaRecord_* processors, which produce this same schema; if data is consumed from one topic and pushed to another topic (or another Kafka cluster), it can easily be pinned to the same topic name and partition. If the data should instead be pushed to a different topic, the metadata field can easily be updated, for example with an UpdateRecord processor (a sketch follows below). Second, because a single FlowFile is sent as a single Kafka transaction, records destined for multiple Kafka topics can be published within a single transaction.
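As a sketch of the UpdateRecord approach mentioned above: assuming the wrapper schema described later in this document, an UpdateRecord processor placed before the publish processor could rewrite the topic with a configuration along these lines. The user-defined property name is a RecordPath, and 'new-topic' is a placeholder:
Property Name | Property Value |
---|---|
Record Reader | JsonTreeReader |
Record Writer | JsonRecordSetWriter |
Replacement Value Strategy | Literal Value |
/metadata/topic | new-topic |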
The examples below illustrate what will be sent to Kafka, given different configurations and FlowFile contents. These examples all assume that JsonTreeReader is used as the Record Reader and JsonRecordSetWriter as the Record Writer.
Given the processor configuration:
Processor Property | Configured Value |
---|---|
Message Key Field | account |
Attributes to Send as Headers (Regex) | attribute.* |
And a FlowFile with the content:
{"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
And attributes:
Attribute Name | Attribute Value |
---|---|
attributeA | valueA |
attributeB | valueB |
otherAttribute | otherValue |
The record that is produced to Kafka will have the following characteristics:
Record Key | {"name":"Acme","number":"AC1234"} |
---|---|
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | attributeA=valueA, attributeB=valueB |
When the Publish Strategy is configured to 'Use Wrapper', each FlowFile Record is expected to adhere to a specific schema.
The Record must have three fields: key, value, and headers. There is a fourth, optional field named metadata. The key may be a String, a byte array, or a Record. The value can be any Record. The headers field is a Map where the values are Strings. The metadata field is a Record that has two fields of interest: topic and partition. If these fields are specified, they will take precedence over the configured 'Topic Name', 'Partition', and 'Partitioner class' processor properties.
Given a FlowFile with the content:
{
"key": "Acme Holdings",
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"accountType": "enterprise",
"test": "true"
}
}
The record that is produced to Kafka will have the following characteristics:
Record Key | Acme Holdings |
---|---|
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | accountType=enterprise, test=true |
Note that in this case, the headers and key come directly from the Record, not from FlowFile attributes.
If there is a desire to include some FlowFile attributes in the headers, this should be accomplished by using an upstream processor to inject those values into the headers field. For example, an UpdateRecord processor could be used to easily add new fields to the headers Map (see the sketch below).
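As a sketch of that approach: an UpdateRecord processor could copy a FlowFile attribute into the headers Map using a user-defined property whose name is a RecordPath and whose value uses Expression Language to reference the attribute. The attribute name 'source.system' and the header name below are purely illustrative:
Property Name | Property Value |
---|---|
Replacement Value Strategy | Literal Value |
/headers/sourceSystem | ${source.system} |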
Additionally, we may choose to use a more complex value for the record key. The key itself may be a record. This is sometimes used to write the record key either as JSON or as Avro. In this example, we assume that the 'Record Key Writer' property is set to a JsonRecordSetWriter.
Given a FlowFile with the content:
{
"key": {
"accountName": "Acme Holdings",
"accountHolder": "John Doe",
"accountId": "280182830-A009"
},
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
}
}
The record that is produced to Kafka will have the following characteristics:
Record Key | {"accountName":"Acme Holdings","accountHolder":"John Doe","accountId":"280182830-A009"} |
---|---|
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | (none) |
Note here that the Record Key is JSON, as the 'Record Key Writer' property is configured to write JSON. It could just as easily be Avro.
Also note that if the 'Record Key Writer' had not been set, the FlowFile would have been routed to the 'failure' relationship because the key is a Record.
Finally, note here that the headers field is missing. This is acceptable, and no headers will be added to the Kafka record.
We can also have a Record whose key field is an array of bytes. In this case, the 'Record Key Writer' property is not used.
Given a FlowFile with the content:
{
"key": [65, 27, 10, 20, 11, 57, 88, 19, 65],
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"otherField": {
"a": "b"
}
}
The record that is produced to Kafka will have the following characteristics:
Record Key | 0x411b0a140b39581341 |
---|---|
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | (none) |
In this case, the byte array that is specified for the key is provided to the Kafka Record as a byte array without changes (in the table, it is simply represented as Hex).
Finally, note here that the headers field is missing and an extraneous field, otherField, is present. This is acceptable: no headers will be added to the Kafka record, and the otherField is simply ignored.
We can also have a Record whose key field is null or missing. In this case, the 'Record Key Writer' property is not used.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b",
"c": {
"d": "e"
}
}
}
The record that is produced to Kafka will have the following characteristics:
Record Key | (none) |
---|---|
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b, c=MapRecord[{d=e}] |
In this case, the key is not present, so the Kafka record that is produced has no key associated with it.
Note also that the headers field has the expected value for the a header, but the c header has a value of MapRecord[{d=e}]. This is because the headers field is expected always to be a Map with String values. By providing a Record for the c element, we have violated the contract. NiFi attempts to compensate for this by creating a String representation of the Record, even if it is unlikely to be the representation that the user expects.
If the Metadata field is provided in the FlowFile's Record, it will be used to determine the Topic and the Partition that the Records are written to.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b"
},
"metadata": {
"topic": "topic1"
}
}
And considering that the processor properties are configured as:
Property Name | Property Value |
---|---|
Topic Name | My Topic |
Partition | 2 |
Record Metadata Strategy | Metadata From Record |
The record that is produced to Kafka will have the following characteristics:
Kafka Topic | topic1 |
---|---|
Topic Partition | 2 |
Record Key | (none) |
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b |
Note that the topic name comes directly from the FlowFile record, and the configured topic name ("My Topic") is ignored. However, if either the "metadata" field or its "topic" sub-field were missing, the configured topic name ("My Topic") would be used.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b"
},
"metadata": {
"partition": 6
}
}
And considering that the processor properties are configured as:
Property Name | Property Value |
---|---|
Topic Name | My Topic |
Partition | 2 |
Record Metadata Strategy | Metadata From Record |
The record that is produced to Kafka will have the following characteristics:
Kafka Topic | My Topic |
---|---|
Topic Partition | 6 |
Record Key | (none) |
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b |
In this case, the partition (6) comes from the Record's metadata field, while the topic falls back to the configured 'Topic Name' property ("My Topic") because the metadata field has no 'topic' sub-field. The metadata field may also supply both the topic and the partition.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b"
},
"metadata": {
"topic": "topic1",
"partition": 0
}
}
And considering that the processor properties are configured as:
Property Name | Property Value |
---|---|
Topic Name | My Topic |
Partition | 2 |
Record Metadata Strategy | Metadata From Record |
The record that is produced to Kafka will have the following characteristics:
Kafka Topic | topic1 |
---|---|
Topic Partition | 0 |
Record Key | (none) |
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b |
In this case, both the topic name and the partition are explicitly defined within the incoming Record, and those will be used.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b"
},
"metadata": "hello"
}
And considering that the processor properties are configured as:
Property Name | Property Value |
---|---|
Topic Name | My Topic |
Partition | 2 |
Record Metadata Strategy | Metadata From Record |
The record that is produced to Kafka will have the following characteristics:
Kafka Topic | My Topic |
---|---|
Topic Partition | 2 |
Record Key | (none) |
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b |
In this case, the "metadata" field in the Record is ignored because it is not itself a Record.
Given a FlowFile with the content:
{
"value": {
"address": "1234 First Street",
"zip": "12345",
"account": {
"name": "Acme",
"number":"AC1234"
}
},
"headers": {
"a": "b"
},
"metadata": {
"topic": "topic1",
"partition": 6
}
}
And considering that the processor properties are configured as:
Property Name | Property Value |
---|---|
Topic Name | My Topic |
Partition | 2 |
Record Metadata Strategy | Use Configured Values |
The record that is produced to Kafka will have the following characteristics:
Kafka Topic | My Topic |
---|---|
Topic Partition | 2 |
Record Key | (none) |
Record Value | {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}} |
Record Headers | a=b |
In this case, the "metadata" field specifies both the topic and the partition. However, it is ignored in favor of the processor properties 'Topic' and 'Partition' because the property 'Record Metadata Strategy' is set to 'Use Configured Values'.