HDFS Stateless Sink properties reference
Review the following reference for a comprehensive list of the connector properties that are specific to the HDFS Stateless Sink connector.
Each of the following properties must be added to the connector configuration with the prefix parameter.[***CONNECTOR NAME***] Parameters: where [***CONNECTOR NAME***] is the name of the connector. For example: parameter.My HDFS Sink Parameters:Output Directory Pattern.
In addition to the properties listed here, this connector also accepts certain properties of the Kafka Connect framework as well as the properties of the Stateless NiFi Sink connector. When creating a new connector using the SMM UI, all valid properties are presented in the default configuration template. You can view the configuration template to get a full list of valid properties. For more information regarding the accepted properties not listed here, review the Apache Kafka documentation and the Stateless NiFi Sink properties reference.
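To show how these pieces fit together, the following is a minimal, hypothetical configuration sketch. The connector class name is that of the Stateless NiFi Sink; the connector name used in the parameter prefix (My HDFS Sink), the topic, and all paths are placeholders, and the default template in your deployment may contain additional required properties.

  {
    "connector.class": "org.apache.nifi.kafka.connect.StatelessNiFiSinkConnector",
    "topics": "my-topic",
    "parameter.My HDFS Sink Parameters:Kafka Message Data Format": "Avro",
    "parameter.My HDFS Sink Parameters:Output File Data Format": "Avro",
    "parameter.My HDFS Sink Parameters:Output Directory Pattern": "/tmp/output/${directory.timestamp}",
    "parameter.My HDFS Sink Parameters:Output Filename Pattern": "data_${filename.uuid}.avro",
    "parameter.My HDFS Sink Parameters:Hadoop Configuration Resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml"
  }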
Avro Schema Write Strategy
- Description
- Specifies how the record schema is attached to the output data file. Applicable only for Avro output (Output File Data Format is set to Avro).
- Do Not Write Schema
- Neither the schema nor a reference to the schema is attached to the output Avro messages.
- Embed Avro Schema
- The schema is embedded in every output Avro message.
- HWX Content-Encoded Schema Reference
- A reference to the schema (identified by Schema Name) within Schema Registry is encoded in the content of the outgoing Avro messages.
- Default Value
- Embed Avro Schema
- Accepted Values
- Embed Avro Schema, Do Not Write Schema, HWX Content-Encoded Schema Reference
- Required
- false
Compression Codec
- Description
- The codec used for file compression in HDFS. Use this property to set the codec if the output file format is JSON, Avro, or CSV.
- Default Value
- NONE
- Accepted Values
- NONE, DEFAULT, BZIP, GZIP, LZ4, LZO, SNAPPY, AUTOMATIC
- Required
- true
Compression Codec for Parquet
- Description
- The codec used for file compression in HDFS. Use this property to set the codec if the output file format is Parquet.
- Default Value
- UNCOMPRESSED
- Accepted Values
- UNCOMPRESSED, SNAPPY, GZIP, LZO
- Required
- true
Date Format
- Description
- Specifies the format to use when writing date fields to JSON or CSV.
- Default Value
- yyyy-MM-dd
- Accepted Values
- Required
- true
Hadoop Configuration Resources
- Description
- A comma-separated list of files that contain the Hadoop file system configuration.
- Default Value
- /etc/hadoop/conf/core-site.xml, /etc/hadoop/conf/hdfs-site.xml
- Accepted Values
- Required
- false
Kafka Message Data Format
- Description
- Specifies the format of the messages the connector receives from Kafka. If set to Avro or JSON, record processing is enabled. Raw can be used for unstructured text or binary data.
- Default Value
- Avro
- Accepted Values
- Avro, JSON, Raw
- Required
- true
Kerberos Keytab for HDFS
- Description
- The fully-qualified filename of the Kerberos keytab associated with the principal for accessing HDFS.
- Default Value
- The location of the default keytab, which is empty and can only be used for unsecured connections.
- Accepted Values
- Required
- true
Kerberos Keytab for Schema Registry
- Description
- The fully-qualified filename of the Kerberos keytab associated with the principal for accessing Schema Registry.
- Default Value
- The location of the default keytab, which is empty and can only be used for unsecured connections.
- Accepted Values
- Required
- true
Kerberos Principal for HDFS
- Description
- The Kerberos principal used for authenticating to HDFS.
- Default Value
- default
- Accepted Values
- Required
- true
Kerberos Principal for Schema Registry
- Description
- The Kerberos principal used for authenticating to Schema Registry.
- Default Value
- default
- Accepted Values
- Required
- true
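For example, on a secured cluster the two HDFS-related Kerberos properties are typically set together. In the hypothetical fragment below, the connector name, principal, and keytab path are placeholders; the Schema Registry pair is configured analogously.

  "parameter.My HDFS Sink Parameters:Kerberos Principal for HDFS": "user@EXAMPLE.COM",
  "parameter.My HDFS Sink Parameters:Kerberos Keytab for HDFS": "/path/to/user.keytab"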
Maximum File Size
- Description
- The maximum size of the output data file. No size limit is applied if this property is not specified. Example values: 100 MB, 1 GB.
- Default Value
- Accepted Values
- Required
- false
Maximum Number of Entries
- Description
- The maximum number of entries in the output data file. In the context of this property, an entry can mean one of two things. If record processing is enabled (Kafka Message Data Format is set to Avro or JSON), an entry is a record. Otherwise, an entry is a Kafka message. Set this property to 1 if you want to create a separate output file for each Kafka message.
- Default Value
- 1000000
- Accepted Values
- Required
- true
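For example, to create a separate output file for each Kafka message, a configuration fragment (the connector name is a placeholder) would set:

  "parameter.My HDFS Sink Parameters:Maximum Number of Entries": "1"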
Output Directory Pattern
- Description
- Specifies the full path of the output HDFS directory. The pattern can contain string literals (fixed text) as well as the ${directory.timestamp} expression, which inserts the current timestamp in the directory name.
- Default Value
- Accepted Values
- Required
- true
Output Directory Timestamp Format
- Description
- The timestamp format to use for the ${directory.timestamp} expression. For example: yyyyMMdd.
- Default Value
- Accepted Values
- Required
- false
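As an illustration, the following hypothetical fragment combines the two directory properties; with these values, output files would land in a directory such as /tmp/output/20240131 (the connector name and base path are placeholders):

  "parameter.My HDFS Sink Parameters:Output Directory Pattern": "/tmp/output/${directory.timestamp}",
  "parameter.My HDFS Sink Parameters:Output Directory Timestamp Format": "yyyyMMdd"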
Output File Data Format
- Description
- Specifies the format of the records written to the output file. Required when record processing is enabled (Kafka Message Data Format is set to Avro or JSON).
- Default Value
- Avro
- Accepted Values
- Avro, JSON, CSV, Parquet
- Required
- false
Output File Demarcator
- Description
- Specifies the character sequence for demarcating (delimiting) message boundaries when multiple Kafka messages are ingested into an output file as raw messages (no record processing). This property can only be used if Kafka Message Data Format is set to Raw. If you want to use newline as the delimiter, set this property to \n.
- Default Value
- Accepted Values
- Required
- false
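For example, a hypothetical raw-mode fragment (the connector name is a placeholder) that writes newline-delimited messages to the output file:

  "parameter.My HDFS Sink Parameters:Kafka Message Data Format": "Raw",
  "parameter.My HDFS Sink Parameters:Output File Demarcator": "\n"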
Output Filename Pattern
- Description
- Specifies the structure of the name of the output file. The pattern can contain string literal (fixed text) parts and one or more of the following expressions:
- ${filename.uuid}: Inserts a generated UUID in the filename.
- ${filename.timestamp}: Inserts the current timestamp in the filename.
- ${filename.sequence}: Inserts an incrementing sequence value in the filename.
In order to generate unique filenames, either ${filename.uuid} or ${filename.sequence} must be used in the pattern. Examples: data_${filename.uuid}.json, records_${filename.timestamp}_${filename.sequence}.avro
- Default Value
- ${filename.uuid}
- Accepted Values
- Required
- false
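As a worked illustration (the connector name and timestamp format below are assumptions, not defaults from this reference):

  "parameter.My HDFS Sink Parameters:Output Filename Pattern": "records_${filename.timestamp}_${filename.sequence}.avro",
  "parameter.My HDFS Sink Parameters:Output Filename Timestamp Format": "yyyyMMdd_HHmmss_SSS"

With the default sequence padding length of 6, this pattern would produce filenames along the lines of records_20240131_140509_123_000001.avro.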
Output Filename Sequence Initial Value
- Description
- Configures the initial value of the ${filename.sequence} expression. Note that the sequence does not start at the value of this property, but at that value plus 1. For example, if you set this property to 0, the sequence starts at 1.
- Default Value
- 0
- Accepted Values
- Required
- false
Output Filename Sequence Padding Length
- Description
- Specifies the length of the ${filename.sequence} expression in characters. If the sequence has fewer characters than the value set in this property, it is padded with zeros (0). Padding is added to the left of the sequence.
- Default Value
- 6
- Accepted Values
- Required
- false
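For example, with the default initial value (0) and the default padding length (6), the first three output files produced with a pattern ending in ${filename.sequence} would carry the sequence values 000001, 000002, and 000003.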
Output Filename Timestamp Format
- Description
- The timestamp format to use for the ${filename.timestamp} expression. For example, yyyyMMdd_HHmmss_SSS.
- Default Value
- Accepted Values
- Required
- false
Output Files by Kafka Partitions
- Description
- Controls how fetched messages are merged together in the output file. If set to true, only messages fetched from the same Kafka partition get merged together in one output file (provided that Maximum Number of Entries is greater than 1). If set to false, messages are merged together in the order they are fetched from the Kafka topic. This property has no effect if Maximum Number of Entries is 1.
- Default Value
- false
- Accepted Values
- true, false
- Required
- false
Schema Access Strategy
- Description
- Specifies the strategy used for determining the schema of the Kafka record. The value you set here depends on the data format set in Kafka Message Data Format.
- If set to Schema Registry, the schema is read from Schema Registry. This setting can be used with both Avro and JSON formats.
- If set to Infer Schema, the schema is inferred based on the input file. This setting can only be used if Kafka Message Data Format is JSON.
- If set to Embedded Schema, the schema embedded in the input is used. This setting can only be used if Kafka Message Data Format is Avro.
- If set to HWX Content-Encoded Schema Reference, the schema is read from Schema Registry. This setting can only be used if Kafka Message Data Format is Avro. In this case, the Avro messages are expected to have a reference to the schema in Schema Registry encoded within the message content.
This property is not used when record processing is disabled (Kafka Message Data Format is set to Raw).
- Default Value
- Schema Registry
- Accepted Values
- Schema Registry, Infer Schema, Embedded Schema, HWX Content-Encoded Schema Reference
- Required
- true
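For instance, a hypothetical fragment for reading schemas from Schema Registry (the connector name, URL, and schema name are placeholders; see the Schema Registry URL and Schema Name entries below):

  "parameter.My HDFS Sink Parameters:Schema Access Strategy": "Schema Registry",
  "parameter.My HDFS Sink Parameters:Schema Registry URL": "http://schema-registry.example.com:7788/api/v1",
  "parameter.My HDFS Sink Parameters:Schema Name": "my-schema"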
Schema Branch
- Description
- The name of the branch to use when looking up the schema in Schema Registry. Schema Branch and Schema Version cannot be specified at the same time. If one is specified, the other needs to be removed from the configuration. If Schema Registry is not used, this property must be completely removed from the configuration.
- Default Value
- Accepted Values
- Required
- false
Schema Name
- Description
- The schema name to look up in Schema Registry. If the Schema Access Strategy property is set to Schema Registry, this property must contain a valid schema name. If Schema Registry is not used, this property must be completely removed from the configuration JSON.
- Default Value
- Accepted Values
- Required
- false
Schema Registry URL
- Description
- The URL of the Schema Registry server. If Schema Registry is not used, this property must be completely removed from the configuration JSON.
- Default Value
- http://localhost:7788/api/v1
- Accepted Values
- Required
- true
Schema Version
- Description
- The version of the schema to look up in Schema Registry. If Schema Registry is used and a schema version is not specified, the latest version of the schema is retrieved. Schema Branch and Schema Version cannot be specified at the same time. If one is specified, the other needs to be removed from the configuration. If Schema Registry is not used, this property must be completely removed from the configuration.
- Default Value
- Accepted Values
- Required
- false
Time Format
- Description
- Specifies the format to use when reading or writing Time fields to JSON or CSV.
- Default Value
- HH:mm:ss
- Accepted Values
- Required
- true
Timestamp Format
- Description
- Specifies the format to use when reading or writing Timestamp fields to JSON or CSV.
- Default Value
- yyyy-MM-dd HH:mm:ss.SSS
- Accepted Values
- Required
- true
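The Date Format, Time Format, and Timestamp Format values are Java-style date and time patterns. As an illustration, the default patterns would render a sample instant as follows: yyyy-MM-dd as 2024-01-31, HH:mm:ss as 14:05:09, and yyyy-MM-dd HH:mm:ss.SSS as 2024-01-31 14:05:09.123.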
Truststore Filename for Schema Registry
- Description
- The fully-qualified filename of a truststore. This truststore is used to establish a secure connection with Schema Registry using HTTPS.
- Default Value
- The location of the default truststore, which is empty and can only be used for unsecured connections.
- Accepted Values
- Required
- true
Truststore Password for Schema Registry
- Description
- The password used to access the contents of the truststore configured in the Truststore Filename for Schema Registry property.
- Default Value
- password
- Accepted Values
- Required
- true
Truststore Type for Schema Registry
- Description
- The type of the truststore configured in the Truststore Filename for Schema Registry property.
- Default Value
- JKS
- Accepted Values
- BCFKS, PKCS12, JKS
- Required
- true
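For example, a hypothetical fragment for a TLS-secured Schema Registry connection (the connector name, truststore path, and password are placeholders):

  "parameter.My HDFS Sink Parameters:Truststore Filename for Schema Registry": "/path/to/truststore.jks",
  "parameter.My HDFS Sink Parameters:Truststore Password for Schema Registry": "changeit",
  "parameter.My HDFS Sink Parameters:Truststore Type for Schema Registry": "JKS"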