Configuration example for writing data to HDFS

A simple configuration example for the HDFS Sink Connector.

The following is a simple configuration example for the HDFS Sink Connector. Short descriptions of the properties set in this example are also provided. For a full properties reference, see the HDFS Sink Connector Properties Reference.

{
    "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
    "tasks.max": 1,
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": true,
    "value.converter.schema.registry.url": "http://localhost:9090/api/v1",
    "topics": "avro_topic",
    "hdfs.uri": "hdfs://my-host.my-realm.com:8020",
    "hdfs.output": "/topics_output/",
    "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter",
    "output.avro.passthrough.enabled": true,
    "hdfs.kerberos.authentication": true,
    "hdfs.kerberos.user.principal": "user_account@MY-REALM.COM",
    "hdfs.kerberos.keytab.path": "/path/to/user_account.keytab",
    "hdfs.kerberos.namenode.principal": "hdfs/_HOST@MY-REALM.COM",
    "hadoop.conf.path": "/etc/hadoop/"
  }

connector.class: Class name of the HDFS Sink Connector.
key.converter: The converter capable of understanding the data format of the key of each record on this topic.
value.converter: The converter capable of understanding the data format of the value of each record on this topic.
note
When the AvroConverter is used, you can specify Schema Registry properties to be used by the AvroConverter’s Schema Registry client. This is done by adding the required Schema Registry property as a suffix to the value.converter property. For example, value.converter.schema.registry.url. Properties defined this way are passed on to the Schema Registry client used by the AvroConverter.
value.converter.passthrough.enabled: This property controls whether or not data is converted into the Kafka Connect intermediate data format before writing into an output file. Because in this example the input and output format is the same, the property is set to true, that is, data is not converted.
value.converter.schema.registry.url: The URL to Schema Registry. This is a mandatory property if the topic has records encoded in Avro format.
topics: List of topics to consume data from.
hdfs.uri: The URI to the namenode of the HDFS cluster.
hdfs.output: The destination folder on the HDFS cluster where output files will reside.
output.writer: Determines the output file format. Because in this example the output format is Avro, AvroPartitionWriter is used.
output.avro.passthrough.enabled: This property has to match the configuration of the value.converter.passthrough.enabled property because both the input and output formats are Avro.
hdfs.kerberos.authentication: Enables or disables kerberos authentication.
hdfs.kerberos.user.principal: The user principal that the Kafka Connect role will use.
hdfs.kerberos.keytab.path: The path to the kerberos keytab file.
hdfs.kerberos.namenode.principal: The Kerberos principal used by the namenode. This is necessary when the HDFS cluster has data encryption turned on.
hadoop.conf.path: The path to the hadoop configuration files. This is necessary when the HDFS cluster has data encryption turned on.