HDFS Sink Connector

Learn more about the HDFS Sink Connector.

The HDFS Sink Connector can be used to transfer data from Kafka topics to files on HDFS clusters. Each partition of every topic results in a collection of files named in the following pattern:
{topic name}_{partition number}_{end_offset}.{file extension}
For example, running the HDFS Sink Connector on partition 0 of a topic named sourceTopic can yield the following series of files:
sourceTopic_0_50.avro - holding records 0 ~ 50
sourceTopic_0_79.avro - holding records 51 ~ 79
...
The HDFS Sink Connector periodically commits records to final result files. Each commit results in a separate "chunk" file.
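
The naming pattern makes it possible to work out which offsets each chunk file holds. The following Python sketch is not part of the connector; it is a minimal illustration that parses file names of the form {topic name}_{partition number}_{end_offset}.{file extension} and derives the offset range of each chunk. The helper names (ChunkFile, parse_chunk_name, offset_ranges) are hypothetical. It assumes that all names belong to the same topic partition and that the first chunk starts at offset 0, as in the example above; pass a different first_offset if that does not hold.

```python
import re
from typing import List, NamedTuple, Tuple


class ChunkFile(NamedTuple):
    """Fields encoded in a chunk file name (hypothetical helper type)."""
    topic: str
    partition: int
    end_offset: int
    extension: str


# The topic group is greedy, so topic names that contain underscores still
# parse: the last two numeric groups are taken as partition and end offset.
_CHUNK_NAME = re.compile(
    r"^(?P<topic>.+)_(?P<partition>\d+)_(?P<end_offset>\d+)\.(?P<ext>[^.]+)$"
)


def parse_chunk_name(name: str) -> ChunkFile:
    """Split a chunk file name into topic, partition, end offset, extension."""
    match = _CHUNK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a chunk file name: {name!r}")
    return ChunkFile(
        topic=match["topic"],
        partition=int(match["partition"]),
        end_offset=int(match["end_offset"]),
        extension=match["ext"],
    )


def offset_ranges(names: List[str], first_offset: int = 0) -> List[Tuple[str, int, int]]:
    """Return (file name, first offset, last offset) per chunk of one partition.

    Assumption: chunks are contiguous, so each chunk starts right after the
    end offset of the previous one.
    """
    parsed = sorted(
        ((parse_chunk_name(n), n) for n in names),
        key=lambda pair: pair[0].end_offset,
    )
    ranges = []
    start = first_offset
    for chunk, name in parsed:
        ranges.append((name, start, chunk.end_offset))
        start = chunk.end_offset + 1
    return ranges


if __name__ == "__main__":
    for name, first, last in offset_ranges(
        ["sourceTopic_0_79.avro", "sourceTopic_0_50.avro"]
    ):
        print(f"{name}: records {first} ~ {last}")
```

Sorting by end offset means the files can be listed in any order, for example as returned by a directory listing.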