
Amazon S3 Sink

Learn more about the Amazon S3 Sink connector.

The Amazon S3 Sink connector allows users to stream Kafka data into S3 buckets.

A simple configuration example for the Amazon S3 Sink connector.

The following is a simple configuration example for the Amazon S3 Sink connector. Short descriptions of the properties set in this example are also provided. For a full properties reference, see the Amazon S3 Sink connector properties reference.

{
    "aws.s3.bucket": "bring-me-the-bucket",
    "aws.s3.service_endpoint": "http://myendpoint:9090/",
    "aws.access_key_id": "EXAMPLEID",
    "aws.secret_access_key": “EXAMPLEKEY",
    "connector.class": "com.cloudera.dim.kafka.connect.s3.S3SinkConnector",
    "tasks.max": 1,
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": true,
    "value.converter.schema.registry.url": "http://schema-registry:9090/api/v1",
    "topics": "avro_topic",
    "output.storage": "com.cloudera.dim.kafka.connect.s3.S3PartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter",
    "output.avro.passthrough.enabled": true
}
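
If you want to try this configuration against a running Kafka Connect worker, you can submit it through the standard Kafka Connect REST API. The following Python sketch is illustrative only: the worker address (http://localhost:8083) and the connector name (s3-sink) are assumptions, so substitute the values for your deployment.

import json

import requests  # third-party HTTP client: pip install requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker address
CONNECTOR_NAME = "s3-sink"             # hypothetical connector name

# The example configuration from above. The Connect REST API expects all
# configuration values to be strings.
config = {
    "connector.class": "com.cloudera.dim.kafka.connect.s3.S3SinkConnector",
    "aws.s3.bucket": "bring-me-the-bucket",
    "aws.s3.service_endpoint": "http://myendpoint:9090/",
    "aws.access_key_id": "EXAMPLEID",
    "aws.secret_access_key": "EXAMPLEKEY",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.cloudera.dim.kafka.connect.converts.AvroConverter",
    "value.converter.passthrough.enabled": "true",
    "value.converter.schema.registry.url": "http://schema-registry:9090/api/v1",
    "topics": "avro_topic",
    "output.storage": "com.cloudera.dim.kafka.connect.s3.S3PartitionStorage",
    "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter",
    "output.avro.passthrough.enabled": "true",
}

# POST /connectors creates a new connector; the worker responds with 201
# and echoes the accepted configuration on success.
resp = requests.post(
    f"{CONNECT_URL}/connectors",
    json={"name": CONNECTOR_NAME, "config": config},
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))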
aws.s3.bucket
Target S3 bucket name.
aws.s3.service_endpoint
Target S3 host and port.
aws.access_key_id
The AWS access key ID used for authentication.
aws.secret_access_key
The AWS secret access key used for authentication.
connector.class
Class name of the Amazon S3 Sink connector.
tasks.max
Maximum number of tasks.
key.converter
The converter capable of understanding the data format of the key of each record on this topic.
value.converter
The converter capable of understanding the data format of the value of each record on this topic.
value.converter.passthrough.enabled
This property controls whether data is converted into the Kafka Connect intermediate data format before it is written to an output file. Because the input and output formats are the same in this example, the property is set to true, meaning the data is not converted.
value.converter.schema.registry.url
The URL to Schema Registry. This is a mandatory property if the topic has records encoded in Avro format.
topics
List of topics to consume data from.
output.storage
The S3 storage implementation class.
output.writer
Determines the output file format. Because in this example the output format is Avro, AvroPartitionWriter is used.
output.avro.passthrough.enabled
This property has to match the configuration of the value.converter.passthrough.enabled property because both the input and output formats are Avro.
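
After the connector is created, its health can be checked through the same REST API. The sketch below reuses the worker address and connector name assumed in the previous sketch.

import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker address
CONNECTOR_NAME = "s3-sink"             # hypothetical connector name

# GET /connectors/<name>/status reports the state of the connector and of
# each of its tasks (for example RUNNING, PAUSED, or FAILED).
status = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/status").json()

print("connector:", status["connector"]["state"])
for task in status["tasks"]:
    print(f"task {task['id']}:", task["state"])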

Amazon S3 Sink connector properties reference.

The following table collects connector properties that are specific to the Amazon S3 Sink connector. For properties common to all sink connectors, see the upstream Apache Kafka documentation.

| Property Name | Description | Type | Default Value | Accepted Values | Recommended Value |
|---|---|---|---|---|---|
| aws.s3.bucket | The target S3 bucket name. | String | none | Any valid S3 bucket name. | |
| aws.s3.service_endpoint | The target S3 host and port. | String | none | Any valid S3 endpoint. | |
| aws.access_key_id | The AWS access key ID used to authenticate. | String | none | Any valid access key ID issued by AWS. | |
| aws.secret_access_key | The AWS secret access key used to authenticate. | String | none | Any valid secret access key issued by AWS. | |
| value.converter | Value conversion class. | String | none | | com.cloudera.dim.kafka.connect.converts.AvroConverter |
| value.converter.passthrough.enabled | Configures whether the AvroConverter translates an Avro record into Kafka Connect data or transparently passes the Avro-encoded bytes through as the payload. | Boolean | true | true, false | true if input and output are both Avro |
| value.converter.schema.registry.url | The URL of the Schema Registry server. | String | none | | |
| output.storage | The S3 storage implementation class. | String | none | | com.cloudera.dim.kafka.connect.s3.S3PartitionStorage |
| output.writer | The output file writer, which determines the type of file to be written. The value of this property should be the FQCN of a class that implements the PartitionWriter interface. | String | none | com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter, com.cloudera.dim.kafka.connect.partition.writers.json.JsonPartitionWriter, com.cloudera.dim.kafka.connect.hdfs.parquet.ParquetPartitionWriter, com.cloudera.dim.kafka.connect.partition.writers.txt.TxtPartitionWriter | com.cloudera.dim.kafka.connect.partition.writers.avro.AvroPartitionWriter |
| output.avro.passthrough.enabled | Configures whether the output writer expects an Avro-encoded Kafka Connect data record. Must match the configuration of value.converter.passthrough.enabled. | Boolean | none | true, false | true if input and output are both Avro |
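
One way to confirm that the sink is writing data is to list the objects that appear in the target bucket. The following sketch uses boto3, the AWS SDK for Python, with the endpoint, credentials, and bucket name taken from the configuration example above; adjust them for your environment.

import boto3  # AWS SDK for Python: pip install boto3

# Point the client at the same endpoint and credentials the connector uses.
s3 = boto3.client(
    "s3",
    endpoint_url="http://myendpoint:9090/",
    aws_access_key_id="EXAMPLEID",
    aws_secret_access_key="EXAMPLEKEY",
)

# Each output file written by the connector shows up as an object key.
resp = s3.list_objects_v2(Bucket="bring-me-the-bucket")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])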
