Writing data to Ozone in an unsecured cluster with Kafka Connect

You can use the Cloudera-developed HDFS Sink Connector to write Kafka topic data to Ozone in an unsecured cluster. Connector deployment and configuration are done using the SMM UI.

The following steps walk you through setting up and deploying the Cloudera-developed HDFS Sink Connector using the SMM UI.

In addition to the connector setup, these steps also describe how to create a test topic and populate it with data using Kafka command line tools. Steps 1 through 3, which cover topic creation, message production, and message consumption, are optional. Skip them if you already have a topic that is ready for use and do not want to create a test topic.

An unsecured CDP Private Cloud Base cluster with Kafka, SMM, and Ozone is set up and configured.
  1. Create a Kafka topic:
    1. SSH into one of the hosts in your cluster.
      ssh [***USER***]@[***MY-CLUSTER-HOST.COM***]
    2. Create a topic with the kafka-topics tool.
      kafka-topics --create --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --replication-factor 1 --partitions 1 --topic [***TOPIC***]
      If an out of memory exception is displayed while running this command, increase the JVM heap with the following command and try again:
      export KAFKA_OPTS="-Xmx1g -Xms1g"
    3. Verify that the topic is created.
      kafka-topics --list --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***]
  2. Produce messages to your topic with the kafka-console-producer.
    kafka-console-producer --broker-list [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***]
    Enter test messages once the tool is running.
    >my first message
    >my second message
  3. Consume messages:
    1. Open a new terminal session and log in to one of the hosts in your cluster.
    2. Consume messages with the kafka-console-consumer.
      kafka-console-consumer --from-beginning --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***] 
      The messages you produced with kafka-console-producer appear. In addition, you can switch to the terminal session that is running kafka-console-producer and produce additional messages. These new messages appear in real time in the session running kafka-console-consumer.
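      Assuming the test messages produced earlier, the consumer output looks similar to the following:
      my first message
      my second message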
  4. Deploy and configure an HDFS Sink Connector:
    1. In Cloudera Manager, select the Streams Messaging Manager service.
    2. Click Streams Messaging Manager Web UI.
    3. Click the Connect option in the left-side menu.
    4. Click + New Connector to add a new connector.
    5. Go to the Sink Connectors tab and select the HDFS Sink Connector.
      On the UI the HDFS Sink Connector is represented by its class name, which is com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector.
    6. Enter a name for the connector.
    7. Configure the connector.
      Use the following example as a template:
      {
       "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
       "hdfs.uri": "o3fs://bucket1.vol1.ozone1/",
       "hdfs.output": "/topics_output/",
       "tasks.max": "1",
       "topics": "testTopic",
       "hadoop.conf.path": "file:///etc/hadoop/conf",
       "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.txt.TxtPartitionWriter",
       "value.converter": "org.apache.kafka.connect.storage.StringConverter",
       "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
       "hdfs.kerberos.authentication": "false"
      }
      
      Ensure that you replace the values of hdfs.uri and hdfs.output with valid Ozone paths. The template shows an example of what these paths should look like. Replace any other values as needed for your cluster configuration and requirements.
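      The hdfs.uri value must point to an existing Ozone volume and bucket. If they do not exist yet, you can create them with the Ozone shell. The following is a minimal sketch that assumes the vol1 volume and bucket1 bucket names used in the example configuration:
      ozone sh volume create /vol1
      ozone sh bucket create /vol1/bucket1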
    8. Click Validate.
      The validator displays any JSON errors found in your configuration. Fix any errors that are displayed. If your configuration is valid, the validator displays the JSON is valid message.
    9. Click Next.
    10. Review your connector configuration.
    11. Click Deploy.
  5. Verify that connector deployment is successful:
    1. In the SMM UI, click the Connect option in the left-side menu.
    2. Click on either the topic or the connector you created.
      If connector deployment is successful, a flow is displayed between the topic you specified and the connector you created.
  6. Verify that topic data is written to Ozone.
    You can do this by listing the files under the o3fs:// location you specified in the connector configuration.
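    For example, assuming the paths from the sample configuration above, a command similar to the following lists the files that the connector wrote:
    ozone fs -ls o3fs://bucket1.vol1.ozone1/topics_output/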