Writing data to Ozone in an unsecured cluster with Kafka Connect

The following steps walk you through setting up the Cloudera-developed HDFS Sink Connector to write data from a Kafka topic to the Ozone file system in an unsecured cluster. The connector is deployed and configured using the Streams Messaging Manager (SMM) UI.

In addition to connector setup, these steps describe how to create a test topic and populate it with data using the Kafka command line tools. If you already have a topic that is ready for use, you can skip steps 1 through 3, which cover topic creation, message production, and message consumption.

An unsecured CDP Private Cloud Base cluster with Kafka, SMM, and Ozone is set up and configured.
  1. Create a Kafka topic:
    1. SSH into one of the hosts in your cluster.
      ssh [***USER***]@[***MY-CLUSTER-HOST.COM***]
    2. Create a topic with the kafka-topics tool.
      kafka-topics --create --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --replication-factor 1 --partitions 1 --topic [***TOPIC***]
      If an out-of-memory exception is thrown when you run this command, increase the JVM heap size with the following command and try again:
      export KAFKA_OPTS="-Xmx1g -Xms1g"
    3. Verify that the topic was created.
      kafka-topics --list --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***]
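      Optionally, you can also check the partition count and replication factor of the new topic with the same tool. The placeholders are the same as in the previous commands.
      kafka-topics --describe --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***]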
  2. Produce messages to your topic with the kafka-console-producer.
    kafka-console-producer --broker-list [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***]
    Start typing messages once the tool is running.
    >my first message
    >my second message
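    Instead of typing messages interactively, you can also pipe a file into the producer. A sketch, assuming a hypothetical messages.txt file that contains one message per line:
    kafka-console-producer --broker-list [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***] < messages.txt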
  3. Consume messages:
    1. Open a new terminal session and log in to one of the hosts in your cluster.
    2. Consume messages with the kafka-console-consumer.
      kafka-console-consumer --from-beginning --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***] 
      The messages you produced with the console producer appear. In addition, you can switch back to the terminal session that is running the kafka-console-producer and produce additional messages. These new messages will appear in real time in the session running the kafka-console-consumer.
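      If you only want to spot-check the topic instead of following it continuously, you can make the consumer exit after a fixed number of messages with the --max-messages option:
      kafka-console-consumer --from-beginning --bootstrap-server [***MY-CLUSTER-HOST.COM:9092***] --topic [***TOPIC***] --max-messages 2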
  4. Deploy and configure an HDFS Sink Connector:
    1. In Cloudera Manager, select the Streams Messaging Manager service.
    2. Click Streams Messaging Manager Web UI.
    3. Click the Connect option in the left-side menu.
    4. Click + New Connector to add a new connector.
    5. Go to the Sink Connectors tab and select the HDFS Sink Connector.
      In the SMM UI, the HDFS Sink Connector is represented by its class name, com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector.
    6. Enter a name for the connector.
    7. Configure the connector.
      Use the following example as a template:
      {
       "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
       "hdfs.uri": "o3fs://bucket1.vol1.ozone1/",
       "hdfs.output": "/topics_output/",
       "tasks.max": "1",
       "topics": "testTopic",
       "hadoop.conf.path": "file:///etc/hadoop/conf",
       "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.txt.TxtPartitionWriter",
       "value.converter": "org.apache.kafka.connect.storage.StringConverter",
       "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
       "hdfs.kerberos.authentication": "false"
      }
      
      Ensure that you replace the values of hdfs.uri and hdfs.output with valid Ozone paths. The template shows an example of how these paths should look. Replace any other values as needed for your cluster and requirements.
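      Before pasting the configuration into the SMM UI, you can verify on any host with python3 installed that it is well-formed JSON. This is a local syntax check only and is not part of connector deployment; the file path /tmp/hdfs-sink.json is an arbitrary example.

```shell
# Save the example connector configuration to a temporary file.
cat > /tmp/hdfs-sink.json <<'EOF'
{
 "connector.class": "com.cloudera.dim.kafka.connect.hdfs.HdfsSinkConnector",
 "hdfs.uri": "o3fs://bucket1.vol1.ozone1/",
 "hdfs.output": "/topics_output/",
 "tasks.max": "1",
 "topics": "testTopic",
 "hadoop.conf.path": "file:///etc/hadoop/conf",
 "output.writer": "com.cloudera.dim.kafka.connect.partition.writers.txt.TxtPartitionWriter",
 "value.converter": "org.apache.kafka.connect.storage.StringConverter",
 "output.storage": "com.cloudera.dim.kafka.connect.hdfs.HdfsPartitionStorage",
 "hdfs.kerberos.authentication": "false"
}
EOF

# json.tool exits non-zero on malformed JSON, so the message prints only when the file is valid.
python3 -m json.tool /tmp/hdfs-sink.json > /dev/null && echo "JSON is valid"
```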
    8. Click Validate.
      The validator displays any JSON errors in your configuration. Fix any errors that are displayed. If your configuration is valid, the JSON is valid message is displayed.
    9. Click Next.
    10. Review your connector configuration.
    11. Click Deploy to deploy the connector.
  5. Verify that connector deployment is successful:
    1. In the SMM UI, click the Connect option in the left-side menu.
    2. Click either the topic or the connector you created.
      If connector deployment is successful, a flow is displayed between the topic you specified and the connector you created.
  6. Verify that topic data is written to Ozone.
    You can do this by listing the files under the o3fs:// location you specified in the connector configuration.
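    For example, assuming the hdfs.uri and hdfs.output values from the sample configuration, you can list the output with the Hadoop file system shell. The exact file and directory names depend on how the connector partitions its output.
    hdfs dfs -ls -R o3fs://bucket1.vol1.ozone1/topics_output/
    On hosts where the Ozone client is installed, ozone fs -ls -R accepts the same URI.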