Introducing streams messaging cluster on CDP Public Cloud

You can create a streams messaging cluster in CDP Public Cloud and use it to produce data to and consume data from an Apache Kafka topic. A simple workflow illustrates the concepts and usage of a streams messaging cluster on CDP Public Cloud.

In this example, you use a simple application (vmstat) to generate raw text data and a kafka-console-producer command to send the data to a Kafka topic. You then use a kafka-console-consumer command to read data from that Kafka topic. After the producer and consumer are running, you use Streams Messaging Manager (SMM), which is also available on CDP Public Cloud, to monitor the workflow that you created.
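As a rough sketch of the producer and consumer steps in this example, the following shell snippet wraps the two pipelines in functions. The broker address, the topic name (machine-data), and the client.properties path are placeholder assumptions, not values taken from this document. The properties file is assumed to contain whatever security settings your cluster requires. Older Kafka versions may expect --broker-list instead of --bootstrap-server for the producer.

```shell
# Placeholder values (assumptions): replace with your cluster's details.
BOOTSTRAP="broker-host.example.com:9093"   # hypothetical broker host:port
TOPIC="machine-data"                       # hypothetical topic name
CLIENT_PROPS="client.properties"           # hypothetical client security config

# Generate one line of raw vmstat text per second and send it to the topic.
produce_vmstat() {
  vmstat 1 | kafka-console-producer \
    --bootstrap-server "$BOOTSTRAP" \
    --producer.config "$CLIENT_PROPS" \
    --topic "$TOPIC"
}

# Read the topic from the beginning and print each record to the terminal.
consume_topic() {
  kafka-console-consumer \
    --bootstrap-server "$BOOTSTRAP" \
    --consumer.config "$CLIENT_PROPS" \
    --topic "$TOPIC" \
    --from-beginning
}
```

After sourcing the snippet, you would run produce_vmstat in one terminal and consume_topic in another. Both keep running until you interrupt them.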

Alternatively, you can structure the data in an object format and send it to Kafka in Avro format. To do this, you store a schema in CDP’s Schema Registry and use a simple Java Kafka client to send and read data using that schema.

You can also update your schema to fit your requirements and reconfigure the producer and consumer to use the latest schema.

The complete workflow consists of the following tasks:
  1. Meeting the prerequisites
  2. Creating a Machine User
  3. Granting Machine User access to environment
  4. Creating a Kafka topic
  5. Creating Ranger policies
  6. Producing data to Kafka topic
  7. Consuming data from Kafka topic
  8. Using Kerberos authentication
  9. Monitoring Kafka activity in SMM
  10. Using Schema Registry
  11. Monitoring end-to-end latency in SMM
  12. Evolving your schema