Getting started with Kafka

Get started with Kafka in CSP Community Edition.

The following sections walk you through the basics of how Kafka is used in CSP Community Edition. Completing this tutorial you will learn:
  • How to create a topic using Streams Messaging Manager (SMM).
  • How to produce and consume data using Kafka's built-in console clients as well as your own client applications.
  • How to monitor topic activity in SMM.

Before you begin

Find out the name or ID of the Kafka container. You will need to pass the container name or ID as a parameter in some of the commands you will be running.The container name and ID can be listed using docker ps. For example:
docker ps -a --format '{{.ID}}\t{{.Names}}' --filter "name=kafka.(\d)" 

The Kafka container will either be called cspce-kafka-1 or cspce_kafka_1.

Creating a topic

Learn how to create a Kafka topic using the SMM UI.

  1. Access the SMM UI by entering the following in a browser window:
    http://localhost:9991
  2. Click (Topics) in the navigation sidebar.
  3. Click Add New.
  4. Configure the topic as follows:
    • Topic Name: csp-ce
    • Partitions: 3
    • Availability: Low
    • Cleanup Policy: delete
  5. Click Save.
  6. Verify that the topic was created.
    This can be done by typing csp-ce in the search field. If the topic was successfully created, it will be listed under Topics.

Producing and consuming data

Learn how to produce and consume data to and from Kafka topics using Kafka’s built-in console tools or your own client application.

Once you have created your topic it's time to start producing (writing) and consuming (reading) some data (also referred to as records or messages). Data production and consumption happens using Kafka producer and consumer applications, or clients for short. Clients connect to a Kafka server, called a broker, and either produce data to or consume data from topics. In most production environments these applications are custom built using a Kafka client library. However, Kafka is shipped with command line tools, including a console producer and consumer application that you can use to test Kafka’s features and capabilities.

The following list of steps will walk you through how you can use Kafka’s built-in console producer to produce some messages and then consume and view those messages using the SMM UI. Additionally, information about how to configure your own custom developed clients to connect to Kafka in CSP Community Edition is also provided.

  • Ensure that you have a topic available. If not, create one, see Creating a topic.
  • Ensure that you have the SMM UI open. If not, enter the following in a browser window:
    http://localhost:9991
Steps
  1. Open a terminal sessions and run the following command:
    docker exec -it [***KAFKA CONTAINER NAME OR ID***] /bin/bash

    This command launches a Bash session within the docker container that Kafka is running in. The interactive session is required to run the console producer.

  2. Run the console producer to start producing data.
    /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9094 --topic csp-ce
    The --bootstrap-server option specifies the host and port of the Kafka broker that the client should connect to. The --topic option specifies the topic that data will be produced to.
  3. Start typing to produce data.
    >my first message
    >my second message

    Once you are done entering messages, ensure that you leave the console producer running.

  4. In the SMM UI locate your topic and click (Profile) next to the topic name.
  5. Go to Data Explorer.

    The Data Explorer tab lets you sample the data that is flowing through the topic. The messages that you produced using the console producer will be available on this page. Select different partitions and experiment with the slider controls to view data.

  6. Switch over to the terminal session running the producer and produce some more data.
  7. Switch back to the Data Explorer in SMM and refresh the page.

    The new messages you produced should be visible in Data Explorer.

  8. Close the producer session with CTRL+C.
To connect your own Kafka clients, that are running outside of the docker network, to the Kafka broker deployed with CSP Community Edition you need to run the client with the following configuration:
topic=[***TOPIC***]
bootstrap.servers=localhost:9094
Replace [***TOPIC***] with a name of a topic available in the Kafka broker. If you have been following this tutorial, the topic name is csp-ce. Once you have a producer set up and running, you can immediately view the data that is being produced into a topic using the SMM UI.
  1. In the SMM UI locate your topic and click (Profile) next to the topic name.
  2. Go to Data Explorer.

    The Data Explorer tab lets you sample the data that is flowing through the topic. The messages that you produce will be available on this page. Select different partitions and experiment with the slider controls to view data.

Monitoring Kafka topics with SMM

Learn how you can monitor Kafka topics and topic activity in SMM.

Now that you know how to create a topic and produce/consume data, it is time to learn how you can monitor your topics and the activity of those topics using Streams Messaging Manager (SMM).

Although you already have a topic with some data, the following steps will walk you through another production/consumption process where data production and consumption is continuous. This is done so that the topic activity is closer to what you would actually see in a production environment. Once you have the data flowing, you will access the SMM UI and monitor topic and client activity as well as look at the data that is being produced.

  • Ensure that you have a topic available. If not, create one, see Creating a topic.
  • Ensure that you have the SMM UI open. If not, enter the following in a browser window:
    http://localhost:9991
  • Ensure that you have two interactive Bash sessions open in the Kafka container. If not, open two separate terminal sessions and run the following command in both sessions:
    docker exec -it [***KAFKA CONTAINER NAME OR ID***] /bin/bash
  1. Produce and consume some data.
    1. Pick a terminal session and run the console producer.
      vmstat 1 1000 | /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9094 --topic csp-ce
      This command uses vmstat to generate data. The data that vmstat prints is then picked up by the console producer and is streamed into the specified Kafka topic. Each line from the output of vmstat is produced as a standalone message. Do not close this session or interrupt the producer, otherwise data production will stop.
    2. Switch over to the other terminal session and run the console consumer.
      /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic csp-ce --from-beginning
      The messages that are being produced by the console producer instance should appear on the screen. As long as you keep both processes running, and as long as vmstat is generating data, the producer will continue to produce messages, and the consumer will continue to consume them.
  2. Switch over to the SMM UI and start monitoring topic activity.
    When you open SMM, you are presented with the Overview page. This page gives you information about the total number of producers, brokers, topics, and consumer groups. It also provides more detailed metrics about producers and consumers. The following points give a quick introduction of the UI and its features. Feel free to experiment and explore.
    • In the middle you have a list of topics. Here, you can identify and review some key metrics for the overall topic activity. Clicking on a topic lists the partitions of that topic as well as partition-level metrics.
    • On the left and right hand side of the page you can see your producers and consumers. If everything is working correctly, you should see at least one active producer and at least one active consumer. Clicking on either the producer or consumer will show their activity, highlighting all the topics and partitions they are writing to or reading from, respectively.
    • Clicking on (Profile) redirects you to the topic details page which has a total of four tabs. The tabs and the information they present are as follows:
    The Metrics tab collects all the topic-related metrics and utilization charts.
    The Data Explorer tab lets you sample the data that is flowing through the topic. Select different partitions and experiment with the slider controls to view data. If the topic you are viewing is getting data produced into it actively, you can refresh the page to see the latest messages being produced

    The Configs tab gives you an overview of the topic’s configuration properties. Additionally, you can also reconfigure a topic on this tab.

    The Latency tab gives you a powerful snapshot of end-to-end latency including the number of consumer groups for a topic, number of clients inside a particular consumer group, and number of partitions in a topic.This tab also includes two graphs that visualize messages consumed as well as end-to-end latency. By default the Latency tab will not show any data. This is because in order for SMM to fetch metrics related to end-to-end latency, clients must be configured. Specifically, interceptors must be enabled within the client. This is not covered in this tutorial. For more information on enabling interceptors and monitoring end-to-end latency, see Monitoring End-to-End Latency using Streams Messaging Manager.

Completing this tutorial you learned how to create Kafka topics using SMM, how to produce and consume data and got familiar with the basic monitoring features of SMM. Cloudera recommends that you hop over to the Schema Registry or Kafka Connect tutorials and complete those as well. Alternatively, If you want to dive deeper into Kafka or SMM, visit any of the resources listed in the Related Information section below.