Creating Kafka tables

You can use the registered Kafka provider to create Kafka tables that can be used as a source or a sink in your SQL Stream jobs.

  • Make sure that you have created a Kafka topic.
  • Make sure that data has been generated in the Kafka topic.
  • Make sure that you have the right permissions set in Ranger.
  1. Go to your cluster in Cloudera Manager.
  2. Click SQL Stream Builder from the list of services.
  3. Click SQLStreamBuilder Console.
    The Streaming SQL Console opens in a new window.
  4. Select Console from the left-side menu.
  5. Go to the Tables tab.
  6. Select Add table > Apache Kafka.
    The Kafka Table window appears.
  7. Provide a Name for the Table.
  8. Select a registered Kafka provider as the Kafka cluster.
  9. Select a Kafka topic from the list.
  10. Select JSON or AVRO as the Data format.
  11. Determine the Schema for the Kafka table in one of the following ways:
    • Add a customized schema to the Schema Definition field.
    • Click Detect Schema to read a sample of the JSON messages and automatically infer the schema.
  12. Customize your Kafka Table with the following options:
    1. Configure the Event Time if you do not want to use the Kafka Timestamps.
      1. Clear the Use Kafka Timestamps checkbox.
      2. Provide the name of the Input Timestamp Column.
      3. Add a name for the Event Time Column.
      4. Add a value to the Watermark Seconds.
    2. Configure an Input Transform by adding the code on the Transformations tab.
    3. Configure any Kafka properties required using the Properties tab.
    For more information about how to configure the Kafka table, see the Configuring Kafka tables section. A sketch of the equivalent Flink SQL DDL is shown after this procedure.
  13. Select Save Changes.
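Under the hood, a Kafka table defined through the console corresponds to a Flink SQL CREATE TABLE statement. The following sketch shows roughly how the topic, data format, schema, event time column, and watermark from the steps above map to such a DDL. The table name, column names, topic, and broker address are hypothetical placeholders, and the exact DDL that SQL Stream Builder generates can differ.

  -- Hypothetical Kafka table: an orders topic containing JSON messages
  CREATE TABLE orders_kafka (
    order_id   STRING,
    amount     DOUBLE,
    ts         BIGINT,                                             -- Input Timestamp Column from the JSON payload
    event_time AS TO_TIMESTAMP_LTZ(ts, 3),                         -- derived Event Time Column
    WATERMARK FOR event_time AS event_time - INTERVAL '3' SECOND   -- Watermark Seconds
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',                                            -- the Kafka topic selected from the list
    'properties.bootstrap.servers' = 'kafka-broker:9092',          -- broker of the registered Kafka provider
    'properties.group.id' = 'ssb-example',                         -- example Kafka property from the Properties tab
    'format' = 'json',                                             -- the selected Data format
    'scan.startup.mode' = 'earliest-offset'
  );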
The Kafka table is ready to be used as a source or a sink for the SQL Stream job. To use the Kafka table as a source, reference it in the FROM clause of your SQL query. To use it as a sink, select the Kafka table from the Sink Table drop-down menu when creating the SQL job.
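As a hypothetical illustration, the same table can then be read as a source and written to as a sink. Both table names below are placeholders for tables created through the console.

  -- Source: reference the Kafka table in the FROM clause of the SQL query
  SELECT order_id, amount, event_time
  FROM orders_kafka
  WHERE amount > 100;

  -- Sink: inserting into a Kafka table is roughly what selecting it as the Sink Table amounts to
  INSERT INTO large_orders_kafka
  SELECT order_id, amount, event_time
  FROM orders_kafka
  WHERE amount > 100;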