Using Apache Kafka with Apache Spark Streaming
For information on how to configure Apache Spark Streaming to receive data from Apache Kafka, see the appropriate version of the Spark Streaming + Kafka Integration Guide: 1.6.0 or 2.3.0.
In CDH 5.7 and higher, the Spark connector to Kafka works only with Kafka 2.0 and higher.
Validating Kafka Integration with Spark Streaming
To validate your Kafka integration with Spark Streaming, run the KafkaWordCount example.
If you installed Spark using parcels, use the following command:
/opt/cloudera/parcels/CDH/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
If you installed Spark using packages, use the following command:
/usr/lib/spark/bin/run-example streaming.KafkaWordCount <zkQuorum> <group> <topics> <numThreads>
Replace the variables as follows:
- <zkQuorum> - ZooKeeper quorum URI used by Kafka (for example, zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181).
- <group> - Consumer group used by the application.
- <topics> - Kafka topic, or comma-separated list of topics, containing the data for the application.
- <numThreads> - Number of consumer threads reading the data. If this is higher than the number of partitions in the Kafka topic, some threads will be idle.
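As an illustration of the substitutions above, the following sketch assembles the parcel-based command and estimates how many consumer threads would be idle. The group name, topic name, thread count, and partition count are hypothetical values chosen for this example, and the idle-thread arithmetic assumes a single topic. Replace all of them with values from your own cluster.

```shell
# Hypothetical values for illustration only; substitute your own.
ZK_QUORUM="zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181"
GROUP="wordcount-group"      # hypothetical consumer group name
TOPICS="wordcount-input"     # hypothetical topic name
NUM_THREADS=4                # hypothetical number of consumer threads
PARTITIONS=3                 # assumed partition count of the topic

# Path used by the parcel-based install described above.
RUN_EXAMPLE="/opt/cloudera/parcels/CDH/lib/spark/bin/run-example"

# Threads beyond the partition count have no partition to read from.
if [ "$NUM_THREADS" -gt "$PARTITIONS" ]; then
  IDLE=$((NUM_THREADS - PARTITIONS))
else
  IDLE=0
fi
echo "Idle consumer threads: $IDLE"

# Assemble the command and print it for review before running it.
CMD="$RUN_EXAMPLE streaming.KafkaWordCount $ZK_QUORUM $GROUP $TOPICS $NUM_THREADS"
echo "$CMD"
```

For a package-based install, set RUN_EXAMPLE to /usr/lib/spark/bin/run-example instead. With these example values, lowering NUM_THREADS to 3 would leave no thread without a partition.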