Chapter 8. Using Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of real-time data streams. Data can be ingested from sources such as Kafka and Flume, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Processed data can be sent to file systems, databases, and live dashboards.
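
As a rough illustration of how map, reduce, and window operations combine over a stream of micro-batches, the following is a minimal sketch in plain Python. It does not use the Spark API: the function names `word_counts` and `windowed_counts`, the sample batches, and the window length are all hypothetical and chosen only for illustration. In Spark Streaming itself, comparable logic would typically be expressed on a DStream with operations such as `flatMap`, `map`, and `reduceByKeyAndWindow`.

```python
from collections import Counter, deque


def word_counts(batch):
    """Map + reduce for one micro-batch: split lines into words and count them."""
    return Counter(word for line in batch for word in line.split())


def windowed_counts(batches, window_length):
    """Yield word counts over the most recent `window_length` batches.

    The window slides forward by one batch at a time.
    """
    window = deque(maxlen=window_length)  # oldest batch is dropped automatically
    for batch in batches:
        window.append(word_counts(batch))
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Hypothetical input: each inner list stands for the lines received in one batch.
batches = [
    ["spark streaming", "spark"],
    ["kafka spark"],
    ["flume"],
]

results = list(windowed_counts(batches, window_length=2))
for i, counts in enumerate(results, start=1):
    print(f"window ending at batch {i}: {counts}")
```

With a window of two batches, the third result no longer includes words that appeared only in the first batch, which is the behavior a sliding window is meant to provide.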

	Important
	Kafka Direct Receiver integration with Spark Streaming works only when the cluster is not Kerberos-enabled. Dynamic Resource Allocation does not work with Spark Streaming.

The Apache Spark Streaming Programming Guide offers conceptual information; programming examples in Scala, Java, and Python; and performance tuning information.

For additional examples, see the Apache GitHub example repositories for Scala, Java, and Python.

