Chapter 8. Using Spark Streaming
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of real-time data streams. Data can be ingested from sources such as Kafka and Flume. It can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Processed data can be sent to file systems, databases, and live dashboards.
Important:
Kafka Direct Receiver integration with Spark Streaming only works when the cluster is not Kerberos-enabled. Dynamic Resource Allocation does not work with Spark Streaming.
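The two restrictions in the Important note can be checked before a streaming job is submitted. The sketch below is a hypothetical pre-flight check, not part of Spark. It assumes that the caller has already loaded the Spark properties and the Hadoop `core-site.xml` properties into plain dictionaries. `spark.dynamicAllocation.enabled` (default `false`) and `hadoop.security.authentication` (`kerberos` on a Kerberos-enabled cluster, default `simple`) are real property names. The function name and the `uses_kafka_direct` flag are illustrative.

```python
from typing import Dict, List


def streaming_config_problems(
    spark_conf: Dict[str, str],
    hadoop_conf: Dict[str, str],
    uses_kafka_direct: bool,
) -> List[str]:
    """Hypothetical pre-flight check for a Spark Streaming job.

    Returns human-readable problems; an empty list means neither restriction applies.
    """
    problems: List[str] = []
    # Dynamic Resource Allocation does not work with Spark Streaming.
    dyn = spark_conf.get("spark.dynamicAllocation.enabled", "false")
    if dyn.strip().lower() == "true":
        problems.append(
            "spark.dynamicAllocation.enabled is true; disable it for Spark Streaming."
        )
    # Kafka Direct Receiver integration requires a cluster without Kerberos.
    auth = hadoop_conf.get("hadoop.security.authentication", "simple")
    if uses_kafka_direct and auth.strip().lower() == "kerberos":
        problems.append(
            "Kafka Direct Receiver integration is not supported on a Kerberos-enabled cluster."
        )
    return problems
```

A job submission script could run this check first and stop if the returned list is not empty.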
The Apache Spark Streaming Programming Guide offers conceptual information; programming examples in Scala, Java, and Python; and performance tuning information.
For additional examples, see the Apache GitHub example repositories for Scala, Java, and Python.