Spark Guide

Chapter 8. Using Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of real-time data streams. Data can be ingested from sources such as Kafka and Flume, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Processed data can be sent to file systems, databases, and live dashboards.
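As a concrete illustration, the following is a minimal Scala sketch of that pattern: it reads a text stream, counts words per batch, and maintains a sliding windowed count. The socket source, host, port, batch interval, and window sizes are illustrative assumptions for testing; a production job would typically read from Kafka or Flume instead.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit; batches every 10 seconds.
    val conf = new SparkConf().setAppName("StreamingWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative source: a text stream on a local socket
    // (for testing, run `nc -lk 9999` and type lines of text).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words in each 10-second batch with map and reduce operators.
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Keep a count over a 60-second window that slides every 10 seconds.
    val windowedCounts =
      counts.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(10))
    windowedCounts.print() // a live sink; could also write to HDFS or a database

    ssc.start()
    ssc.awaitTermination()
  }
}
```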

Important:
  • The Kafka direct stream integration with Spark Streaming works only when the cluster is not Kerberos-enabled.

  • Dynamic Resource Allocation does not work with Spark Streaming (see the configuration sketch following this note).
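
Because Dynamic Resource Allocation is unsupported, a streaming application should request a fixed set of executors instead. The sketch below shows one way to do so in code, assuming a YARN deployment; the application name, executor counts, and sizes are illustrative, and the same properties can equally be passed to spark-submit with --conf.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StaticStreamingApp {
  def main(args: Array[String]): Unit = {
    // Disable dynamic allocation explicitly and size the job statically.
    // The executor counts and sizes below are illustrative; tune them
    // for your cluster.
    val conf = new SparkConf()
      .setAppName("StaticStreamingApp")
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.executor.instances", "4")
      .set("spark.executor.cores", "2")
      .set("spark.executor.memory", "2g")

    val ssc = new StreamingContext(conf, Seconds(10))
    // Illustrative pipeline; see the word-count example above.
    ssc.socketTextStream("localhost", 9999).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```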

The Apache Spark Streaming Programming Guide offers conceptual information; programming examples in Scala, Java, and Python; and performance tuning information.

For additional examples, see the Apache GitHub example repositories for Scala, Java, and Python.