Apache Spark is a general-purpose framework for distributed computing that offers high
performance for both iterative and interactive processing, largely by keeping working data in memory.
Spark exposes APIs for Java, Python, and Scala. It consists of Spark core and several related projects:
- Spark SQL - module for working with structured data. Allows you to seamlessly mix SQL queries with Spark programs.
- Spark Streaming - API that allows you to build scalable fault-tolerant streaming applications.
- MLlib - library that implements common machine learning algorithms.
- GraphX - API for graphs and graph-parallel computation.
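
As a rough illustration of what mixing SQL queries with a Spark program can look like, the following sketch uses the Python API. It assumes Spark 2.0 or later with PySpark installed and a local master. The sample data, the `people` view name, and the `label_age` and `run_example` helpers are hypothetical and exist only for this example.

```python
def label_age(age):
    """Plain Python function applied to each row after the SQL step."""
    return "adult" if age >= 18 else "minor"


def run_example():
    # Imported here so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    # Assumption: a local master is enough for a demonstration.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("sql-mix-sketch")
        .getOrCreate()
    )
    try:
        # Hypothetical sample data, made up for illustration.
        people = spark.createDataFrame(
            [("Ana", 34), ("Ben", 12), ("Cy", 19)], ["name", "age"]
        )
        people.createOrReplaceTempView("people")

        # SQL step: select rows with an ordinary query.
        selected = spark.sql("SELECT name, age FROM people WHERE age > 10")

        # Program step: keep working on the query result with regular Python code.
        pairs = selected.rdd.map(
            lambda row: (row["name"], label_age(row["age"]))
        ).collect()
        return sorted(pairs)
    finally:
        spark.stop()
```

Calling `run_example()` starts a local Spark session, runs the query, and returns the labeled rows as a sorted list of tuples. The query result is an ordinary DataFrame, so the same program can keep transforming it without leaving Spark.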