Chapter 1. Using Apache Storm

The exponential increase in data from real-time sources such as machine sensors creates a need for data processing systems that can ingest this data, process it, and respond in real time. A typical use case involves an automated system that responds to sensor data by sending an email to support staff or by placing an advertisement on a consumer's smartphone. Apache Storm enables such data-driven, automated activity by providing a real-time, scalable, and distributed solution for streaming data.

Apache Storm can be used with any programming language and guarantees that data streams are processed without data loss.

Storm is datatype-agnostic; it processes data streams of any data type.

A complete introduction to the Storm API is beyond the scope of this documentation. However, the next section, Basic Storm Concepts, provides a brief overview of the most essential concepts and a link to the Storm API Javadoc. See the Apache Storm documentation for a more thorough discussion of Storm concepts.

Experienced Storm developers may want to skip to later sections for information about streaming data to Hive; ingesting data with the Apache Kafka spout; writing data to HDFS, HBase, and Kafka; and deploying Storm topologies.

The last section, RollingTopWords Topology, lists the source code for a sample application included in storm-starter.jar.
