To build a complex streaming analytics application from scratch, we will work with a
fictional use case. A trucking company has a large fleet of trucks and wants to perform
real-time analytics on the sensor data from the trucks and to monitor them in real time. Their
analytics application has the following requirements:
1. Outfit each truck with two sensors that emit event data such as timestamp, driver ID,
   truck ID, route, geographic location, and event type.
2. Stream the sensor events to an IoT gateway. The data-producing app (for example, a
   truck) sends CSV events from each sensor to one of three gateway topics
   (gateway-west-raw-sensors, gateway-east-raw-sensors, or gateway-central-raw-sensors).
   Each event carries the schema name for the event as a Kafka event header.
3. Use NiFi to consume the events from the Kafka topics, and then route, transform,
   enrich, and deliver the data from the gateways to four syndication topics
   (syndicate-geo-event-avro, syndicate-speed-event-avro, syndicate-geo-event-json, and
   syndicate-speed-event-json) that various downstream analytics applications can
   subscribe to.
4. Connect to the two streams of data to perform analytics on the stream.
5. Join the two sensor streams on shared attributes in real time. For example, join the
   geo-location stream of a truck with the speed stream of a driver.
6. Filter the streams to include only events that are infractions or violations.
7. Make all infraction events available for descriptive analytics (dashboards,
   visualizations, or similar) by a business analyst. The analyst needs the ability to
   perform analysis on the streaming data.
8. Detect complex patterns in real time. For example, detect whether the average speed of
   a driver over a three-minute period is more than 80 miles per hour on routes known to
   be dangerous.
9. When any of the preceding rules fires, create alerts and make them instantly
   accessible.
10. Execute a logistic regression Spark ML model on the events in the stream to predict
    whether a driver is going to commit a violation. If a violation is predicted, generate
    an alert.
11. Monitor and manage the entire application using Streams Messaging Manager and Stream
    Operations.
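To make the second requirement concrete, the following is a minimal sketch of how a data-producing app might serialize a sensor reading as a CSV event and attach the schema name as a Kafka header. The field names, field order, and the `schema.name` header key are illustrative assumptions; the authoritative schemas live in Schema Registry, and the resulting `(value, headers)` pair would be handed to a Kafka producer (for example, `KafkaProducer.send(topic, value=..., headers=...)` in kafka-python).

```python
import csv
import io

# Hypothetical field order for a geo event; the real schema is defined in
# Schema Registry, not in code.
GEO_FIELDS = ["timestamp", "driverId", "truckId", "route",
              "latitude", "longitude", "eventType"]

def encode_sensor_event(event, fields=GEO_FIELDS, schema_name="truck-geo-event"):
    """Serialize one sensor reading as a CSV line plus Kafka headers.

    Returns (value, headers): `value` is the CSV-encoded event, and `headers`
    carries the schema name so downstream consumers can look up the schema.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow([event[f] for f in fields])
    value = buf.getvalue().rstrip("\r\n")
    headers = [("schema.name", schema_name.encode("utf-8"))]
    return value, headers

value, headers = encode_sensor_event({
    "timestamp": "2024-01-01T00:00:00Z", "driverId": 11, "truckId": 42,
    "route": "Route-66", "latitude": 38.44, "longitude": -90.3,
    "eventType": "Normal",
})
```

Keeping serialization in one pure function makes it easy to unit-test the wire format without a running broker.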
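The pattern-detection requirement (a driver's three-minute average speed exceeding 80 mph on a dangerous route) can be sketched as a per-driver sliding window. This is a simplified in-memory stand-in for the windowed aggregation a streaming engine performs; the dangerous-route set, event shape, and class name are assumptions for illustration.

```python
from collections import deque

DANGEROUS_ROUTES = {"Route-66"}   # assumed lookup table of dangerous routes
WINDOW_SECONDS = 180              # the three-minute window from the requirement
SPEED_LIMIT_MPH = 80

class SpeedingDetector:
    """Sliding-window average speed per driver over the joined event stream."""

    def __init__(self):
        self.windows = {}  # driverId -> deque of (ts_seconds, speed_mph)

    def on_event(self, driver_id, route, ts, speed):
        """Feed one joined geo+speed event; return an alert dict or None."""
        win = self.windows.setdefault(driver_id, deque())
        win.append((ts, speed))
        # Evict readings older than the window.
        while win and win[0][0] <= ts - WINDOW_SECONDS:
            win.popleft()
        avg = sum(s for _, s in win) / len(win)
        if route in DANGEROUS_ROUTES and avg > SPEED_LIMIT_MPH:
            return {"driverId": driver_id, "route": route, "avgSpeed": avg}
        return None
```

In the real application this logic would be expressed as a windowed aggregation in the streaming engine rather than hand-rolled state, but the sketch shows the rule being evaluated.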
The following sections walk you through how to implement all eleven requirements.
Requirements 1-3 are performed using NiFi and Schema Registry, requirements 4 through 10
are implemented using the new Streaming Analytics Manager, and requirement 11 is covered
by Streams Messaging Manager and Stream Operations.
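Requirement 10 scores each event with a logistic regression model. As a rough sketch of what that scoring means, the snippet below applies the logistic (sigmoid) function to a weighted feature sum; the feature names, weights, and bias are invented for illustration, whereas in the real pipeline they would come from a trained Spark ML `LogisticRegressionModel`.

```python
import math

# Assumed features and coefficients; a real model would be trained on
# historical violation data with Spark ML, not hard-coded.
WEIGHTS = {"hoursDriven": 0.35, "milesSinceBreak": 0.01, "isFoggy": 0.8}
BIAS = -4.0

def violation_probability(features):
    """Logistic-regression score: sigmoid of bias plus weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def should_alert(features, threshold=0.5):
    """Generate an alert when a violation is predicted (probability >= threshold)."""
    return violation_probability(features) >= threshold
```

The thresholded probability is what turns a model score into the alert the requirement asks for.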