Understanding the Use Case
To build a complex streaming analytics application from scratch, we will work with a fictional use case: a trucking company with a large fleet wants to monitor its trucks and perform real-time analytics on the sensor data they emit. The analytics application has the following ten requirements:
1. Outfit each truck with two sensors that emit event data such as timestamp, driver ID, truck ID, route, geographic location, and event type.
   - The geo event sensor emits geographic information (latitude and longitude coordinates) and events such as excessive braking or speeding.
   - The speed sensor emits the speed of the vehicle.
2. Stream the sensor events to an IoT gateway. The data-producing app sends CSV events from each sensor to a single Kafka topic and passes the schema name for the event as a Kafka event header (see the producer sketch after this list). See Data Producer Application Generates Events.
3. Use NiFi to consume the events from the Kafka topic, and then route, transform, enrich, and deliver the data from the two streams to two separate Kafka topics.
4. Connect to the two streams of data to perform analytics on them.
5. Join the two sensor streams on shared attributes in real time. For example, join a truck's geo-location stream with a driver's speed stream (see the join sketch after this list).
6. Filter the stream down to only those events that are infractions or violations.
7. Make all infraction events available for descriptive analytics (dashboarding, visualizations, or similar) by a business analyst, who needs the ability to perform analysis on the streaming data. See Streaming Violation Events to an Analytics Engine for Descriptive Analytics.
8. Detect complex patterns in real time. For example, detect whether, over a three-minute period, the average speed of a driver is more than 80 miles per hour on routes known to be dangerous (see the windowed-rule sketch after this list).
9. When each of the preceding rules fires, create alerts and make them instantly accessible. See Creating Alerts with Notifications Sink and Streaming Alerts to an Analytics Engine for Dashboarding.
10. Execute a logistic regression Spark ML model on the events in the stream to predict whether a driver is going to commit a violation. If a violation is predicted, generate an alert (see the model-scoring sketch after this list).
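To make requirement 2 concrete, the following is a minimal sketch of a producer that sends a CSV event to a single Kafka topic with the schema name attached as a record header. The broker address, topic name, header key, schema name, and CSV layout are illustrative assumptions, not the exact values used by the data producer application.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SensorEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A CSV geo event: timestamp, truckId, driverId, route, eventType, latitude, longitude
            String csvEvent = "2024-01-15 12:00:00.0,10,21,Route-27,Excessive Braking,38.44,-90.94";
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("raw-truck-events", csvEvent); // topic name assumed
            // Both sensors share one topic, so the schema name travels as a record
            // header; NiFi reads it to look up the matching schema in Schema Registry.
            record.headers().add("schema.name",
                    "truck_geo_event".getBytes(StandardCharsets.UTF_8)); // header key and value assumed
            producer.send(record);
        }
    }
}
```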
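Requirement 5's attribute-based join can be pictured as caching the latest event from one stream keyed by a shared attribute (here, driver ID) and matching events from the other stream against that cache. This plain-Java sketch illustrates the idea only; it is not the Streaming Analytics Manager join processor, and all type and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Joins geo events with the latest speed event observed for the same driver. */
public class SensorStreamJoiner {
    record GeoEvent(int driverId, String route, double latitude, double longitude, String eventType) {}
    record SpeedEvent(int driverId, String route, double speedMph) {}
    record JoinedEvent(GeoEvent geo, SpeedEvent speed) {}

    // Latest speed reading per driver, updated as speed events arrive.
    private final Map<Integer, SpeedEvent> latestSpeedByDriver = new HashMap<>();

    public void onSpeedEvent(SpeedEvent e) {
        latestSpeedByDriver.put(e.driverId(), e);
    }

    /** Emits a joined event when a speed reading exists for the same driver. */
    public Optional<JoinedEvent> onGeoEvent(GeoEvent e) {
        SpeedEvent speed = latestSpeedByDriver.get(e.driverId());
        return speed == null ? Optional.empty() : Optional.of(new JoinedEvent(e, speed));
    }
}
```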
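The pattern detection in requirement 8 amounts to a windowed aggregation. The sketch below expresses the rule in plain Java rather than as a Streaming Analytics Manager window processor; the tumbling three-minute window, the route names, and the absence of window eviction are simplifying assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SpeedingRule {
    static final Set<String> DANGEROUS_ROUTES = Set.of("Route-27", "Route-4"); // assumed route names
    static final long WINDOW_MS = 3 * 60 * 1000;  // three-minute tumbling window
    static final double SPEED_LIMIT_MPH = 80.0;

    record SpeedEvent(long timestampMs, int driverId, String route, double speedMph) {}

    // Running [sum, count] of speeds per driver per window; a real engine
    // would also evict windows once they close, which this sketch skips.
    private final Map<String, double[]> windows = new HashMap<>();

    /** Returns an alert message when the rule fires, otherwise empty. */
    public Optional<String> onEvent(SpeedEvent e) {
        if (!DANGEROUS_ROUTES.contains(e.route())) {
            return Optional.empty();
        }
        long windowStart = (e.timestampMs() / WINDOW_MS) * WINDOW_MS;
        String key = e.driverId() + "@" + windowStart;
        double[] agg = windows.computeIfAbsent(key, k -> new double[2]);
        agg[0] += e.speedMph();  // sum of speeds in this window
        agg[1] += 1;             // number of events in this window
        double avgMph = agg[0] / agg[1];
        if (avgMph > SPEED_LIMIT_MPH) {
            return Optional.of("Driver " + e.driverId() + " averaging "
                    + avgMph + " mph on dangerous route " + e.route());
        }
        return Optional.empty();
    }
}
```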
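For requirement 10, scoring streaming events with a pre-trained logistic regression Spark ML model might look like the sketch below. The model path and the feature layout are assumptions; in practice the model would be trained offline and each joined event would be vectorized the same way the training data was.

```java
import java.util.List;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ViolationPredictor {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ViolationPredictor").master("local[*]").getOrCreate();

        // Load a logistic regression model trained offline; the path is assumed.
        LogisticRegressionModel model =
                LogisticRegressionModel.load("hdfs:///models/driver-violation-lr");

        // One event turned into a feature vector; the feature layout is assumed.
        StructType schema = new StructType(new StructField[]{
                new StructField("features", new VectorUDT(), false, Metadata.empty())});
        Dataset<Row> event = spark.createDataFrame(
                List.of(RowFactory.create(Vectors.dense(1.0, 0.0, 60.0, 2700.0))), schema);

        // The model appends a "prediction" column; 1.0 means a violation is predicted.
        double prediction = model.transform(event)
                .select("prediction").head().getDouble(0);
        if (prediction == 1.0) {
            System.out.println("ALERT: violation predicted for this driver");
        }
        spark.stop();
    }
}
```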
The following sections walk you through how to implement all ten requirements. Requirements 1 through 3 are implemented using NiFi and Schema Registry. Requirements 4 through 10 are implemented using the new Streaming Analytics Manager.