Understanding the Use Case
To build a complex streaming analytics application from scratch, we will work with a fictional use case. A trucking company has a large fleet of trucks, and wants to perform real-time analytics on the sensor data from the trucks, and to monitor them in real time. Their analyitcs application has the following requirements:
Outfit each truck with two sensors that emit event data such as timestamp, driver ID, truck ID, route, geographic location, and event type.
The geo event sensor emits geographic information (latitude and longitude coordinates) and events such as excessive braking or speeding.
The speed sensor emits the speed of the vehicle.
Stream the sensor events to an IoT gateway, which serializes the events as Avro objects and streams them into separate Kafka topics, one for each Kafka sensor.
Use NiFi to consume the serialized Avro events from the Kafka topics, and then route, transform, enrich, and deliver the data to a downstream Kafka instance.
Connect to the two streams of data to do analytics on the stream.
Join the two sensor streams using attributes in real-time. For example, join the geo-location stream of a truck with the speed stream of a driver.
Filter the stream on only events that are infractions or violations.
All infraction events need to be available for descriptive analytics (dash-boarding, visualizations, or similar) by a business analyst. The analyst needs the ability to do analysis on the streaming data.
Detect complex patterns in real-time. For example, over a three-minute period, detect if the average speed of a driver is more than 80 miles per hour on routes known to be dangerous.
When each of the preceding rules fires, create alerts and make them instantly accessible.
Execute a logistical regression Spark ML model on the events in the stream to predict if a driver is going to commit a violation. If violation is predicted, then alert on it.
The below sections walks you through how to implement all ten requirements. Requirements 1-3 are done using NiFi and Schema Registry. Requirements 4 through 10, are implemented using the new Streaming Analtyics Manager.