Chapter 7. Advanced: Doing Predictive Analytics on the Stream
Requirement 10 of this use case states the following:
Execute a logistic regression Spark ML model on the events in the stream to predict whether a driver is going to commit a violation. If a violation is predicted, alert on it.
HDP, the Hortonworks data-at-rest platform, provides a set of tools that data engineers and data scientists can use to build analytics with data processing engines such as Spark Streaming, Hive, and Pig. The diagram below illustrates a typical analytics life cycle in HDP.
Once the model has been trained and optimized, insights can be created by scoring the model in real time as events arrive. The next set of steps in the life cycle covers this real-time scoring using HDF components.
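To make the idea of scoring concrete, the following is a minimal plain-Python sketch of what applying a logistic regression model to each incoming event amounts to. It is not the SAM or Spark ML implementation. The feature names (hours_driven, miles_driven, foggy), the coefficients, the intercept, and the 0.5 alert threshold are all hypothetical values chosen for illustration; a real model would obtain its coefficients from training in Spark ML.

```python
import math

# Hypothetical model parameters, for illustration only.
# A real model's intercept and weights come from training in Spark ML.
INTERCEPT = -2.0
WEIGHTS = {"hours_driven": 0.03, "miles_driven": 0.0004, "foggy": 1.2}
THRESHOLD = 0.5  # assumed cut-off above which a violation is predicted


def violation_probability(event):
    """Return the logistic regression probability for one event."""
    # Linear combination of the features, then the logistic (sigmoid) function.
    z = INTERCEPT + sum(
        weight * float(event.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(events, threshold=THRESHOLD):
    """Score events one at a time and yield an alert for each predicted violation."""
    for event in events:
        probability = violation_probability(event)
        if probability >= threshold:
            yield {"driverId": event["driverId"], "probability": probability}


if __name__ == "__main__":
    sample_events = [
        {"driverId": 1, "hours_driven": 0, "miles_driven": 0, "foggy": 0},
        {"driverId": 2, "hours_driven": 70, "miles_driven": 3000, "foggy": 1},
    ]
    for alert in score_stream(sample_events):
        print(alert)
```

Because score_stream is a generator, each event is scored as it is consumed rather than after the whole batch has been collected, which mirrors the event-at-a-time scoring described above.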
In the next few sections, we walk through how to do steps 5 through 9 in SAM.