Creating Models Overview

One of the enhancements to cybersecurity most frequently requested is the ability to augment the threat intelligence and enrichment processes with insights derived from machine learning and statistical models.

You can add a model to an enrichment stream to identify malware data. For example, you might want to use a Domain Generation Algorithm (DGA) detection model to detect generated as opposed to authentic domains. The following figure illustrates the data flow and output if you are using a DGA detection model:

While valuable, the model management infrastructure has significant challenges.

  • Applying the model management infrastructure might be both computationally and resource intensive and could require load balancing and multiple versions of models.

  • Models require frequent training or updating to react to growing threats and new patterns that emerge.

  • Models should be language and environment agnostic as much as possible. So, models should include small-data and big-data libraries and languages.

To support these requirements, Cloudera Cybersecurity Platform (CCP) powered by Metron provides the following components:

  • A YARN application that listens for model deployment requests and upon execution, registers their endpoints in ZooKeeper.

  • A command line deployment client that localizes the model payload onto HDFS and submits a model request.

  • A Java client that interacts with ZooKeeper and receives updates about model state changes (for example, new deployments and removals).

  • A series of Stellar functions for interacting with models deployed by the Model as a Service infrastructure.