Enrichment Overview

After you have parsed and normalized the raw security telemetry events, the next step is to enrich the data elements of the normalized event.

Enrichments add external data from data stores (such as HBase) to the event. Cloudera Cybersecurity Platform (CCP) uses a combination of HBase, Storm, and the telemetry messages in JSON format to enrich the data in real time, making it relevant and consumable. For example, the GEO enrichment provides latitude and longitude coordinates plus the city, state, and country for an external IP address. You can use this enriched information immediately instead of hunting through different silos for it.
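To make the GEO example concrete, the following is a minimal Python sketch of what enriching a JSON telemetry message looks like in principle. The in-memory GEO_STORE dictionary, the enrich_geo function, the sample IP address, and the output field naming are all assumptions made for illustration; they are not CCP's implementation, which reads from an external store inside a Storm topology.

```python
import json

# Hypothetical stand-in for the GEO enrichment store; a real deployment
# reads this from an external data store rather than a dict.
GEO_STORE = {
    "203.0.113.7": {
        "latitude": 37.77,
        "longitude": -122.42,
        "city": "San Francisco",
        "state": "CA",
        "country": "US",
    },
}

def enrich_geo(message_json, field, store=GEO_STORE):
    """Return the JSON message with geo attributes added for `field`, if known."""
    message = json.loads(message_json)
    geo = store.get(message.get(field))
    if geo is not None:
        for name, value in geo.items():
            # Illustrative naming scheme; verify against the fields CCP emits.
            message[f"enrichments.geo.{field}.{name}"] = value
    return json.dumps(message)

raw = json.dumps({"ip_dst_addr": "203.0.113.7", "protocol": "HTTP"})
enriched = json.loads(enrich_geo(raw, "ip_dst_addr"))
```

The original fields are left untouched and the enrichment values are added alongside them, so downstream consumers see both in a single message.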

CCP provides a separate HBase table named enrichment_list that holds the list of enrichment types. CCP populates this table dynamically by listening for inserts into the "enrichment" table, so it provides an accurate list of all available enrichment types.

CCP provides two types of enrichment:

Telemetry events
Threat intelligence information

The telemetry data sources for which CCP includes parsers (for example, Bro, Snort, and YAF) already include the following enrichment topologies that activate when you start the data sources in CCP:

Asset
GeoIP
User

You can also add enrichment sources for your own parsers to suit your needs. For example, you might want to use one or more of the following:

Transformation: Extracts data from existing fields. Brings in profile or machine learning results.
Geocoding: Adds information about city, country, postal code, latitude, and longitude for IP addresses.
Organization Classification: Provides information about host, user, and any other assets derived from organizational data sources such as CMDB or employee databases.
Whois: Provides ownership information of the web domain.

One advantage of the enrichment topology is that it groups messages by HBase key. Whenever you execute a Stellar function, you can add a caching layer, which reduces the need to call HBase for every event.

Before you enable an enrichment capability within CCP, you must load the enrichment store (which for CCP is primarily HBase) with enrichment data. The dataload utilities convert raw data sources to a primitive key (type, indicator) and a value, and place them in HBase. You can load enrichment data from the local file system or HDFS, or you can use the parser framework to stream data into the enrichment store. The enrichment loader transforms the enrichment into a JSON format that CCP can use. Additionally, the loading framework can detect and remove old data from the enrichment store automatically.

CCP supports three types of enrichment loaders:

Bulk load from HDFS via MapReduce
TAXII loader
Flat File ingestion
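The conversion that a loader performs on a flat file, from raw rows to a (type, indicator) key and a JSON value, can be sketched as follows. The function name rows_to_enrichments, the sample CSV columns, and the choice of a plain dictionary as the destination are assumptions for illustration; the actual loaders write to HBase and are driven by their own configuration.

```python
import csv
import io
import json

def rows_to_enrichments(csv_text, enrichment_type, indicator_column):
    """Yield ((type, indicator), json_value) pairs from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # The indicator becomes part of the key; the other columns form the value.
        indicator = row.pop(indicator_column)
        yield (enrichment_type, indicator), json.dumps(row, sort_keys=True)

sample = (
    "domain,owner,country\n"
    "example.com,Example Org,US\n"
    "example.net,Example Net,DE\n"
)

# A dict keyed by (type, indicator) stands in for the enrichment store here.
records = dict(rows_to_enrichments(sample, "whois", "domain"))
```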

After you load the stores, you can add an enrichment bolt to the enrichment topology and apply it to a specific field or tag within a Metron message. The bolt detects when it can enrich a field, retrieves the enrichment from the enrichment store, and tags the message with it. The enrichment is then stored in the bolt's in-memory cache. CCP uses the underlying Storm routing capabilities to ensure that similar enrichment values are sent to the bolts that already have these values cached in memory.
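The interaction between routing and per-bolt caches can be illustrated with a small Python sketch. It is not Storm code: the EnrichmentWorker class, the route function, and the CRC32-based hashing are assumptions chosen to show why sending equal values to the same worker keeps store reads low.

```python
import zlib

class EnrichmentWorker:
    """Hypothetical stand-in for an enrichment bolt with a local cache."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.store_reads = 0

    def enrich(self, message, field):
        value = message.get(field)
        if value is None:
            return message
        if value not in self.cache:
            # Cache miss: read the store once and remember the result.
            self.store_reads += 1
            self.cache[value] = self.store.get(value)
        enrichment = self.cache[value]
        if enrichment is not None:
            message = {**message, f"{field}.enrichment": enrichment}
        return message

def route(value, worker_count):
    # A stable hash so that the same value always maps to the same worker.
    return zlib.crc32(value.encode("utf-8")) % worker_count

store = {"10.0.0.5": {"asset": "db-server"}, "10.0.0.9": {"asset": "laptop"}}
workers = [EnrichmentWorker(store) for _ in range(3)]
stream = [{"ip_src_addr": ip} for ip in ["10.0.0.5", "10.0.0.9", "10.0.0.5", "10.0.0.5"]]
results = [
    workers[route(m["ip_src_addr"], len(workers))].enrich(m, "ip_src_addr")
    for m in stream
]
total_reads = sum(w.store_reads for w in workers)
```

Because each distinct value always lands on the same worker, the four messages in this sketch cause only two store reads, one per distinct IP address.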

CCP supports two types of configurations: global and sensor-specific. The sensor-specific configuration configures the individual enrichments and threat intelligence enrichments for a given sensor type (for example, squid). This section describes sensor-specific configurations.
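As a rough orientation, the shape of a sensor-specific configuration can be sketched as a Python dictionary serialized to JSON. The layout below follows the enrichment and threatIntel sections used by Apache Metron sensor enrichment configurations, but the specific field names, the malicious_ip enrichment type, and the fields_for helper are assumptions for illustration; check them against the configuration reference for your CCP version.

```python
import json

# Assumed layout: an "enrichment" section and a "threatIntel" section,
# each mapping an adapter name to the message fields it should process.
sensor_enrichment_config = {
    "enrichment": {
        "fieldMap": {
            "geo": ["ip_dst_addr", "ip_src_addr"],
            "host": ["host"],
        }
    },
    "threatIntel": {
        "fieldMap": {
            "hbaseThreatIntel": ["ip_src_addr", "ip_dst_addr"],
        },
        # Hypothetical enrichment type used to look up each field.
        "fieldToTypeMap": {
            "ip_src_addr": ["malicious_ip"],
            "ip_dst_addr": ["malicious_ip"],
        },
    },
}

config_json = json.dumps(sensor_enrichment_config, indent=2)

def fields_for(config, section, adapter):
    """Return the message fields that `adapter` handles in `section`, or []."""
    return config.get(section, {}).get("fieldMap", {}).get(adapter, [])
```

Under these assumptions, fields_for(json.loads(config_json), "enrichment", "geo") returns the two IP address fields, and an adapter that is not configured returns an empty list.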