Using Stellar Properties to Transform Enrichment Data
You can transform and filter enrichment data as it is loaded into HBase by using Stellar extractor properties in the extractor configuration file. This feature is available to all extractor types. This task is optional.
-
Transform and filter the enrichment data as it is loaded into HBase by using the
following Stellar extractor properties in the
extractor_config.jsonfile at $METRON_HOME/config:Extractor Property Description Example value_transformTransforms fields defined in the
columnsmapping with Stellar transformations. New keys introduced in the transform are added to the key metadata."value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)"value_filterAllows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose domain property is empty after removing the TLD are omitted.
"value_filter" : "LENGTH(domain) > 0", "indicator_column" : "domain",
indicator_transformTransforms the
indicatorcolumn independent of the value transformations. You can refer to the original indicator value by usingindicatoras the variable name, as shown in the following example. In addition, if you prefer to piggyback your transformations, you can refer to the variabledomain, which allows your indicator transforms to inherit transformations done to this value during the value transformations."indicator_transform" : { "indicator" : "DOMAIN_REMOVE_TLD(indicator)"indicator_filterAllows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose indicator value is empty after removing the TLD are omitted.
"indicator_filter" : "LENGTH(indicator) > 0", "type" : "top_domains",
If you include all of the supported Stellar extractor properties in the extractor configuration file, it will look similar to the following:{ "config" : { "zk_quorum" : "node1:2181", "columns" : { "rank" : 0, "domain" : 1 }, "value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)" }, "value_filter" : "LENGTH(domain) > 0", "indicator_column" : "domain", "indicator_transform" : { "indicator" : "DOMAIN_REMOVE_TLD(indicator)" }, "indicator_filter" : "LENGTH(indicator) > 0", "type" : "top_domains", "separator" : "," }, "extractor" : "CSV" }For example, if you have atop-list.csvcontaining the following information:1,google.com 2,youtube.com ...when you run a file import with the above data and extractor configuration you should receive the following results in two extracted data records:Indicator Type Value google top_domains { "rank" : "1", "domain" : "google" } yahoo top_domains { "rank" : "2", "domain" : "yahoo" } - Remove any non-ASCII invisible characters that might have been included if you copy and
pasted:
iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json -
To access properties that reside in the global configuration file, provide a
ZooKeeper quorum via the
zk_quorum property.For example, if the global configuration looks like the following:{ "global_property" : "metron-ftw" }enter the following to expand thevalue_transform:"value_transform" : { "domain" : "DOMAIN_REMOVE_TLD(domain)", "a-new-prop" : "global_property" },The resulting value data will look like the following:Indicator Type Value google top_domains { "rank" : "1", "domain" : "google", "a-new-prop" : "metron-ftw" } yahoo top_domains { "rank" : "2", "domain" : "yahoo", "a-new-prop" : "metron-ftw" }
