Using the CLI
This section describes how to use the CLI to parse a new data source with the Grok parser.
Determine the format of the new data source’s log entries, so that you can parse them:
Use ssh to access the host for the new data source.
Look at the different log files that can be created and determine which log file needs to be parsed. This is typically the access.log, but your data source might use a different name.
sudo su -
cd /var/log/$NEW_DATASOURCE
ls
Generate entries for the log that needs to be parsed so you can see the format of the entries.
For example:
timestamp | time elapsed | remotehost | code/status | bytes | method | URL rfc931 peerstatus/peerhost | type
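For example, an access log entry matching the layout above might look like the following. This is a hypothetical squid-style sample; the entries for your data source will differ:

```
1467011157.401    415 127.0.0.1 TCP_MISS/200 337891 GET http://www.example.com/ - DIRECT/93.184.216.34 text/html
```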
Create a Kafka topic for the new data source:
Log in to $KAFKA_HOST as root.
Create a Kafka topic named the same as the new data source:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --create --topic $NEW_DATASOURCE --partitions 1 --replication-factor 1
List all of the Kafka topics to ensure that the new topic exists:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --list
Create a Grok statement.
Define the Grok expression for the log type you identified in Step 1 by creating a Grok statement file.
Refer to the Grok documentation for additional details.
Verify that the Grok pattern is valid.
You can use a tool such as Grok Constructor to validate your Grok pattern.
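For example, a Grok statement for the squid-style entry format shown in Step 1 might look like the following. This is a sketch: the SQUID_DELIMITED label and the field names are illustrative, and the top-level label must match the patternLabel you configure later:

```
SQUID_DELIMITED %{NUMBER:timestamp}[^0-9]*%{INT:elapsed} %{IP:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url}
```

You can paste a statement like this, together with sample log entries, into a tool such as Grok Constructor to confirm that every field is captured as expected.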
Save the Grok pattern and load it into Hadoop Distributed File System (HDFS) in a named location:
Create a local file for the new data source:
touch /tmp/$DATASOURCE
Open $DATASOURCE and add the Grok pattern defined in Step 3b:
vi /tmp/$DATASOURCE
Put the $DATASOURCE file into the HDFS directory where Metron stores its Grok parsers.
Existing Grok parsers that ship with HCP are staged under /apps/metron/patterns:

su - hdfs
hadoop fs -rmr /apps/metron/patterns/$DATASOURCE
hdfs dfs -put /tmp/$DATASOURCE /apps/metron/patterns/
Define a parser configuration for the Metron Parsing Topology.
After the Grok pattern is staged in HDFS, you must define a parser configuration for the Metron Parsing Topology. The Metron Parsing Topology (also known as the Normalizing Topology) is designed to take a sensor input (in its native format) and turn it into a Metron JSON Object. For more information about the Metron parsing topology, see Parsers.
Use ssh to log in as root to the host on which HCP is installed.
Create a $DATASOURCE parser configuration file at $METRON_HOME/config/zookeeper/parsers/$DATASOURCE.json. For example:
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "$DATASOURCE",
  "readMetadata" : true,
  "mergeMetadata" : true,
  "metron.metadata.topic" : "topic",
  "metron.metadata.customer_id" : "my_customer_id",
  "filterClassName" : "STELLAR",
  "parserConfig" : {
    "filter.query" : "exists(field1)",
    "grokPath": "/apps/metron/patterns/$DATASOURCE",
    "patternLabel": "$DATASOURCE_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "full_hostname", "domain_without_subdomains" ],
      "config" : {
        "full_hostname" : "URL_TO_HOST(url)",
        "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)"
      }
    }
  ]
}
Where:
- parserClassName
The name of the parser's class that is in the jar file.
- readMetadata
A Boolean indicating whether or not to read metadata and make it available to field transformations (false by default). There are two types of metadata supported in HCP:
Environmental metadata: Metadata about the system at large.
For example, if you have multiple Kafka topics being processed by one parser, you might want to tag the messages with the Kafka topic.
Custom metadata: Custom metadata from an individual telemetry source that you might want to use within Metron.
- mergeMetadata
A Boolean indicating whether or not to merge metadata with the message (false by default). If this property is set to true, then every metadata field becomes part of the message and, consequently, is also available for use in field transformations.
- filterClassName
The filter to use. This may be the fully qualified class name of a class that implements the org.apache.metron.parsers.interfaces.MessageFilter&lt;JSONObject&gt; interface. Message filters are intended to allow the user to ignore a set of messages via custom logic. The existing implementation is:
STELLAR: Allows you to apply a Stellar statement that returns a Boolean; every message for which the statement returns true is passed. The Stellar statement to apply is specified by the filter.query property in the parserConfig. For example, the following Stellar filter includes only messages that contain a field1 field:
{
  "filterClassName" : "STELLAR",
  "parserConfig" : {
    "filter.query" : "exists(field1)"
  }
}
- sensorTopic
The Kafka topic on which the telemetry is being streamed.
- parserConfig
The configuration map passed to the parser, which includes the following fields:
- grokPath
The path for the Grok statement.
- patternLabel
The top-level pattern of the Grok file.
- fieldTransformations
An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the Kafka topic.
In this example, the Grok parser is designed to extract the URL, but the only information that you need is the domain (or even the domain without subdomains). To obtain this, you can use the Stellar Field Transformation (under the fieldTransformations element). The Stellar Field Transformation allows you to use the Stellar DSL (Domain Specific Language) to define extra transformations to be performed on the messages flowing through the topology. For more information on using the fieldTransformations element in the parser configuration, see Parsers.
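Before loading the configuration, you can sanity-check that the file is well-formed JSON; a missing comma (as is easy to introduce when hand-editing) is caught immediately. A minimal sketch, in which the file name and contents are illustrative stand-ins for your $DATASOURCE configuration:

```shell
# Write a minimal parser configuration to a temporary file
# (a stand-in for $METRON_HOME/config/zookeeper/parsers/$DATASOURCE.json).
cat > /tmp/example_parser.json <<'EOF'
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/apps/metron/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  }
}
EOF

# Fail fast on malformed JSON before pushing the file to ZooKeeper.
python3 -m json.tool < /tmp/example_parser.json > /dev/null && echo "JSON OK"
```

If the file is malformed, json.tool prints the parse error and position, which is usually enough to locate the problem.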
Use the following script to upload configurations to Apache ZooKeeper:
$METRON_HOME/bin/zk_load_configs.sh --mode PUSH -i $METRON_HOME/config/zookeeper -z $ZOOKEEPER_HOST:2181
Note: You might receive the following warning messages when you execute the preceding command. You can safely ignore them.
log4j:WARN No appenders could be found for logger (org.apache.curator.framework.imps.CuratorFrameworkImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Deploy the new parser topology to the cluster:
Log in as the root user to the host on which Metron is installed.
Deploy the new parser topology:
$METRON_HOME/bin/start_parser_topology.sh -k $KAFKA_HOST:6667 -z $ZOOKEEPER_HOST:2181 -s $DATASOURCE
Use the Apache Storm UI to ensure that the new topology is listed and that it has no errors.
This new data source processor topology ingests from the $DATASOURCE Kafka topic that you created earlier and parses the events with the HCP Grok framework, using the Grok pattern defined earlier. The result of the parsing is a standard Metron JSON structure that is written to the enrichments Kafka topic for further processing.
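To confirm that parsed messages are flowing, you can consume a few records from the enrichments topic with the Kafka console consumer. The path below follows the HDP layout used earlier in this section; adjust the broker address for your cluster:

```
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKA_HOST:6667 --topic enrichments
```

Each record should be a JSON object containing the fields named in your Grok statement. Press Ctrl+C to stop the consumer.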