General Purpose Parsers
The general-purpose parser is primarily designed for lower-velocity topologies or for quickly setting up a temporary parser for a new telemetry.
General-purpose parsers are defined using a config file, so you need not recompile the topology to change them. HCP supports three general-purpose parsers: Grok, CSV, and JSON Map.
Grok Parser
The Grok parser class name (parserClassName) is org.apache.metron.parsers.GrokParser.
Grok has the following entries and predefined patterns for parserConfig:
- grokPath: The path in HDFS (or in the Jar) to the Grok statement
- patternLabel: The pattern label to use from the Grok statement
- timestampField: The field to use for timestamp
- timeFields: A list of fields to be treated as time
- dateFormat: The date format to use to parse the time fields
- timezone: The timezone to use. UTC is the default.
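As a rough illustration of how the Grok entries fit together, the sketch below assembles a sensor configuration as a Python dictionary and serializes it to JSON. The topic name, HDFS path, pattern label, and date format are hypothetical placeholders, and the top-level sensorTopic key is an assumption that this section does not describe.

```python
import json

# All values below are illustrative placeholders, not shipped defaults.
grok_sensor_config = {
    "parserClassName": "org.apache.metron.parsers.GrokParser",
    "sensorTopic": "mysensor",  # assumed top-level key naming the input topic
    "parserConfig": {
        "grokPath": "/patterns/mysensor",      # hypothetical HDFS path
        "patternLabel": "MYSENSOR_DELIMITED",  # hypothetical label in that file
        "timestampField": "timestamp",
        "timeFields": ["timestamp"],
        # Assumed to follow Java SimpleDateFormat syntax.
        "dateFormat": "yyyy-MM-dd HH:mm:ss",
        "timezone": "UTC",
    },
}

# Serialize to the JSON text that would be stored as the parser config file.
grok_config_json = json.dumps(grok_sensor_config, indent=2)
print(grok_config_json)
```

Because the parser is driven entirely by this file, changing the pattern label or date format means editing the config rather than rebuilding the topology.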
CSV Parser
The CSV parser class name (parserClassName) is org.apache.metron.parsers.csv.CSVParser.
CSV has the following entries and predefined patterns for parserConfig:
- timestampFormat: The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is already expressed as time elapsed since the UNIX epoch.
- columns: A map of the column names you wish to extract from the CSV to their offsets. For example, { 'name' : 1, 'profession' : 3 } would be a column map for extracting the 2nd and 4th columns from a CSV.
- separator: The column separator. The default value is ",".
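To make the zero-based offsets in the columns map concrete, here is a minimal Python sketch of the extraction it describes. It is not the parser's implementation: extract_columns is a hypothetical helper, the sample line is invented, and the plain string split ignores quoting, which a real CSV parser would handle.

```python
def extract_columns(line, columns, separator=","):
    """Map each configured column name to the cell at its zero-based offset."""
    cells = line.split(separator)  # naive split: no support for quoted cells
    return {name: cells[offset] for name, offset in columns.items()}


# Column map from the example above: offsets 1 and 3 are the 2nd and 4th columns.
columns = {"name": 1, "profession": 3}
record = extract_columns("42,Ada,London,engineer", columns)
print(record)
```

Here the 2nd cell ("Ada") is stored under name and the 4th cell ("engineer") under profession; the 1st and 3rd cells are not extracted.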
JSON Map Parser
The JSON Map parser class name (parserClassName) is org.apache.metron.parsers.json.JSONMapParser.
JSON has the following entries and predefined patterns for parserConfig:
- mapStrategy: A strategy that indicates how to handle multi-dimensional maps. This is one of:
  - DROP: Drop fields that contain maps
  - UNFOLD: Unfold inner maps, so { "foo" : { "bar" : 1 } } would turn into { "foo.bar" : 1 }
  - ALLOW: Allow multidimensional maps
  - ERROR: Throw an error when a multidimensional map is encountered
- timestamp: This field is expected to exist and, if it does not, the current time is inserted.
- jsonQuery: If this JSON query string is present, the result of the query will be a list of messages. This is useful if you have a JSON document that contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
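The four mapStrategy options can be sketched in a few lines of Python. This is only an illustration of the behavior described above, under the assumption that nested maps at any depth are unfolded with dot-separated keys; apply_map_strategy is a hypothetical helper, not part of the parser.

```python
def apply_map_strategy(message, strategy):
    """Return a copy of a parsed JSON object with the given strategy applied."""
    if strategy == "ALLOW":
        # Nested maps pass through unchanged.
        return dict(message)
    if strategy == "DROP":
        # Fields whose values are maps are removed.
        return {k: v for k, v in message.items() if not isinstance(v, dict)}
    if strategy == "ERROR":
        # Any nested map is treated as a failure.
        for key, value in message.items():
            if isinstance(value, dict):
                raise ValueError(f"nested map in field {key!r}")
        return dict(message)
    if strategy == "UNFOLD":
        # Inner maps are flattened into dot-separated top-level keys.
        flat = {}
        for key, value in message.items():
            if isinstance(value, dict):
                inner = apply_map_strategy(value, "UNFOLD")
                for inner_key, inner_value in inner.items():
                    flat[f"{key}.{inner_key}"] = inner_value
            else:
                flat[key] = value
        return flat
    raise ValueError(f"unknown strategy {strategy!r}")


message = {"foo": {"bar": 1}, "src": "10.0.0.1"}
unfolded = apply_map_strategy(message, "UNFOLD")
dropped = apply_map_strategy(message, "DROP")
print(unfolded)
print(dropped)
```

With UNFOLD the nested value survives under the key foo.bar, while DROP discards the foo field entirely and keeps only src.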