General Purpose Parsers
General purpose parsers are primarily designed for lower-velocity topologies or for quickly setting up a temporary parser for a new telemetry source. They are defined using a config file, so you need not recompile the topology to change them. HCP supports two general purpose parsers: Grok and CSV.
Grok parser
The Grok parser class name (parserClassName) is org.apache.metron.parsers.GrokParser.
Grok has the following entries and predefined patterns for parserConfig:
grokPath
The path in HDFS (or in the JAR) to the Grok statement
patternLabel
The pattern label to use from the Grok statement
timestampField
The field to use for timestamp
timeFields
A list of fields to be treated as time
dateFormat
The date format to use to parse the time fields
timezone
The timezone to use. UTC is the default.
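As a rough illustration of how the entries above fit together, the following Python sketch builds a Grok parser configuration and prints it as JSON. Only the key names and the class name come from this section; the path, pattern label, field names, and date format are hypothetical placeholders, and a real sensor configuration may need additional top-level fields that are not covered here.

```python
import json

# Illustrative only: every value in parserConfig is a placeholder,
# not a shipped default. Only the key names come from the list above.
grok_sensor_config = {
    "parserClassName": "org.apache.metron.parsers.GrokParser",
    "parserConfig": {
        "grokPath": "/patterns/example",            # hypothetical path in HDFS
        "patternLabel": "EXAMPLE_PATTERN",          # hypothetical label in the Grok statement
        "timestampField": "timestamp",              # hypothetical field name
        "timeFields": ["start_time", "end_time"],   # hypothetical field names
        "dateFormat": "yyyy-MM-dd HH:mm:ss",        # assumed Java-style date pattern
        "timezone": "UTC",                          # the documented default
    },
}

# Serialize to the JSON form a config file would hold.
print(json.dumps(grok_sensor_config, indent=2))
```

Because the parser is driven entirely by this configuration, changing a value such as patternLabel does not require recompiling the topology.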
CSV parser
The CSV parser class name (parserClassName) is org.apache.metron.parsers.csv.CSVParser.
CSV has the following entries and predefined patterns for parserConfig:
timestampFormat
The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is in milliseconds since the UNIX epoch.
columns
A map of the column names you want to extract from the CSV to their offsets. Offsets are zero-based. For example, { 'name' : 1, 'profession' : 3 } would be a column map for extracting the 2nd and 4th columns from a CSV.
separator
The column separator. The default value is ",".
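To make the zero-based offsets concrete, here is a minimal Python sketch of how a column map and separator select fields from one line. It is not the CSVParser implementation; the function name extract_columns and the sample line are made up for illustration.

```python
import csv
import io

def extract_columns(line, columns, separator=","):
    """Return {name: value} for each named column; offsets are zero-based."""
    # Parse a single CSV line using the configured separator.
    row = next(csv.reader(io.StringIO(line), delimiter=separator))
    return {name: row[offset] for name, offset in columns.items()}

# Column map from the example above: offset 1 is the 2nd column, offset 3 the 4th.
columns = {"name": 1, "profession": 3}

# Made-up sample line with four columns.
record = extract_columns("42,Ada,1815,mathematician", columns)
print(record)  # {'name': 'Ada', 'profession': 'mathematician'}
```

Passing a different separator, for example separator="|", handles pipe-delimited input with the same column map.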