Morphline commands overview

Morphlines provides a set of frequently used, high-level transformation and I/O commands that can be combined in application-specific ways. This page gives a short description of each available command and a link to its complete documentation.

kite-morphlines-core-stdio

readBlob

Converts a byte stream to a byte array in main memory.

readClob

Converts a byte stream to a string.

readCSV

Extracts zero or more records from the input stream of bytes representing a Comma Separated Values (CSV) file.

readLine

Emits one record per line in the input stream.

readMultiLine

Log parser that collapses multiple input lines into a single record, based on regular expression pattern matching.
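
The collapsing idea can be sketched in plain Java. This is not the Kite implementation; it only assumes that a line matching a "continuation" regular expression belongs to the previous record (for example, stack trace lines that start with whitespace). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiLineSketch {
    // Hypothetical helper: a line that matches `continuation` is appended to the
    // previous record; any other line starts a new record.
    static List<String> collapse(List<String> lines, Pattern continuation) {
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            boolean continues = continuation.matcher(line).find();
            if (continues && !records.isEmpty()) {
                int last = records.size() - 1;
                records.set(last, records.get(last) + "\n" + line);
            } else {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ERROR boom",
            "\tat Foo.bar(Foo.java:1)",
            "INFO ok");
        // Lines that start with whitespace are treated as continuations.
        List<String> records = collapse(lines, Pattern.compile("^\\s"));
        System.out.println(records.size()); // prints 2
    }
}
```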

kite-morphlines-core-stdlib

addCurrentTime

Adds the result of System.currentTimeMillis() to a given output field.

addLocalHost

Adds the name or IP of the local host to a given output field.

addValues

Adds a list of values (or the contents of another field) to a given field.

addValuesIfAbsent

Adds a list of values (or the contents of another field) to a given field if not already contained.
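
The difference between addValues and addValuesIfAbsent can be illustrated with a rough plain-Java sketch. It assumes a record can be modeled as a map from field name to a list of values; the map-based stand-in and the method names are hypothetical and are not the Kite API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AddValuesSketch {
    // Appends all values to the field, duplicates included.
    static void addValues(Map<String, List<Object>> record, String field, List<?> values) {
        record.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values);
    }

    // Appends only those values the field does not already contain.
    static void addValuesIfAbsent(Map<String, List<Object>> record, String field, List<?> values) {
        List<Object> existing = record.computeIfAbsent(field, k -> new ArrayList<>());
        for (Object value : values) {
            if (!existing.contains(value)) {
                existing.add(value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> record = new LinkedHashMap<>();
        addValues(record, "tags", Arrays.asList("a", "b"));
        addValues(record, "tags", Arrays.asList("b"));
        addValuesIfAbsent(record, "tags", Arrays.asList("b", "c"));
        System.out.println(record); // prints {tags=[a, b, b, c]}
    }
}
```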

callParentPipe

Implements recursion for extracting data from container data formats.

contains

Returns whether or not a given value is contained in a given field.

convertTimestamp

Converts the timestamps in a given field from one of a set of input date formats to an output date format.
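
A minimal sketch of the "first matching input format wins" idea, using only the JDK. It assumes SimpleDateFormat-style patterns and UTC for both input and output; the helper name and the choice to reject partially parsed values are assumptions, not documented behavior of the command.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

class ConvertTimestampSketch {
    // Tries each input pattern in order and reformats the first full match.
    static String convert(String value, List<String> inputFormats, String outputFormat) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        for (String pattern : inputFormats) {
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
            in.setTimeZone(utc);
            in.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date date = in.parse(value, pos); // returns null if the pattern does not match
            if (date != null && pos.getIndex() == value.length()) {
                SimpleDateFormat out = new SimpleDateFormat(outputFormat, Locale.ROOT);
                out.setTimeZone(utc);
                return out.format(date);
            }
        }
        throw new IllegalArgumentException("No input format matches: " + value);
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd");
        System.out.println(convert("2013-09-12", inputs, "dd/MM/yyyy")); // prints 12/09/2013
    }
}
```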

decodeBase64

Converts a Base64 encoded String to a byte[].
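
For illustration, the same conversion in plain Java with the JDK's java.util.Base64. This only shows what the transformation amounts to; the wrapper class is hypothetical and says nothing about how the command is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DecodeBase64Sketch {
    // Decodes a Base64 encoded String into the raw bytes it represents.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] bytes = decode("aGVsbG8=");
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints hello
    }
}
```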

dropRecord

Silently consumes records without ever emitting any record. Think /dev/null.

equals

Succeeds if all field values of the given named fields are equal to the given values and fails otherwise.

extractURIComponents

Extracts subcomponents such as host, port, path, query, etc. from a URI.
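
As a rough illustration of what such subcomponents look like, the JDK's java.net.URI exposes them through getters. The map keys below are chosen for this sketch and are not necessarily the output field names the command uses.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UriComponentsSketch {
    // Collects the main URI subcomponents into a map (keys are hypothetical).
    static Map<String, String> components(String uriString) {
        URI uri = URI.create(uriString);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority());
        parts.put("host", uri.getHost());
        parts.put("port", String.valueOf(uri.getPort()));
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());
        parts.put("fragment", uri.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> parts =
            components("http://user@example.com:8080/docs/index.html?q=morphline#top");
        System.out.println(parts.get("host")); // prints example.com
    }
}
```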

extractURIComponent

Extracts a particular subcomponent from a URI.

extractURIQueryParameters

Extracts the query parameters with a given name from a URI.

findReplace

Examines each string value in a given field and replaces each substring of the string value that matches the given string literal or grok pattern with the given replacement.

generateUUID

Sets a universally unique identifier on all records that are intercepted.

grok

Uses regular expression pattern matching to extract structured fields from unstructured log or text data.

head

Ignores all input records beyond the N-th record, akin to the Unix head command.
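
The behavior can be pictured as a small stateful filter. The following plain-Java sketch is an analogy only; the class name and the boolean return convention are assumptions made for this example.

```java
class HeadSketch {
    private final long limit;
    private long count = 0;

    HeadSketch(long limit) {
        this.limit = limit;
    }

    // Returns true if the record is passed on, false if it is ignored.
    boolean accept(Object record) {
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }

    public static void main(String[] args) {
        HeadSketch head = new HeadSketch(2);
        for (int i = 0; i < 5; i++) {
            String record = "r" + i;
            if (head.accept(record)) {
                System.out.println(record); // prints r0, then r1
            }
        }
    }
}
```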

if

Implements if-then-else conditional control flow.

java

Scripting support for Java. Dynamically compiles and executes the given Java code block.

logTrace, logDebug, logInfo, logWarn, logError

Logs a message at the given log level to SLF4J.

not

Inverts the boolean return value of a nested command.

pipe

Pipes a record through a chain of commands.

removeFields

Removes all record fields for which the field name matches a blacklist but not a whitelist.

removeValues

Removes all record field values for which the field name and value match a blacklist but not a whitelist.

replaceValues

Replaces all record field values for which the field name and value match a blacklist but not a whitelist.

sample

Forwards each input record with a given probability to its child command.
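
Probabilistic forwarding can be sketched in a few lines of plain Java. The seeded java.util.Random and the class name are assumptions for this example; the sketch does not claim to match how the command draws its random numbers.

```java
import java.util.Random;

class SampleSketch {
    private final double probability;
    private final Random random;

    SampleSketch(double probability, long seed) {
        this.probability = probability;
        this.random = new Random(seed);
    }

    // Returns true if the current record would be forwarded to the child command.
    boolean forward() {
        // nextDouble() is in [0.0, 1.0), so 1.0 always forwards and 0.0 never does.
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        SampleSketch sample = new SampleSketch(0.25, 42L);
        int forwarded = 0;
        for (int i = 0; i < 10000; i++) {
            if (sample.forward()) {
                forwarded++;
            }
        }
        System.out.println(forwarded + " of 10000 records forwarded");
    }
}
```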

separateAttachments

Emits one separate output record for each attachment in the input record's list of attachments.

setValues

Assigns a given list of values (or the contents of another field) to a given field.

split

Divides a string into substrings by recognizing a separator (a.k.a. "delimiter"), which can be expressed as a single character, literal string, regular expression, or grok pattern.
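
The distinction between a literal separator and a regular expression separator can be shown in plain Java. This sketch covers only those two cases (not grok patterns), keeps empty substrings, and uses a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitSketch {
    // Splits on a literal separator, or on a regular expression if asRegex is true.
    static List<String> split(String input, String separator, boolean asRegex) {
        String regex = asRegex ? separator : Pattern.quote(separator);
        // The limit -1 keeps trailing empty substrings.
        return Arrays.asList(input.split(regex, -1));
    }

    public static void main(String[] args) {
        System.out.println(split("a|b||c", "|", false));   // prints [a, b, , c]
        System.out.println(split("a1b22c", "\\d+", true)); // prints [a, b, c]
    }
}
```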

splitKeyValue

Splits key-value pairs where the key and value are separated by the given separator, and adds the pair's value to the record field named after the pair's key.
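
A minimal plain-Java sketch of the idea, modeling the record as a map from field name to a list of values. Skipping pairs that lack the separator is an assumption of this sketch, and the helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitKeyValueSketch {
    // Adds each pair's value to the field named after the pair's key.
    static Map<String, List<String>> splitKeyValue(List<String> pairs, String separator) {
        Map<String, List<String>> record = new LinkedHashMap<>();
        for (String pair : pairs) {
            int idx = pair.indexOf(separator);
            if (idx < 0) {
                continue; // assumption: pairs without a separator are skipped
            }
            String key = pair.substring(0, idx);
            String value = pair.substring(idx + separator.length());
            record.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> pairs = Arrays.asList("host=example.com", "port=80", "host=other.org");
        System.out.println(splitKeyValue(pairs, "="));
        // prints {host=[example.com, other.org], port=[80]}
    }
}
```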

startReportingMetricsToCSV

Starts periodically appending the metrics of all commands to a set of CSV files.

startReportingMetricsToJMX

Starts publishing the metrics of all commands to JMX.

startReportingMetricsToSLF4J

Starts periodically logging the metrics of all morphline commands to SLF4J.

toByteArray

Converts a String to the byte array representation of a given charset.

toString

Converts a Java object to its string representation; optionally also removes leading and trailing whitespace.

translate

Replaces a string with the replacement value defined in a given dictionary (a.k.a. lookup hash table).
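
A dictionary lookup of this kind can be sketched with a plain java.util.Map. The fallback argument for values missing from the dictionary is an assumption of this example, and the helper name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TranslateSketch {
    // Looks the value up in the dictionary; returns the fallback if it is absent.
    static String translate(String value, Map<String, String> dictionary, String fallback) {
        return dictionary.getOrDefault(value, fallback);
    }

    public static void main(String[] args) {
        Map<String, String> severity = new HashMap<>();
        severity.put("0", "Emergency");
        severity.put("1", "Alert");
        System.out.println(translate("1", severity, "Unknown")); // prints Alert
        System.out.println(translate("9", severity, "Unknown")); // prints Unknown
    }
}
```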

tryRules

Simple rule engine for handling a list of heterogeneous input data formats.
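
The "try each rule in order until one succeeds" idea can be sketched in plain Java. Here a rule is modeled as a function that returns an empty Optional on failure; that modeling, the two example rules, and all names are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class TryRulesSketch {
    // Applies the rules in order and returns the result of the first one that succeeds.
    static Optional<String> tryRules(String input, List<Function<String, Optional<String>>> rules) {
        for (Function<String, Optional<String>> rule : rules) {
            Optional<String> result = rule.apply(input);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty(); // no rule could handle the input
    }

    // Example rule: succeeds only if the input is an integer.
    static Optional<String> asNumber(String s) {
        try {
            return Optional.of("number:" + Long.parseLong(s));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Example rule: succeeds only if the input looks like key=value.
    static Optional<String> asKeyValue(String s) {
        int i = s.indexOf('=');
        return i > 0
            ? Optional.of("pair:" + s.substring(0, i) + "->" + s.substring(i + 1))
            : Optional.empty();
    }

    static List<Function<String, Optional<String>>> defaultRules() {
        List<Function<String, Optional<String>>> rules = new ArrayList<>();
        rules.add(TryRulesSketch::asNumber);
        rules.add(TryRulesSketch::asKeyValue);
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(tryRules("42", defaultRules()).get());    // prints number:42
        System.out.println(tryRules("a=b", defaultRules()).get());   // prints pair:a->b
        System.out.println(tryRules("???", defaultRules()).isPresent()); // prints false
    }
}
```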

kite-morphlines-avro

readAvroContainer

Parses an Apache Avro binary container and emits a morphline record for each contained Avro datum.

readAvro

Parses containerless Avro and emits a morphline record for each contained Avro datum.

extractAvroTree

Recursively walks an Avro tree and extracts all data into a single morphline record.

extractAvroPaths

Extracts specific values from an Avro object, akin to a simple form of XPath.

toAvro

Converts a morphline record to an Avro record.

writeAvroToByteArray

Serializes Avro records into a byte array.

kite-morphlines-json

readJson

Parses JSON and emits a morphline record for each contained JSON object, using the Jackson library.

extractJsonPaths

Extracts specific values from a JSON object, akin to a simple form of XPath.

kite-morphlines-hadoop-core

downloadHdfsFile

Downloads, on startup, zero or more files or directory trees from HDFS to the local file system.

openHdfsFile

Opens an HDFS file for read and returns a corresponding Java InputStream.

kite-morphlines-hadoop-parquet-avro

readAvroParquetFile

Parses a Hadoop Parquet file and emits a morphline record for each contained Avro datum.

kite-morphlines-hadoop-rcfile

readRCFile

Parses an Apache Hadoop RCFile and emits morphline records row-wise or column-wise.

kite-morphlines-hadoop-sequencefile

readSequenceFile

Parses an Apache Hadoop SequenceFile and emits a morphline record for each contained key-value pair.

kite-morphlines-maxmind

geoIP

Returns geolocation information for a given IP address, using an efficient in-memory MaxMind database lookup.

kite-morphlines-metrics-servlets

registerJVMMetrics

Registers metrics that are related to the Java Virtual Machine with the MorphlineContext.

startReportingMetricsToHTTP

Exposes liveness status, health check status, metrics state and thread dumps via a set of HTTP URLs served by Jetty, using the AdminServlet.

kite-morphlines-protobuf

readProtobuf

Parses an InputStream that contains protobuf data and emits a morphline record containing the protobuf object as an attachment.

extractProtobufPaths

Extracts specific values from a protobuf object, akin to a simple form of XPath.

kite-morphlines-tika-core

detectMimeType

Uses Apache Tika to autodetect the MIME type of binary data.

kite-morphlines-tika-decompress

decompress

Decompresses gzip and bzip2 format.

unpack

Unpacks tar, zip, and jar format.

kite-morphlines-saxon

convertHTML

Converts any HTML to XHTML, using the TagSoup Java library.

xquery

Parses XML and runs the given W3C XQuery over it, using the Saxon Java library.

xslt

Parses XML and runs the given W3C XSL Transform over it, using the Saxon Java library.

kite-morphlines-solr-core

solrLocator

Specifies a set of configuration parameters that identify the location and schema of a Solr server or SolrCloud.

loadSolr

Inserts, updates, or deletes records in a Solr server or MapReduce Reducer.

generateSolrSequenceKey

Assigns a unique key that is the concatenation of a field and a running count of the record number within the current session.
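
The key construction can be pictured with a small counter in plain Java. The "#" separator, the zero-based count, and the explicit session reset method are assumptions of this sketch, and the class is hypothetical.

```java
class SequenceKeySketch {
    private long count = 0;

    // Resets the running count at the start of a new session.
    void startSession() {
        count = 0;
    }

    // Concatenates the base field value with the running record count.
    String nextKey(String baseId) {
        return baseId + "#" + (count++);
    }

    public static void main(String[] args) {
        SequenceKeySketch keys = new SequenceKeySketch();
        System.out.println(keys.nextKey("doc")); // prints doc#0
        System.out.println(keys.nextKey("doc")); // prints doc#1
        keys.startSession();
        System.out.println(keys.nextKey("doc")); // prints doc#0
    }
}
```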

sanitizeUnknownSolrFields

Removes record fields that are unknown to Solr schema.xml, or moves them to fields with a given prefix.

tokenizeText

Uses the embedded Solr/Lucene Analyzer library to generate tokens from a text string, without sending data to a Solr server.

kite-morphlines-solr-cell

solrCell

Uses Apache Tika to parse data, then maps the Tika output back to a record using Apache SolrCell.

kite-morphlines-useragent

userAgent

Parses a user agent string and returns structured higher level data like user agent family, operating system, version, and device type.