Morphline commands overview
Morphlines provides a set of frequently used high-level transformation and I/O commands that can be combined in application-specific ways. This page gives a short description of each available command and a link to the complete documentation.
kite-morphlines-core-stdio
- readBlob
-
Converts a byte stream to a byte array in main memory.
- readClob
-
Converts a byte stream to a string.
- readCSV
-
Extracts zero or more records from the input stream of bytes representing a Comma Separated Values (CSV) file.
- readLine
-
Emits one record per line in the input stream.
- readMultiLine
-
Log parser that collapses multiple input lines into a single record, based on regular expression pattern matching.
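The line-collapsing idea behind readMultiLine can be sketched in a few lines of plain Python. This is only an illustration of the behavior described above, not the Kite implementation: the function name `collapse_multiline`, its parameters, and the sample log lines are all made up for this example, and the sketch assumes that a line matching the pattern continues the previous record.

```python
import re

def collapse_multiline(lines, continuation_regex):
    """Merge every line that matches continuation_regex into the preceding record."""
    pattern = re.compile(continuation_regex)
    records = []
    for line in lines:
        if records and pattern.match(line):
            # Continuation line: append it to the record currently being built.
            records[-1] += "\n" + line
        else:
            # Any other line starts a new record.
            records.append(line)
    return records

# Made-up sample input: a stack trace whose frames are indented.
log = [
    "ERROR request failed",
    "    at com.example.Foo.bar(Foo.java:42)",
    "    at com.example.Main.main(Main.java:7)",
    "INFO request ok",
]

# Assumption for this sketch: lines starting with whitespace belong to the previous line.
records = collapse_multiline(log, r"\s+")
```

Here the two indented stack frames are folded into the ERROR line, so four input lines yield two records.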
kite-morphlines-core-stdlib
- addCurrentTime
-
Adds the result of System.currentTimeMillis() to a given output field.
- addLocalHost
-
Adds the name or IP of the local host to a given output field.
- addValues
-
Adds a list of values (or the contents of another field) to a given field.
- addValuesIfAbsent
-
Adds a list of values (or the contents of another field) to a given field if not already contained.
- callParentPipe
-
Implements recursion for extracting data from container data formats.
- contains
-
Returns whether or not a given value is contained in a given field.
- convertTimestamp
-
Converts the timestamps in a given field from one of a set of input date formats to an output date format.
- decodeBase64
-
Converts a Base64 encoded String to a byte[].
- dropRecord
-
Silently consumes records without ever emitting any record. Think /dev/null.
- equals
-
Succeeds if all field values of the given named fields are equal to the given values and fails otherwise.
- extractURIComponents
-
Extracts subcomponents such as host, port, path, and query from a URI.
- extractURIComponent
-
Extracts a particular subcomponent from a URI.
- extractURIQueryParameters
-
Extracts the query parameters with a given name from a URI.
- findReplace
-
Examines each string value in a given field and replaces each substring of the string value that matches the given string literal or grok pattern with the given replacement.
- generateUUID
-
Sets a universally unique identifier on all records that are intercepted.
- grok
-
Uses regular expression pattern matching to extract structured fields from unstructured log or text data.
- head
-
Ignores all input records beyond the N-th record, akin to the Unix head command.
- if
-
Implements if-then-else conditional control flow.
- java
-
Scripting support for Java. Dynamically compiles and executes the given Java code block.
- logTrace, logDebug, logInfo, logWarn, logError
-
Logs a message at the given log level to SLF4J.
- not
-
Inverts the boolean return value of a nested command.
- pipe
-
Pipes a record through a chain of commands.
- removeFields
-
Removes all record fields for which the field name matches a blacklist but not a whitelist.
- removeValues
-
Removes all record field values for which the field name and value match a blacklist but not a whitelist.
- replaceValues
-
Replaces all record field values for which the field name and value match a blacklist but not a whitelist.
- sample
-
Forwards each input record with a given probability to its child command.
- separateAttachments
-
Emits one separate output record for each attachment in the input record's list of attachments.
- setValues
-
Assigns a given list of values (or the contents of another field) to a given field.
- split
-
Divides a string into substrings, by recognizing a separator (a.k.a. "delimiter") which can be expressed as a single character, literal string, regular expression, or grok pattern.
- splitKeyValue
-
Splits key-value pairs where the key and value are separated by the given separator, and adds the pair's value to the record field named after the pair's key.
- startReportingMetricsToCSV
-
Starts periodically appending the metrics of all commands to a set of CSV files.
- startReportingMetricsToJMX
-
Starts publishing the metrics of all commands to JMX.
- startReportingMetricsToSLF4J
-
Starts periodically logging the metrics of all morphline commands to SLF4J.
- toByteArray
-
Converts a String to the byte array representation of a given charset.
- toString
-
Converts a Java object to its string representation; optionally also removes leading and trailing whitespace.
- translate
-
Replaces a string with the replacement value defined in a given dictionary (a lookup hash table).
- tryRules
-
Simple rule engine for handling a list of heterogeneous input data formats.
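The control-flow commands in this list (pipe, tryRules) are easier to picture with a small model. The following is a minimal plain-Python sketch, not the Kite API: commands are modeled as functions that take a record and return it, or return None to signal failure, and a record is modeled as a dict mapping field names to lists of values. The helper names `parse_key_value` and `set_values` and the sample records are hypothetical. The sketch also assumes that each rule works on its own copy of the record, so a failed rule leaves no changes behind.

```python
import copy

def pipe(*commands):
    """Chain commands: each one receives the record returned by the previous one."""
    def run(record):
        for command in commands:
            record = command(record)
            if record is None:
                # A failing command aborts the rest of the chain.
                return None
        return record
    return run

def try_rules(*rules):
    """Return the result of the first rule that succeeds, or None if all fail."""
    def run(record):
        for rule in rules:
            # Assumption: each rule gets a fresh copy, so a failed attempt is discarded.
            result = rule(copy.deepcopy(record))
            if result is not None:
                return result
        return None
    return run

def set_values(field, value):
    """Hypothetical command: overwrite a field with a single value."""
    def run(record):
        record[field] = [value]
        return record
    return run

def parse_key_value(record):
    """Hypothetical command: parse 'key=value' from the message field, else fail."""
    key, separator, value = record["message"][0].partition("=")
    if not separator:
        return None
    record.setdefault(key, []).append(value)
    return record

# Rule 1 handles key=value input; rule 2 is a catch-all for anything else.
handler = try_rules(
    pipe(set_values("format", "kv"), parse_key_value),
    pipe(set_values("format", "raw")),
)

parsed = handler({"message": ["user=alice"]})
fallback = handler({"message": ["hello"]})
```

For the second record, the first rule sets `format` to `kv` and then fails in `parse_key_value`; because the rule ran on a copy, the second rule starts from the untouched record and the result carries only `format: raw`.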
kite-morphlines-avro
- readAvroContainer
-
Parses an Apache Avro binary container and emits a morphline record for each contained Avro datum.
- readAvro
-
Parses containerless Avro and emits a morphline record for each contained Avro datum.
- extractAvroTree
-
Recursively walks an Avro tree and extracts all data into a single morphline record.
- extractAvroPaths
-
Extracts specific values from an Avro object, akin to a simple form of XPath.
- toAvro
-
Converts a morphline record to an Avro record.
- writeAvroToByteArray
-
Serializes Avro records into a byte array.
kite-morphlines-json
- readJson
-
Parses JSON and emits a morphline record for each contained JSON object, using the Jackson library.
- extractJsonPaths
-
Extracts specific values from a JSON object, akin to a simple form of XPath.
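To make "a simple form of XPath" concrete, here is a small plain-Python sketch of path-based extraction from a parsed JSON document. It is an illustration only: the function `extract_path`, the slash-separated path syntax with a `[]` suffix for iterating over arrays, and the sample document are assumptions made for this example and are not claimed to match the command's actual syntax.

```python
import json

def extract_path(node, path):
    """Collect all values reachable from node via a slash-separated path.

    A step ending in "[]" iterates over the elements of an array.
    """
    steps = [step for step in path.split("/") if step]
    nodes = [node]
    for step in steps:
        iterate = step.endswith("[]")
        name = step[:-2] if iterate else step
        next_nodes = []
        for current in nodes:
            if not isinstance(current, dict) or name not in current:
                continue  # The path does not exist under this node.
            child = current[name]
            if iterate and isinstance(child, list):
                next_nodes.extend(child)  # Fan out over the array elements.
            else:
                next_nodes.append(child)
        nodes = next_nodes
    return nodes

# Made-up sample document.
doc = json.loads(
    '{"docId": 10, "links": [{"backward": 1}, {"backward": 2}, {"forward": 3}]}'
)

doc_ids = extract_path(doc, "/docId")
backward = extract_path(doc, "/links[]/backward")
missing = extract_path(doc, "/no/such/field")
```

Each path returns a list because a single path can match several values, which fits a record model where a field can hold more than one value.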
kite-morphlines-hadoop-core
- downloadHdfsFile
-
Downloads, on startup, zero or more files or directory trees from HDFS to the local file system.
- openHdfsFile
-
Opens an HDFS file for read and returns a corresponding Java InputStream.
kite-morphlines-hadoop-parquet-avro
- readAvroParquetFile
-
Parses a Hadoop Parquet file and emits a morphline record for each contained Avro datum.
kite-morphlines-hadoop-rcfile
- readRCFile
-
Parses an Apache Hadoop RCFile and emits morphline records row-wise or column-wise.
kite-morphlines-hadoop-sequencefile
- readSequenceFile
-
Parses an Apache Hadoop SequenceFile and emits a morphline record for each contained key-value pair.
kite-morphlines-maxmind
- geoIP
-
Returns geolocation information for a given IP address, using an efficient in-memory MaxMind database lookup.
kite-morphlines-metrics-servlets
- registerJVMMetrics
-
Registers metrics that are related to the Java Virtual Machine with the MorphlineContext.
- startReportingMetricsToHTTP
-
Exposes liveness status, health check status, metrics state and thread dumps via a set of HTTP URLs served by Jetty, using the AdminServlet.
kite-morphlines-protobuf
- readProtobuf
-
Parses an InputStream that contains protobuf data and emits a morphline record containing the protobuf object as an attachment.
- extractProtobufPaths
-
Extracts specific values from a protobuf object, akin to a simple form of XPath.
kite-morphlines-tika-core
- detectMimeType
-
Uses Apache Tika to autodetect the MIME type of binary data.
kite-morphlines-tika-decompress
- decompress
-
Decompresses gzip and bzip2 format.
- unpack
-
Unpacks tar, zip, and jar format.
kite-morphlines-saxon
kite-morphlines-solr-core
- solrLocator
-
Specifies a set of configuration parameters that identify the location and schema of a Solr server or SolrCloud.
- loadSolr
-
Inserts, updates, or deletes records in a Solr server or MapReduce Reducer.
- generateSolrSequenceKey
-
Assigns a unique key that is the concatenation of a field and a running count of the record number within the current session.
- sanitizeUnknownSolrFields
-
Removes record fields that are unknown to Solr schema.xml, or moves them to fields with a given prefix.
- tokenizeText
-
Uses the embedded Solr/Lucene Analyzer library to generate tokens from a text string, without sending data to a Solr server.
kite-morphlines-solr-cell
kite-morphlines-useragent
- userAgent
-
Parses a user agent string and returns structured higher level data like user agent family, operating system, version, and device type.