kite-morphlines-json

This Maven module contains morphline commands for reading, extracting, and transforming JSON files and JSON objects.

readJson

The readJson command parses an InputStream or byte array that contains JSON data, using the Jackson library. For each top-level JSON object, the command emits a morphline record containing that object as an attachment in the field _attachment_body.

The input stream or byte array is read from the first attachment of the input record.
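Jackson does the actual parsing inside readJson. To make the "one record per top-level object" behavior concrete without depending on Jackson, here is a minimal stdlib-only Java sketch. It assumes well-formed input, splits a stream of whitespace-separated top-level JSON objects (or arrays) by tracking bracket depth outside string literals, and wraps each raw JSON text in a record modeled as a Map<String, List<Object>> keyed by _attachment_body. The class TopLevelJsonSplitter and its methods are hypothetical, not part of Kite; unlike readJson, this sketch does not parse the values and ignores top-level scalars.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stdlib-only sketch; Kite's readJson delegates all parsing to Jackson.
class TopLevelJsonSplitter {

    /** Splits well-formed, concatenated top-level JSON objects/arrays into their raw texts. */
    static List<String> split(String input) {
        List<String> values = new ArrayList<>();
        int depth = 0;            // current nesting depth of {} and []
        int start = -1;           // index where the current top-level value began
        boolean inString = false; // inside a "..." literal, brackets are ordinary characters
        boolean escaped = false;  // previous character was a backslash inside a string
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                if (depth == 0) {
                    start = i; // a new top-level value begins here
                }
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth == 0) {
                    values.add(input.substring(start, i + 1)); // top-level value complete
                }
            }
        }
        return values;
    }

    /** Wraps each top-level value in its own record under the _attachment_body field. */
    static List<Map<String, List<Object>>> toRecords(String input) {
        List<Map<String, List<Object>>> records = new ArrayList<>();
        for (String json : split(input)) {
            List<Object> body = new ArrayList<>();
            body.add(json);
            Map<String, List<Object>> record = new HashMap<>();
            record.put("_attachment_body", body);
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String stream = "{\"docId\": 10} {\"docId\": 20, \"note\": \"a } inside a string\"}";
        for (Map<String, List<Object>> record : toRecords(stream)) {
            System.out.println(record);
        }
    }
}
```

The brace inside the string literal in the second object does not end it early, so the sample stream yields two records.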

The command provides the following configuration options:

Property Name | Default                                 | Description
------------- | --------------------------------------- | -----------
outputClass   | com.fasterxml.jackson.databind.JsonNode | The fully qualified name of the Java class that Jackson shall convert to.

Example usage:

readJson {}

Example usage with conversion from JSON to java.util.Map objects:

readJson {
  outputClass : java.util.Map
}
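With outputClass : java.util.Map, the attachment holds a plain Java tree rather than a JsonNode: Jackson's untyped binding maps JSON objects to java.util.Map, arrays to java.util.List, and scalars to String, Number, Boolean, or null. Downstream code can then walk the tree with ordinary collection code. The following sketch is hypothetical: it builds a small tree by hand in place of real readJson output, and the lookup helper is an illustration, not a Kite API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: navigating a Map/List tree like the one readJson emits
// when outputClass is java.util.Map.
class MapTreeLookup {

    /**
     * Follows a sequence of object keys through nested Maps. Returns null if a key
     * is missing or an intermediate value is not a Map.
     */
    static Object lookup(Object node, String... keys) {
        Object current = node;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Hand-built tree mirroring {"docId": 10, "links": {"forward": [20, 40, 60]}}. */
    static Map<String, Object> sampleDocument() {
        Map<String, Object> links = new LinkedHashMap<>();
        links.put("forward", Arrays.asList(20, 40, 60));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("docId", 10);
        root.put("links", links);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sampleDocument();
        List<?> forward = (List<?>) lookup(root, "links", "forward");
        System.out.println("forward links: " + forward);
        System.out.println("backward links: " + lookup(root, "links", "backward"));
    }
}
```

Returning null for absent keys keeps the sketch short; real code would typically also check the runtime type of each value before casting.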

extractJsonPaths

The extractJsonPaths command extracts specific values from a JSON object, akin to a simple form of XPath. The command uses zero or more JSON path expressions to extract values from a Jackson JSON object of type com.fasterxml.jackson.databind.JsonNode.

The JSON input object is expected to be contained in the field _attachment_body, and is typically placed there by an upstream readJson command with outputClass : com.fasterxml.jackson.databind.JsonNode.

Each path expression consists of a record output field name (on the left side of the colon ':') and zero or more path steps (on the right-hand side), with the path steps separated by a '/' slash. JSON arrays are traversed with the '[]' notation.

The result of a path expression is a list of objects, each of which is added to the given record output field.
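To illustrate how a path such as /name[]/language[]/code selects values, here is a hypothetical Java sketch of those semantics over a java.util.Map/List tree, without the flatten option. It assumes that a plain step selects a named child, that a step ending in [] additionally expands an array into its elements, and that missing fields contribute nothing. Kite's own implementation walks Jackson JsonNode trees and may differ in edge cases; JsonPathSketch, its methods, and the sample document are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of '/'-separated path steps with '[]' array traversal.
class JsonPathSketch {

    static List<Object> evaluate(Object root, String path) {
        List<Object> current = new ArrayList<>(Collections.singletonList(root));
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // the leading '/' produces an empty step; the root stays selected
            }
            boolean expandArray = step.endsWith("[]");
            String name = expandArray ? step.substring(0, step.length() - 2) : step;
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (!(node instanceof Map)) {
                    continue; // only objects have named children
                }
                Object child = ((Map<?, ?>) node).get(name);
                if (child == null) {
                    continue; // missing fields contribute nothing
                }
                if (expandArray && child instanceof List) {
                    next.addAll((List<?>) child); // '[]' visits every array element
                } else if (!expandArray) {
                    next.add(child); // a plain step keeps the child as one value
                }
                // Assumption: '[]' applied to a non-array contributes nothing here.
            }
            current = next;
        }
        return current;
    }

    /** Small helper to build a Map from alternating keys and values. */
    static Map<String, Object> obj(Object... keyValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    static Map<String, Object> sampleDocument() {
        return obj(
            "docId", 10,
            "name", Arrays.asList(
                obj("language", Arrays.asList(obj("code", "en-us", "country", "us"), obj("code", "en"))),
                obj("url", "http://B"),
                obj("language", Arrays.asList(obj("code", "en-gb", "country", "gb")))));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = sampleDocument();
        System.out.println(evaluate(doc, "/docId"));
        System.out.println(evaluate(doc, "/name[]/language[]/code"));
        System.out.println(evaluate(doc, "/name[]/language[]/country"));
    }
}
```

On this sample, /name[]/language[]/code yields three codes in document order, while /name[]/language[]/country yields only two values because one language entry has no country.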

The path language supports all JSON concepts, including nested objects and arrays. The command also supports a flatten option that collects the primitives in a subtree into a flat output list.
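The flatten option can be pictured as a depth-first walk that keeps only leaf values. The sketch below is a hypothetical stdlib-only illustration over a java.util.Map/List tree, not Kite's code; it assumes that object members and array elements are visited in document order and, as an additional assumption, skips JSON nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of flattening a subtree into a list of primitives.
class FlattenSketch {

    /** Appends every non-null leaf (anything that is not a Map or List) under node to out. */
    static void flatten(Object node, List<Object> out) {
        if (node instanceof Map) {
            for (Object child : ((Map<?, ?>) node).values()) {
                flatten(child, out); // recurse into object members in insertion order
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                flatten(child, out); // recurse into array elements in order
            }
        } else if (node != null) {
            out.add(node); // a primitive: String, Number, or Boolean
        }
    }

    static List<Object> flatten(Object node) {
        List<Object> out = new ArrayList<>();
        flatten(node, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> links = new LinkedHashMap<>();
        links.put("backward", Arrays.asList(10, 30));
        links.put("forward", Arrays.asList(80));
        System.out.println(flatten(links));
    }
}
```

With flatten : true, a path such as /links would therefore contribute the individual numbers under backward and forward, rather than the links object itself.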

The command provides the following configuration options:

Property Name | Default | Description
------------- | ------- | -----------
flatten       | true    | Whether to collect the primitives in a subtree into a flat output list.
paths         | []      | Zero or more JSON path expressions.

Example usage:

extractJsonPaths {
  flatten : true
  paths : {
    my_price : /price
    my_docId : /docId
    my_links : /links
    my_links_backward : "/links/backward"
    my_links_forward : "/links/forward"
    my_name_language_code : "/name[]/language[]/code"
    my_name_language_country : "/name[]/language[]/country"
    my_name : /name
  }
}

Alternatively, if the extractJsonPaths command doesn't fit your needs, you can instead implement your own custom morphline command, or script a java command that uses the com.fasterxml.jackson.databind.JsonNode Java API to arbitrarily traverse and process the Jackson JSON tree emitted by the readJson command. For example, along the following lines:

commands : [
  {
    readJson {}
  }

  {
    java {
      imports : """
        import com.fasterxml.jackson.databind.JsonNode;
        import org.kitesdk.morphline.base.Fields;
        // import com.cloudera.cdk.morphline.base.Fields; // use this for CDK
      """
      code : """
        JsonNode rootNode = (JsonNode) record.getFirstValue(Fields.ATTACHMENT_BODY);
        // Traverse via the Jackson Tree API. path() returns a MissingNode instead of
        // null when a field is absent, so a missing "links" or "forward" does not throw.
        JsonNode forwardNode = rootNode.path("links").path("forward");
        if (!forwardNode.isMissingNode()) {
          record.put("forwardLinks", forwardNode.asText());
        }
        logger.debug("My output record: {}", record);
        return child.process(record);
      """
    }
  }
]