kite-morphlines-tika-core
This Maven module contains morphline commands for autodetecting MIME types from binary data. It depends on tika-core.
detectMimeType
The detectMimeType command uses Apache Tika to autodetect the MIME type of the first attachment from the binary data. The detected MIME type is assigned to the _attachment_mimetype field.
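To illustrate how the _attachment_mimetype field might be consumed by later commands, here is a minimal sketch of a commands list. It assumes the `if`, `equals`, and `logInfo` commands from the core morphline library are available, and the `application/pdf` value and log messages are chosen purely for illustration.

```
commands : [
  # Detect the MIME type of the attachment; all options keep their defaults.
  { detectMimeType {} }

  # Branch on the detected value (illustrative MIME type, not from the source).
  {
    if {
      conditions : [
        { equals { _attachment_mimetype : ["application/pdf"] } }
      ]
      then : [
        { logInfo { format : "PDF input detected: {}", args : ["@{_attachment_mimetype}"] } }
      ]
      else : [
        { logInfo { format : "Other input detected: {}", args : ["@{_attachment_mimetype}"] } }
      ]
    }
  }
]
```

Because the detected type is stored as an ordinary record field, any command that reads fields can act on it in the same way.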
The command provides the following configuration options:
| Property Name | Default | Description | 
|---|---|---|
| includeDefaultMimeTypes | true | Whether to include the Tika default MIME types file that ships embedded in tika-core.jar (see http://github.com/apache/tika/blob/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml) | 
| mimeTypesFiles | [] | The relative or absolute path of zero or more Tika custom-mimetypes.xml files to include. | 
| mimeTypesString | null | The content of an optional custom-mimetypes.xml file embedded directly inside of this morphline configuration file. | 
| preserveExisting | true | Whether to preserve the _attachment_mimetype field value if one is already present. | 
| includeMetaData | false | Whether to pass the record fields to Tika to assist in MIME type detection. | 
| excludeParameters | true | Whether to remove MIME parameters (such as charset) from the output MIME type. | 
Example usage:
detectMimeType {
  includeDefaultMimeTypes : false
  #mimeTypesFiles : [src/test/resources/custom-mimetypes.xml]
  mimeTypesString :
    """
    <mime-info>
      <mime-type type="text/space-separated-values">
        <glob pattern="*.ssv"/>
      </mime-type>
      <mime-type type="avro/binary">
        <magic priority="50">
          <match value="0x4f626a01" type="string" offset="0"/>
        </magic>
        <glob pattern="*.avro"/>
      </mime-type>
      <mime-type type="mytwittertest/json+delimited+length">
        <magic priority="50">
          <match value="[0-9]+(\r)?\n\\{&quot;" type="regex" offset="0:16"/>
        </magic>
      </mime-type>
    </mime-info>
    """
}
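As a variation on the example above, the custom definitions can be loaded from a file instead of an inline string. The following is a sketch only: it assumes the usual top-level morphline layout (`morphlines`, `id`, `importCommands`, `commands`), the id `detectMimeExample` is a hypothetical name, and the file path is the one shown commented out in the example above.

```
morphlines : [
  {
    id : detectMimeExample               # hypothetical morphline name
    importCommands : ["org.kitesdk.**"]  # assumes the Kite command packages

    commands : [
      {
        detectMimeType {
          # Keep the Tika defaults and layer the custom definitions on top.
          includeDefaultMimeTypes : true
          mimeTypesFiles : [src/test/resources/custom-mimetypes.xml]

          # Overwrite any _attachment_mimetype value already on the record.
          preserveExisting : false

          # Strip MIME parameters from the detected type (the default).
          excludeParameters : true
        }
      }
    ]
  }
]
```

Setting preserveExisting to false is a choice made for this sketch; leave it at the default of true if an upstream command already sets a MIME type that should take precedence.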
    