ExtractMediaMetadata

Description:

Extract the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files, including audio, video, and print media formats. NOTE: The attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library's website at http://tika.apache.org/.

Tags:

media, file, format, metadata, audio, video, image, document, pdf

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Max Number of Attributes | Max Number of Attributes | 100 | | Specify the max number of attributes to add to the flowfile. There is no guarantee in what order the tags will be processed. By default it will process all of them.
Max Attribute Length | Max Attribute Length | 100 | | Specifies the maximum length of a single attribute value. When a metadata item has multiple values, they will be merged until this length is reached and then ", ..." will be added as an indicator that additional values were dropped. If a single value is longer than this, it will be truncated and "(truncated)" appended to indicate that truncation occurred.
Metadata Key Filter | Metadata Key Filter | | | A regular expression identifying which metadata keys received from the parser should be added to the flowfile attributes. If left blank, all metadata keys parsed will be added to the flowfile attributes.
Metadata Key Prefix | Metadata Key Prefix | | | Text to be prefixed to metadata keys as they are added to the flowfile attributes. It is recommended to end with a separator character such as '.' or '-'; the processor does not add one automatically. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
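
The merging and truncation rule described for Max Attribute Length can be illustrated with a short Python sketch. This is not the processor's actual code: the function name merge_values is hypothetical, and the sketch assumes that the separator counts toward the limit and that the ", ..." and "(truncated)" markers are appended on top of it.

```python
def merge_values(values, max_length):
    """Illustrative sketch of the Max Attribute Length rule (not NiFi code).

    Assumptions: values are joined with ", ", the joined text may not
    exceed max_length, and the markers are appended after the limit.
    """
    parts = []
    for value in values:
        candidate = ", ".join(parts + [value])
        if len(candidate) > max_length:
            if not parts:
                # A single value is too long on its own: cut it and mark it.
                return value[:max_length] + "(truncated)"
            # Further values do not fit: keep what was merged and mark the drop.
            return ", ".join(parts) + ", ..."
        parts.append(value)
    return ", ".join(parts)


print(merge_values(["ab", "cd", "ef"], 6))
print(merge_values(["abcdef"], 4))
```

Under these assumptions the resulting attribute value can be slightly longer than the configured limit, because the marker is added after the cut.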

Relationships:

Name | Description
success | Any FlowFile that successfully has media metadata extracted will be routed to success
failure | Any FlowFile that fails to have media metadata extracted will be routed to failure

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
<Metadata Key Prefix><attribute> | The extracted content metadata will be inserted with the attribute name "<Metadata Key Prefix><attribute>", or "<attribute>" if "Metadata Key Prefix" is not provided.
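
As a rough illustration of how the Metadata Key Filter, Metadata Key Prefix, and Max Number of Attributes properties could combine to produce the written attribute names, consider the following Python sketch. It is an assumption-laden approximation rather than the processor's implementation: select_attributes and its parameters are hypothetical names, the metadata keys are sample values, and the filter is assumed to have to match the whole key.

```python
import re


def select_attributes(metadata, key_filter=None, prefix="", max_attributes=100):
    """Illustrative sketch of attribute naming (not NiFi code).

    Assumption: the regular expression must match the entire metadata key.
    """
    pattern = re.compile(key_filter) if key_filter else None
    attributes = {}
    for key, value in metadata.items():
        if len(attributes) >= max_attributes:
            break  # limit reached; remaining keys are skipped
        if pattern is not None and not pattern.fullmatch(key):
            continue  # key rejected by the filter
        # The prefix is used verbatim; no separator is added automatically.
        attributes[prefix + key] = value
    return attributes


# Sample metadata keys chosen for illustration only.
sample = {
    "Content-Type": "image/png",
    "tiff:ImageWidth": "640",
    "tiff:ImageLength": "480",
}
print(select_attributes(sample, key_filter=r"tiff:.*", prefix="media."))
```

With the prefix "media." the attribute names end up as "media.tiff:ImageWidth" and so on; with an empty prefix the metadata keys are used unchanged.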

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.