Make sure that you have DFDeveloper permission to perform this task. For
information on account and resource roles, see Cloudera Data Flow Authorization.
This feature shows provenance data only after provenance events have been
generated. For provenance events to be generated, you need an active test
session.
Data provenance provides a searchable historical record of every data object
(FlowFile) as it moves through your flow. It can help you debug and audit issues
with your flows by understanding the changes (events) that occurred to data as it
was being processed by each processor.
To view data provenance events in Flow Designer, go to Flow Options > Data
Provenance.
Find provenance events relevant to your use case.
To locate all events related to a component, Search by component
name.
You can filter for events by Event Type and
Component Type.
To further narrow your search, click Advanced
search.
Advanced search allows you to include or
exclude the following types of properties from your search:
Event Type
FlowFile UUID
Filename
Component ID
Relationship
You can also select the timeframe within which to search and the
size of the FlowFile using the following:
Date Range
Start Time
End Time
Minimum File Size
Maximum File Size
To clear all selected filtering options, click Clear all.
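The way the Advanced search options narrow a result set can be sketched in a few lines of Python. This is only an illustration, not the Flow Designer implementation: the `ProvenanceEvent` and `AdvancedSearch` classes, their field names, and the sample values are hypothetical, and the sketch assumes that all selected filters must match at the same time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ProvenanceEvent:
    # Hypothetical record; field names mirror the searchable properties above.
    event_type: str
    flowfile_uuid: str
    filename: str
    component_id: str
    relationship: Optional[str]
    event_time: datetime
    file_size: int  # bytes


@dataclass
class AdvancedSearch:
    # Property name -> value that must match (include) or must not match (exclude).
    include: dict = field(default_factory=dict)
    exclude: dict = field(default_factory=dict)
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    min_size: Optional[int] = None
    max_size: Optional[int] = None

    def matches(self, event: ProvenanceEvent) -> bool:
        # Assumption: every selected filter has to hold (logical AND).
        for name, value in self.include.items():
            if getattr(event, name) != value:
                return False
        for name, value in self.exclude.items():
            if getattr(event, name) == value:
                return False
        if self.start_time is not None and event.event_time < self.start_time:
            return False
        if self.end_time is not None and event.event_time > self.end_time:
            return False
        if self.min_size is not None and event.file_size < self.min_size:
            return False
        if self.max_size is not None and event.file_size > self.max_size:
            return False
        return True


# Made-up sample events for illustration only.
events = [
    ProvenanceEvent("CREATE", "uuid-1", "a.csv", "proc-1", None,
                    datetime(2024, 1, 1, 10, 0), 500),
    ProvenanceEvent("DROP", "uuid-1", "a.csv", "proc-2", "success",
                    datetime(2024, 1, 1, 10, 5), 500),
    ProvenanceEvent("CREATE", "uuid-2", "b.csv", "proc-1", None,
                    datetime(2024, 1, 2, 9, 0), 5000),
]

# Include one component, exclude one filename, cap the FlowFile size.
search = AdvancedSearch(
    include={"component_id": "proc-1"},
    exclude={"filename": "b.csv"},
    max_size=1000,
)
hits = [e for e in events if search.matches(e)]
```

In this sketch, clearing all filters corresponds to an `AdvancedSearch()` with no arguments, which matches every event.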
You can directly access data provenance of a specific processor by
right-clicking the processor and selecting View Data
Provenance from the context menu.
To view event details, click the [Event Details] icon on
the right of an event row.
This opens the Provenance Event details pane, which also
includes the Attributes and
Content sections.
The Attributes section shows the attributes that exist
on the FlowFile as of that point in the flow. To only view the attributes
that were modified as a result of the processing event, select the
Show modified attributes only checkbox.
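Conceptually, showing only the modified attributes amounts to comparing the FlowFile attributes before and after the processing event. The following is a minimal sketch of that comparison, not the product's implementation: the `modified_attributes` helper and the sample attribute values are hypothetical.

```python
def modified_attributes(before: dict, after: dict) -> dict:
    """Map each added, changed, or removed attribute to (old, new).

    None stands for "attribute not present" on that side.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Made-up attribute sets for a single event.
before = {"filename": "a.csv", "path": "./"}
after = {"filename": "a.json", "path": "./", "mime.type": "application/json"}

changes = modified_attributes(before, after)
```

Here `path` is left out of `changes` because the event did not alter it, while `filename` (changed) and `mime.type` (added) are kept.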
The Content section shows information about the
FlowFile’s content, such as its location in the Content Repository and its size.
Click the
Download button to
download a copy of the FlowFile’s content as it existed at this point in
the flow.
Click the Replay button to
replay the FlowFile at this point in the flow.
When you click
Replay, the
FlowFile is sent to the connection feeding the component that
produced this processing event. You may need to inspect the contents
of a FlowFile at some point in the flow to ensure that it is being
processed as expected. If it is not being processed properly, you
may need to make adjustments to the flow and replay the FlowFile
again. You can do this from the Content
section of the Provenance Event pane.