Late Data Handling
Late data handling in Falcon defines how long data can be delayed and how that late
data is handled. For example, a late arrival cut-off of hours(6) in the feed entity
means that data for a given hour can arrive up to 6 hours late. The late data
specification in the process entity defines how this late data is handled. The late data
policy in the process entity defines how frequently Falcon checks for late data.
The supported policies for late data handling are:
backoff: Take the maximum late cut-off and check at every specified delay interval.
exp-backoff (default): Recommended. Take the maximum late cut-off and check at exponentially determined intervals.
final: Take the maximum late cut-off and check once.
The policy, together with the delay, defines the interval at which the late data check is performed. The late input specification for each input defines the workflow that runs when late data is detected for that input.
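To make the interaction between policy, delay, and cut-off concrete, the following Python sketch lists when checks would occur under each policy. It is not Falcon code: the function name late_check_offsets, the factor parameter, and the exact timing rules (a doubling wait for exp-backoff, a single check at the cut-off for final) are assumptions chosen for illustration, since the text above does not state the precise schedule Falcon uses.

```python
from datetime import timedelta

POLICIES = ("backoff", "exp-backoff", "final")


def late_check_offsets(policy, delay, cut_off, factor=2):
    """List the offsets (from the nominal data time) at which a late-data
    check would run. Illustrative model only, not Falcon's implementation."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if delay <= timedelta(0):
        raise ValueError("delay must be positive")
    if policy == "final":
        # Assumed: a single check once the cut-off has been reached.
        return [cut_off]
    offsets = []
    step = delay
    elapsed = delay
    while elapsed <= cut_off:
        offsets.append(elapsed)
        if policy == "exp-backoff":
            # Assumed: each wait is `factor` times longer than the previous one.
            step = step * factor
        elapsed = elapsed + step
    return offsets


if __name__ == "__main__":
    one_hour = timedelta(hours=1)
    for name in POLICIES:
        offsets = late_check_offsets(name, one_hour, timedelta(hours=6))
        # Print each offset as a whole number of hours.
        print(name, [o // one_hour for o in offsets])
```

Under these assumptions, a delay of 1 hour and a cut-off of 6 hours gives checks at hours 1 through 6 for backoff, at hours 1 and 3 for exp-backoff, and at hour 6 for final. The point is only that exp-backoff issues fewer checks than backoff over the same window.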
To handle late data, you need to modify the feed and process entities.
Specify the cut-off time in your feed entity.
For example, to set a cut-off of 4 hours:
<late-arrival cut-off="hours(4)"/>
Specify a check for late data in all your process entities that reference that feed entity.
For example, to check for late data until the cut-off time with a specified policy of
exp-backoff and a delay of 1 hour:
<late-process policy="exp-backoff" delay="hours(1)">
    <late-input input="input" workflow-path="/apps/clickstream/late" />
</late-process>