6. Late Data Handling

Late data handling in Falcon defines how long data can be delayed and how that late data is handled. For example, a late arrival cut-off of hours(6) in the feed entity means that data for a given hour can arrive up to 6 hours late. The late data specification in the process entity defines how this late data is handled, and the late data policy in the same entity defines how frequently Falcon checks for late data.

The supported policies for late data handling are:

  • backoff: Take the maximum late cut-off and check at every specified delay interval.

  • exp-backoff (default): Recommended. Take the maximum late cut-off and check at exponentially increasing intervals.

  • final: Take the maximum late cut-off and check once.

The policy, together with the delay, defines the interval at which Falcon checks for late data. The late-input specification for each input defines the workflow that runs when late data is detected for that input.
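
As a rough illustration of the per-input specification described above, the fragment below sketches a late-process element with one late-input per input. The input names (impressions, clicks) and the workflow paths are hypothetical placeholders, not values from this guide; each input attribute is assumed to match the name of an input declared in the same process entity.

```xml
<!-- Sketch only: input names and workflow paths are hypothetical. -->
<late-process policy="backoff" delay="hours(1)">
    <!-- Workflow to run when late data is detected for the "impressions" input -->
    <late-input input="impressions" workflow-path="/apps/example/late/impressions" />
    <!-- Workflow to run when late data is detected for the "clicks" input -->
    <late-input input="clicks" workflow-path="/apps/example/late/clicks" />
</late-process>
```

Here the policy is set to backoff for illustration; replacing it with exp-backoff or final selects one of the other policies listed above.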

To handle late data, you need to modify the feed and process entities.

  1. Specify the cut-off time in your feed entity.

    For example, to set a cut-off of 4 hours:

    <late-arrival cut-off="hours(4)"/>

  2. Specify a check for late data in all your process entities that reference that feed entity.

    For example, to check for late data until the cut-off time with a policy of exp-backoff and a delay of 1 hour:

    <late-process policy="exp-backoff" delay="hours(1)">
       <late-input input="input" workflow-path="/apps/clickstream/late" />
    </late-process>
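
To show where the two settings sit relative to each other, here is a minimal sketch of a feed entity and a process entity that references it. The entity names (clickstream-raw, clickstream-process), the frequency, and the instance range on the input are assumptions made for illustration; elements that a complete entity definition also needs are only marked with comments and are not shown.

```xml
<!-- Feed entity (sketch): the cut-off is declared once, on the feed. -->
<feed name="clickstream-raw" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <late-arrival cut-off="hours(4)"/>
    <!-- clusters, locations, ACL, and schema omitted -->
</feed>

<!-- Process entity (sketch): the late-input refers to an input by name. -->
<process name="clickstream-process" xmlns="uri:falcon:process:0.1">
    <!-- clusters and other scheduling elements omitted -->
    <inputs>
        <!-- "input" is the name that late-input refers to below -->
        <input name="input" feed="clickstream-raw" start="now(0,0)" end="now(0,0)"/>
    </inputs>
    <!-- outputs and workflow omitted -->
    <late-process policy="exp-backoff" delay="hours(1)">
        <late-input input="input" workflow-path="/apps/clickstream/late"/>
    </late-process>
</process>
```

The point of the sketch is the linkage: the input attribute of late-input names an input of the process, and that input names the feed that carries the late-arrival cut-off.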