Configuring the processor for your data source

Learn how to configure the GenerateFlowFile data source processor for the CDW Iceberg ingest data flow.

You can set up a data flow to move data in Iceberg table format into Cloudera Data Warehouse (CDW) from many different locations. This example assumes that you are using sample data generated by the GenerateFlowFile processor.

  1. Launch the Configure Processor window by right-clicking the GenerateFlowFile processor and selecting Configure. A configuration dialog is displayed with the following tabs: Settings, Scheduling, Properties, and Comments.
  2. Configure the processor according to the behavior you expect in your data flow.
    The GenerateFlowFile processor can create many FlowFiles very quickly. Setting the run schedule to a reasonable value is important so that the flow does not overwhelm the system.
  3. When you have finished configuring the options you need, save the changes by clicking the Apply button.
    Make sure that you set all required properties, because you cannot start the processor until all mandatory properties have been configured.

The following settings and properties are used in this example:

Table 1. GenerateFlowFile processor scheduling

Scheduling: Run Schedule

Description: Run schedule dictates how often the processor should be scheduled to run. The valid values for this field depend on the selected Scheduling Strategy.

Example value for ingest data flow: 500 ms
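
To get a feel for what a given run schedule means for load, the following minimal Python sketch estimates an upper bound on the number of FlowFiles generated per minute. It assumes a timer-driven scheduling strategy, a single concurrent task, and one FlowFile per run unless a batch size is given. The function name and the batch_size parameter are hypothetical and used only for illustration.

```python
def flowfiles_per_minute(run_schedule_ms: int, batch_size: int = 1) -> float:
    """Estimate an upper bound on FlowFiles generated per minute.

    Assumptions (not taken from the example flow): timer-driven scheduling,
    one concurrent task, and `batch_size` FlowFiles emitted on each run.
    """
    if run_schedule_ms <= 0:
        # A zero interval would mean "run as often as possible",
        # so no meaningful bound can be computed from the schedule alone.
        raise ValueError("run_schedule_ms must be positive")
    runs_per_minute = 60_000 / run_schedule_ms
    return runs_per_minute * batch_size


# With the example value of 500 ms and one FlowFile per run:
print(flowfiles_per_minute(500))  # 120.0
```

With the 500 ms example value, the processor runs at most twice per second, which keeps the volume of sample data modest.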

Table 2. GenerateFlowFile processor properties

Property: Custom text

Description: If Data Format is text and Unique FlowFiles is false, you can provide custom text to be used as the content of the generated FlowFiles. The expressions in the example value generate a sequential ID and the current timestamp for each record, alongside a hard-coded name and country code.

Example value for ingest data flow:

[
  {
    "id": ${nextInt()},
    "name": "Ram",
    "created_at": "${now():format("yyyy-MM-dd HH:mm:ss.SSS", "GMT")}",
    "country_code": "FR"
  },
  {
    "id": ${nextInt()},
    "name": "Bob",
    "created_at": "${now():format("yyyy-MM-dd HH:mm:ss.SSS", "GMT")}",
    "country_code": "UK"
  }
]
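
To preview roughly what each generated FlowFile contains, the following Python sketch emulates the two expressions in the example value. It is an approximation under stated assumptions, not how NiFi evaluates the expressions: the counter stands in for nextInt() and is assumed to start at 0, and a single timestamp is reused for both records, whereas NiFi evaluates now() separately for each occurrence. The function name render_custom_text is hypothetical.

```python
import itertools
import json
from datetime import datetime, timezone

# Stand-in for nextInt(): a counter that increases by one on every call.
# Assumption: the sequence starts at 0.
_counter = itertools.count()


def render_custom_text(now=None):
    """Return JSON text resembling the content of one generated FlowFile."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Emulates format("yyyy-MM-dd HH:mm:ss.SSS", "GMT"): millisecond precision.
    created_at = now.strftime("%Y-%m-%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}"
    records = [
        {"id": next(_counter), "name": "Ram",
         "created_at": created_at, "country_code": "FR"},
        {"id": next(_counter), "name": "Bob",
         "created_at": created_at, "country_code": "UK"},
    ]
    return json.dumps(records, indent=2)


print(render_custom_text())
```

Each call produces an array of two records with consecutive IDs, so successive FlowFiles never reuse an ID within the lifetime of the counter.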

Next, configure the processor for your data target.