Building your dataflow

Learn how to create a NiFi dataflow that processes mainframe EBCDIC-encoded data.

Setting up your dataflow in NiFi involves opening NiFi in a CDP Public Cloud Flow Management cluster, adding processors to the NiFi canvas, and connecting those processors.

  1. Open NiFi in CDP Public Cloud.
    1. To access the NiFi service in your Flow Management cluster, navigate to Management Console service > Data Hub Clusters.
    2. Click the tile representing the Flow Management Data Hub cluster that you want to work with.
    3. Click the NiFi icon in the Services section of the cluster overview page to access the NiFi UI.
      You will be logged into NiFi automatically with your CDP credentials.
  2. Add and configure the GetFile processor for data input.
    This processor creates FlowFiles from files in a directory.
    1. Drag and drop the processor icon onto the canvas.
      This displays a dialog that allows you to choose the processor you want to add.
    2. Use the Add Processor filter box to search for the GetFile processor and click Add.
    You can use other processors (for example, SFTP processors) to retrieve the offloaded binary files from a given location. Example property settings for each processor in this flow are sketched after these steps.
  3. Add and configure the ConvertRecord processor for data conversion.
    This processor converts records from one data format to another using Record Reader and Record Writer Controller Services. You can use it to transform your EBCDIC-encoded data into JSON or other structured formats. Configure it with the EBCDICRecordReader described earlier and a JSON writer such as JsonRecordSetWriter.
  4. Add and configure the PutFile processor for data output.
    You can replace PutFile with any other sink processor appropriate for your use case (for example, to ingest data into Iceberg, Cloudera Data Warehouse (CDW), or an object store).
  5. Connect the processors to create the flow.
    1. Drag the connection icon from one processor and drop it on the next processor.
    2. Configure the connection by selecting the relationships (for example, success) it should route.
    3. Click Add to close the dialog box and add the connection to your flow.
Your dataflow may resemble the following:
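
    GetFile --> ConvertRecord --> PutFile

The property settings below are a minimal sketch, not a definitive configuration: the directory paths are assumptions, and the reader and writer entries assume the EBCDICRecordReader described earlier and a JsonRecordSetWriter. Adjust all of them to your environment.

    GetFile
      Input Directory               = /data/mainframe/incoming    (assumed path)
      Keep Source File              = false

    ConvertRecord
      Record Reader                 = EBCDICRecordReader
      Record Writer                 = JsonRecordSetWriter

    PutFile
      Directory                     = /data/mainframe/converted   (assumed path)
      Conflict Resolution Strategy  = replace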

When this flow is running and receives a file from the mainframe, it converts the data to JSON format. The result looks similar to the following:
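
The record below is a hypothetical illustration only; the actual field names and values depend on the COBOL copybook that backs your EBCDICRecordReader.

    [ {
      "ACCOUNT-ID" : "0000012345",
      "CUSTOMER-NAME" : "JOHN DOE",
      "BALANCE" : 1250.75
    } ]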