Creating a flow definition in NiFi
Before you can run a data flow in Cloudera DataFlow, you need to (1) create the flow in Apache NiFi, (2) download the NiFi flow as a flow definition, (3) import it to Cloudera DataFlow, and finally (4) deploy the flow. The flow definition acts as the configuration logic for your flow deployments. It enables you to deploy your data flow without the need to maintain cluster infrastructure. You can also deploy the same flow to multiple environments in Cloudera DataFlow.
When you want to develop a NiFi flow that you intend to use in Cloudera DataFlow, review and adjust your traditional NiFi flow development process to make sure that you can create portable data flows that will work in the Cloudera DataFlow environment. Before you get started with flow development, it is useful to understand where you need special attention and what adjustments you have to make in your development workflow.
You can create and download flow definitions starting with version 1.11 of Apache NiFi. Cloudera provides the following Apache NiFi-based products:
- Cloudera DataFlow for Data Hub
- CFM 1.1.0 and higher
- HDF 3.5
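If you script environment checks, the Apache NiFi version requirement above can be expressed as a small comparison. The following is a minimal sketch, not part of any Cloudera or NiFi tooling: the function names are hypothetical, and it assumes you already know the Apache NiFi version string of your installation. It compares versions numerically, because a plain string comparison would wrongly rank 1.9 above 1.11.

```python
from typing import Tuple

# Minimum Apache NiFi version that can create and download flow definitions.
MIN_NIFI_VERSION = (1, 11)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.11.4' into (1, 11, 4); a suffix such as '-SNAPSHOT' is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def supports_flow_definitions(nifi_version: str) -> bool:
    """Hypothetical helper: True if the NiFi version is 1.11 or higher."""
    return parse_version(nifi_version) >= MIN_NIFI_VERSION


print(supports_flow_definitions("1.11.4"))  # True
print(supports_flow_definitions("1.9.2"))   # False
```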
Cloudera recommends that you develop your flow definitions using CDP Data Hub Flow Management clusters. For more information on how to set up a managed and secured Flow Management cluster in CDP Public Cloud, see Setting up your Flow Management cluster.
For more information on planning and preparing your NiFi flows for Cloudera DataFlow, see Best Practices for Developing Flow Definitions.
- Create a process group that will contain your NiFi flow.
- Drag and drop the process group icon onto the canvas.
- Add a name for the process group. Once you have the new process group available on the canvas, you can interact with it by right-clicking it and selecting an option from the context menu. The available options vary depending on the privileges assigned to you.
- Enter the process group by double-clicking it. Alternatively, you can select the Enter group option from the context menu.
- Add the appropriate flow components to the NiFi canvas. You can add processors and other components to build your data flow. To add a processor to your flow, drag the processor icon to the canvas and select the name of the processor from the list.
- Configure the components in your data flow. Make sure that you externalize the component properties where values change depending on the environment in which the data flow is running. For more information on parameterizing your processor configurations, see Best Practices for Flow Definition Development.
- Create a parameter context for your data flow and add parameters to it.
Using parameters for certain properties (for example, connection information, truststores, or drivers) makes the flow portable. You can download your data flow from NiFi and then import it to Cloudera DataFlow as a flow definition. When you deploy the flow definition, you can specify values or upload files for these parameters in the deployment wizard to adjust them to your needs.
For instructions on how to set up your parameters, see Best Practices for Flow Definition Development.
- Create controller services for your flow.
If you want to add controller services to your flow that you will later use in Cloudera DataFlow, you must define the controller services in the configuration of the process group that you will download as a flow definition. In this case, the services are available to all processors in that process group and are also available in Cloudera DataFlow when you import your flow definition.
For more information and instructions on how to set up your controller services, see Best Practices for Flow Definition Development.
- Configure each processor (and any other components) in your flow with the required values by double-clicking it. Alternatively, you can right-click the processor and select the Configure option from the processor’s context menu.
Parameterize component properties and use controller services in the configuration where needed. You can use the parameters you previously created, or you can create new parameters as you configure the components in your flow. To create a new parameter for a property, select the Convert to Parameter icon in the property’s row.
- After configuring the processors and other data flow components, click Apply.
- Connect the components in the data flow and configure the connections.
- Check your data flow to make sure that the configuration of all components and connections is valid.
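After you download the flow as a flow definition, you may want to confirm that the parameters you reference are actually defined in the parameter context. NiFi references a parameter in a property value as `#{parameter_name}`. The following is a minimal sketch of such a check, not a Cloudera or NiFi utility. It assumes that the downloaded JSON keeps components under `flowContents` and parameter contexts under `parameterContexts`; verify this layout against your own file. The function name, the sample flow, and the property and parameter names are hypothetical.

```python
import re

# Simplified pattern for #{parameter_name}; it ignores escaping and quoting rules.
PARAM_REF = re.compile(r"#\{([^}]+)\}")


def audit_parameters(flow_definition: dict) -> dict:
    """Compare referenced parameters with defined ones (assumed JSON layout)."""
    contexts = flow_definition.get("parameterContexts", {})
    defined = {
        param["name"]
        for context in contexts.values()
        for param in context.get("parameters", [])
    }
    referenced = set()

    def walk(group: dict) -> None:
        # Processors and controller services both carry a "properties" map.
        components = group.get("processors", []) + group.get("controllerServices", [])
        for component in components:
            for value in (component.get("properties") or {}).values():
                if isinstance(value, str):
                    referenced.update(PARAM_REF.findall(value))
        # Nested process groups can contain further components.
        for child in group.get("processGroups", []):
            walk(child)

    walk(flow_definition.get("flowContents", {}))
    return {
        "undefined": sorted(referenced - defined),
        "unused": sorted(defined - referenced),
    }


# Hypothetical flow definition, reduced to the fields the sketch reads.
sample = {
    "flowContents": {
        "processors": [
            {"name": "Fetch", "properties": {"Remote URL": "#{source.url}"}},
        ],
        "controllerServices": [
            {"name": "SSL", "properties": {"Truststore Filename": "#{truststore.path}"}},
        ],
        "processGroups": [],
    },
    "parameterContexts": {
        "my-context": {
            "name": "my-context",
            "parameters": [{"name": "source.url"}, {"name": "unused.param"}],
        },
    },
}

result = audit_parameters(sample)
print(result)  # {'undefined': ['truststore.path'], 'unused': ['unused.param']}
```

To run the check on a real file, load it with `json.load` and pass the resulting dictionary to `audit_parameters`. An entry under `undefined` indicates a property that references a parameter missing from the parameter context.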