Considerations when developing flow definitions

While building your flow definition in your NiFi development environment, design it so that it is easy to export to CDF and can run in isolation.

Controller Services

In traditional NiFi data flows, Controller Services are considered shared services and are often used by multiple data flows, or by multiple Processors in different Process Groups. In CDF, you should plan for the possibility that you will run your Process Group in isolation. This means that Controller Services must be defined within the Process Group that you download as a flow definition. Controller Services that are defined outside of your Process Group are not exported along with the flow definition.

When you are preparing to download a Process Group as a flow definition, review the Controller Services defined outside your Process Group and, if needed, recreate them within the Process Group.
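
One way to run this review on an already downloaded flow definition is sketched below. It assumes the exported JSON lists services referenced from outside the Process Group in a top-level externalControllerServices map, which is how NiFi flow snapshots are commonly laid out; verify this against your own export. The function name and the sample data are hypothetical.

```python
import json


def external_controller_services(flow_definition):
    """Return the names of Controller Services referenced from outside the exported group."""
    # Assumption: external references live in a top-level map keyed by service id.
    external = flow_definition.get("externalControllerServices") or {}
    return sorted(ref.get("name", service_id) for service_id, ref in external.items())


# Hypothetical, heavily trimmed flow definition used only for illustration.
sample = json.loads("""
{
  "flowContents": {"name": "my-flow", "controllerServices": []},
  "externalControllerServices": {
    "abc-123": {"identifier": "abc-123", "name": "SharedRecordReader"}
  }
}
""")

for name in external_controller_services(sample):
    print(f"Defined outside the Process Group: {name}")
```

Any name this reports is a candidate for recreating inside the Process Group before you download it again.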

Parameterize Processor and Controller Service configurations

Ensure that your Process Group is portable by parameterizing your Processor and Controller Service configurations. This allows you to deploy a flow in different environments without having to update Processor configuration details.
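
To see which parameters a flow definition currently relies on, you could scan its property values for #{...} references. The sketch below assumes that the exported JSON keeps components under flowContents, in processors, controllerServices, and nested processGroups arrays, each with a properties map. The helper names and the sample flow are hypothetical.

```python
import re

# Matches parameter references such as #{orders.url}.
# Note: this simple pattern does not handle escaped references (##{...}).
PARAM_REF = re.compile(r"#\{([^}]+)\}")


def iter_components(group):
    """Yield every processor and controller service, recursing into child groups."""
    yield from group.get("processors", [])
    yield from group.get("controllerServices", [])
    for child in group.get("processGroups", []):
        yield from iter_components(child)


def referenced_parameters(flow_definition):
    """Return the sorted set of parameter names used in any property value."""
    names = set()
    for component in iter_components(flow_definition.get("flowContents", {})):
        for value in (component.get("properties") or {}).values():
            if isinstance(value, str):
                names.update(PARAM_REF.findall(value))
    return sorted(names)


# Hypothetical sample: one processor plus a controller service in a child group.
sample = {
    "flowContents": {
        "processors": [
            {"name": "Fetch orders",
             "properties": {"Remote URL": "#{orders.url}", "Batch Size": "10"}}
        ],
        "processGroups": [
            {"controllerServices": [
                {"name": "Truststore",
                 "properties": {"Truststore Filename": "#{truststore.file}",
                                "Truststore Password": None}}
            ]}
        ],
    }
}

print(referenced_parameters(sample))
```

Property values that hold environment-specific literals and do not show up in this list are worth a second look.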

Customize your Processor and Connection names

To distinguish Processors and Connections when defining KPIs in your CDF flow deployment, give each of them a custom name when developing your data flow.
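
A quick check for components that were never renamed might look like the following sketch. It rests on two assumptions about the exported JSON that you should confirm against your own flow definition: a processor that was never renamed carries the simple class name of its type as its name, and a connection that was never renamed has an empty name. The function name and the sample group are hypothetical.

```python
def default_named_components(group, path=""):
    """List processors and connections in a group (and its children) without a custom name."""
    here = f"{path}/{group.get('name', '?')}"
    findings = []
    for processor in group.get("processors", []):
        # Assumption: the default processor name is the simple class name of its type.
        simple_type = processor.get("type", "").rsplit(".", 1)[-1]
        if simple_type and processor.get("name") == simple_type:
            findings.append(f"{here}: processor '{simple_type}' has its default name")
    for connection in group.get("connections", []):
        # Assumption: a connection that was never renamed has an empty name.
        if not connection.get("name"):
            findings.append(f"{here}: connection {connection.get('identifier', '?')} has no name")
    for child in group.get("processGroups", []):
        findings.extend(default_named_components(child, here))
    return findings


# Hypothetical sample group used only for illustration.
sample_group = {
    "name": "ingest",
    "processors": [
        {"name": "ConsumeKafkaRecord_2_6",
         "type": "org.apache.nifi.processors.kafka.pubsub.ConsumeKafkaRecord_2_6"},
        {"name": "Write orders to S3",
         "type": "org.apache.nifi.processors.hadoop.PutHDFS"},
    ],
    "connections": [
        {"identifier": "c1", "name": ""},
        {"identifier": "c2", "name": "orders to S3"},
    ],
}

for finding in default_named_components(sample_group):
    print(finding)
```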

Review Reporting Tasks

Reporting Tasks are not exported to CDF when you export your flow definition. When designing your flow definition, be aware of your monitoring and reporting needs, and plan to address them with KPIs and Alerts specified as part of your flow deployment.

Using the CDPEnvironment parameter to get Hadoop configuration resources

DataFlow makes it easy to use HDFS processors to read data from and write data to S3. Using HDFS processors to access data in S3 allows you to leverage CDP’s IDBroker authentication method, so you do not need to specify any S3 secrets in the processor configuration. HDFS processors, however, require Hadoop configuration files, specifically the core-site.xml file. CDF offers a special parameter called CDPEnvironment for processors that require Hadoop configuration files. Set the Hadoop Configuration Resources property to #{CDPEnvironment}, and CDF automatically obtains the required files during the flow deployment process.

Using CDPEnvironment for the Hadoop Configuration Resources property.

The DataFlow Deployment Wizard detects usage of the CDPEnvironment parameter and automatically obtains the required Hadoop configuration files.
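
Before exporting, you may want to confirm that every processor exposing a Hadoop Configuration Resources property actually uses the parameter. The following is a minimal sketch, assuming that the exported JSON stores this property under exactly that key in each processor's properties map. The function name, processor names, and file path in the sample are hypothetical.

```python
EXPECTED_VALUE = "#{CDPEnvironment}"
PROPERTY_NAME = "Hadoop Configuration Resources"


def hadoop_config_issues(group):
    """Return (processor name, current value) pairs that do not use #{CDPEnvironment}."""
    issues = []
    for processor in group.get("processors", []):
        properties = processor.get("properties") or {}
        # Only processors that expose the property are checked.
        if PROPERTY_NAME in properties and properties[PROPERTY_NAME] != EXPECTED_VALUE:
            issues.append((processor.get("name"), properties[PROPERTY_NAME]))
    for child in group.get("processGroups", []):
        issues.extend(hadoop_config_issues(child))
    return issues


# Hypothetical sample: one processor still points at a local file.
sample_group = {
    "processors": [
        {"name": "List landing zone",
         "properties": {PROPERTY_NAME: "/etc/hadoop/conf/core-site.xml"}},
        {"name": "Write to S3",
         "properties": {PROPERTY_NAME: "#{CDPEnvironment}"}},
        {"name": "Route on attribute", "properties": {}},
    ]
}

for name, value in hadoop_config_issues(sample_group):
    print(f"{name}: {PROPERTY_NAME} is set to {value!r}")
```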

Prepare Default NiFi SSL Context Service in Flow Management Data Hub clusters for CDF

If you have been developing your data flow using a Flow Management Data Hub cluster and your data flow uses the Default NiFi SSL Context Service, you need to create a new SSL Context Service and parameterize it before exporting your data flow.

A ConsumeKafkaRecord processor utilizing the Default NiFi SSL Context Service in a Flow Management Data Hub cluster

  1. Create a new StandardRestrictedSSLContextService in the process group you want to export
  2. Parameterize the Truststore Filename and Truststore Password properties
  3. Select JKS as the Truststore Type and TLS as the TLS Protocol

The new StandardRestrictedSSLContextService using parameters.

When deploying this flow definition in DataFlow, the deployment wizard will ask you to upload the truststore file and specify its password.

Before you can deploy the flow definition, you therefore need to create a truststore using the root certificate of the CDP environment you want to connect to.

To do this, follow these steps:

  1. In the CDP Management Console, navigate to the CDP environment you want to connect to. Select the Summary tab, scroll down to the FreeIPA section, and select Get FreeIPA certificate from the actions menu. This downloads the FreeIPA root certificate to your computer.
  2. With the root certificate downloaded, you can now create the truststore. Run the following command, replacing the storepass value (changeit) with a password of your choice:
    
    keytool \
     -importcert \
     -noprompt \
     -storetype JKS \
     -keystore truststore.jks \
     -storepass changeit \
     -alias freeipa-ca \
     -file /path/to/<environment_name>.crt

You now have a truststore file that you can upload when deploying a flow definition using the Flow Deployment Wizard.

Resource consumption

When deciding how to export your Process Groups for deployment on CDF, review the resource consumption of your data flows to help you decide how to isolate them within CDF.

Once you have developed your data flow in a NiFi development environment, you can export your flow definition at one of the following levels:

  • Root canvas

  • Whole process group level

  • Part of the process group

Inter process group communication

If you have Process Groups that exchange data with each other, treat them as one flow definition and download their parent Process Group as a flow definition.