Deploying a flow definition using the wizard

Deploy a flow definition to run Apache NiFi flows as flow deployments in Cloudera DataFlow. To do this, launch the Deployment wizard and specify your environment, parameters, sizing, and KPIs.

Before you begin, make sure that the following prerequisites are met:

  • You have an enabled and healthy Cloudera DataFlow environment.

  • You have been assigned the DFCatalogAdmin or DFCatalogViewer role granting you access to the Catalog.

  • The flow definition you want to deploy has been added to the Catalog by someone with the DFCatalogAdmin role.

  • You have been assigned the DFFlowAdmin role for the environment to which you want to deploy the flow definition.

  • You have been assigned the DFProjectMember role for the Project where you want to deploy the flow definition.

  • If you are deploying custom processors or controller services, you may need to meet additional prerequisites.
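The role prerequisites above can be read as a simple checklist. The following Python sketch is a hypothetical model for illustration only, not a Cloudera DataFlow API: `UserRoles` and `missing_prerequisites` are invented names, role assignments are plain sets of role names, and it assumes that deploying as Unassigned (passed as `project=None`) requires no project role.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserRoles:
    # Hypothetical model of a user's role assignments (not a Cloudera API).
    catalog: set[str] = field(default_factory=set)                   # e.g. {"DFCatalogViewer"}
    environments: dict[str, set[str]] = field(default_factory=dict)  # environment name -> roles
    projects: dict[str, set[str]] = field(default_factory=dict)      # project name -> roles


def missing_prerequisites(user: UserRoles, environment: str, project: str | None) -> list[str]:
    """List the role prerequisites from this page that the user does not yet meet."""
    missing: list[str] = []
    # Catalog access: either the admin or the viewer role is enough.
    if not user.catalog & {"DFCatalogAdmin", "DFCatalogViewer"}:
        missing.append("DFCatalogAdmin or DFCatalogViewer")
    # Deploying requires DFFlowAdmin on the target environment.
    if "DFFlowAdmin" not in user.environments.get(environment, set()):
        missing.append(f"DFFlowAdmin on environment {environment}")
    # Assumption: project=None stands for an Unassigned deployment, so no project role is checked.
    if project is not None and "DFProjectMember" not in user.projects.get(project, set()):
        missing.append(f"DFProjectMember on project {project}")
    return missing
```

For example, a user with only `DFCatalogViewer` and `DFFlowAdmin` on the target environment is still reported as missing `DFProjectMember` for the chosen project.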

Select the flow definition version you want to deploy from the catalog

The Catalog is where you manage the flow definition lifecycle, from initial import through versioning to deployment.

  1. In Cloudera DataFlow, select Catalog from the left navigation pane.
    Flow definitions available for you to deploy are displayed, one definition per row.
  2. Select a row to display the flow definition details and available versions.
    The flow details pane opens on the right.
  3. In the flow details pane, select the version you want to deploy.

Launch the deployment wizard

After selecting a flow definition version from the catalog, use the deployment wizard to select an environment, provide a deployment name, and assign the deployment to a project.

  1. Click Deploy to launch the Deployment wizard.
  2. Select the environment where you want to deploy the flow.
  3. Click Deploy.

Name your flow deployment and assign it to a project

After you select the flow version and environment, the deployment wizard opens the Overview page. Here you provide a name for your flow deployment and assign it to a project. You can also import a previously exported deployment configuration, which auto-fills configuration values and speeds up deployment.

  1. Give your flow a unique Deployment Name.
    You can use this name to distinguish between different versions of a flow definition, flow definitions deployed to different environments, and so on.
  2. Select a Target Project for your flow deployment from the list of Projects available to you.
    • If you do not want to assign the deployment to any of the available Projects, select Unassigned. Unassigned deployments are accessible to every user with the DFFlowUser role in the environment.
    • This field is populated automatically if you import a configuration, provided that the Project it references exists in your environment and you have access to it.
  3. If you have previously exported a deployment configuration that closely aligns with the one you are about to deploy, you can import it under Import Configuration to auto-fill as much of the wizard as possible.
    You can later manually modify auto-filled configuration values during deployment.
  4. Click Next.
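The auto-fill behavior described in these steps can be modeled as copying imported values into the wizard fields. This Python sketch is a hypothetical illustration: the field names (`target_project`, `nifi_version`, `auto_start`) and the flat dictionary layout are assumptions, not the actual exported configuration format. The set `accessible_projects` stands in for "the Project exists in your environment and you have access to it."

```python
from __future__ import annotations

# Hypothetical field names; the real exported configuration format may differ.
WIZARD_FIELDS = ("target_project", "nifi_version", "auto_start")


def autofill(imported: dict[str, object], accessible_projects: set[str]) -> dict[str, object]:
    """Pre-fill wizard fields from an imported configuration (illustrative model)."""
    filled: dict[str, object] = {name: None for name in WIZARD_FIELDS}
    for name in WIZARD_FIELDS:
        if name not in imported:
            continue  # not in the import: leave the field empty for manual entry
        value = imported[name]
        # The Target Project is populated only if the referenced Project
        # exists in your environment and you have access to it.
        if name == "target_project" and value not in accessible_projects:
            continue
        filled[name] = value
    return filled
```

Because the result is an ordinary dictionary, a manual change later in the wizard is just an assignment, for example `fields["nifi_version"] = "1.x"`. Keys in the import that the wizard does not recognize are ignored.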

Configure NiFi

After you name your flow deployment and assign it to a project, you set the Apache NiFi runtime version, specify whether the deployment starts automatically, and configure any inbound connections, custom NiFi Archives (NARs), and custom Python processors. In the later steps of the wizard, you provide values for the configuration parameters your flow definition requires and set the capacity of the NiFi cluster servicing your deployment.

  1. Pick a NiFi Runtime Version for your flow deployment.
    Select whether you want to use Apache NiFi 1.x or 2.x with your deployment.
    Cloudera recommends that you always use the latest available version within the 1.x and 2.x lines, if possible.
  2. Specify whether you want the flow deployment to auto-start once deployed.
  3. Specify whether you want to use Inbound Connections that allow your flow deployment to receive data from an external data source.

    If yes, specify the endpoint host name and listening port(s) where your flow deployment listens for incoming data.

    See Creating an inbound connection endpoint for complete information on endpoint configuration options.

  4. Specify whether you want to use NiFi Archives (NARs) to deploy custom NiFi processors or controller services.

    If yes, specify the CDP Workload Username, password, and cloud storage location you used when preparing to deploy custom processors.

    Make sure that you click the Apply button specific to Custom NAR Configuration before proceeding.

  5. If you selected NiFi 2.x [Technical Preview] as the runtime version, specify whether you want to use custom Python processors with your flow deployment.
    If yes, specify the CDP Workload Username, password, and cloud storage location where the processors are stored.

    Make sure that you click the Apply button specific to Custom Python Processors before proceeding.

  6. Click Next.
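The dependencies between the options in this step can be summarized as validation rules. The Python sketch below is a hypothetical model, not part of Cloudera DataFlow: the class, field, and function names are invented, and it encodes only the rules stated on this page. The runtime is NiFi 1.x or 2.x, inbound connections need a host name and at least one port, custom NARs and custom Python processors need a Workload Username, password, and storage location, and custom Python processors require NiFi 2.x.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageAccess:
    # Credentials and location used for custom NARs or custom Python processors.
    workload_username: str = ""
    password: str = ""
    storage_location: str = ""

    def is_complete(self) -> bool:
        return bool(self.workload_username and self.password and self.storage_location)


@dataclass
class NiFiSettings:
    # Hypothetical model of the Configure NiFi step; not a Cloudera API.
    runtime_major: int                              # 1 for NiFi 1.x, 2 for NiFi 2.x
    auto_start: bool = True
    use_inbound: bool = False
    inbound_host: str = ""
    inbound_ports: list[int] = field(default_factory=list)
    custom_nars: StorageAccess | None = None        # None: no custom NARs
    python_processors: StorageAccess | None = None  # None: no custom Python processors


def validate(s: NiFiSettings) -> list[str]:
    """Return human-readable problems; an empty list means the step is complete."""
    problems: list[str] = []
    if s.runtime_major not in (1, 2):
        problems.append("runtime must be NiFi 1.x or 2.x")
    if s.use_inbound and not (s.inbound_host and s.inbound_ports):
        problems.append("inbound connections need a host name and at least one port")
    if s.custom_nars is not None and not s.custom_nars.is_complete():
        problems.append("custom NARs need username, password, and storage location")
    if s.python_processors is not None:
        if s.runtime_major != 2:
            problems.append("custom Python processors require NiFi 2.x")
        elif not s.python_processors.is_complete():
            problems.append("custom Python processors need username, password, and storage location")
    return problems
```

The sketch checks only that the required values are present. It does not model the Apply buttons for the Custom NAR Configuration and Custom Python Processors sections, which you still need to click in the wizard.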

Provide parameter values

Depending on the flow you deploy, you may need to specify parameter values such as connection strings and usernames, and upload files such as truststores and JARs.

  1. Provide values for the parameters required by your flow deployment.
    You must provide values for all parameters. To show only the parameters that are still empty, select the No value checkbox.
  2. When you have finished setting configuration parameters, click Next.
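The No value filter amounts to selecting the parameters whose value is still empty. A minimal Python sketch, assuming parameters are held in a plain dictionary of name to value, where `None` or an empty string means unset; the function names and the parameter names in the usage example are hypothetical.

```python
from __future__ import annotations


def parameters_without_value(params: dict[str, str | None]) -> list[str]:
    """Return the names of parameters that are still empty, like the No value filter."""
    # In this model, both None and "" count as "no value".
    return [name for name, value in params.items() if not value]


def can_continue(params: dict[str, str | None]) -> bool:
    # Every parameter needs a value before you can click Next.
    return not parameters_without_value(params)


params = {"Kafka Broker Endpoint": "broker-1:9093", "Kafka Username": "", "SSL Truststore": None}
print(parameters_without_value(params))  # ['Kafka Username', 'SSL Truststore']
print(can_continue(params))              # False
```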

Configure sizing and scaling

Set the size and number of Apache NiFi nodes, auto-scaling, and the type of storage to be used.

  1. Specify NiFi node size.
    Select one of the following options:
    • Extra Small: 2 vCores per Node, 4 GB per Node
    • Small: 3 vCores per Node, 6 GB per Node
    • Medium: 6 vCores per Node, 12 GB per Node
    • Large: 12 vCores per Node, 24 GB per Node
  2. Set the number of NiFi nodes and auto-scaling.
    • You can set whether you want to automatically scale your cluster according to flow deployment capacity requirements. When you enable auto-scaling, the minimum number of NiFi nodes is used as the initial size, and the workload scales up or down depending on resource demands.
    • You can set the number of nodes from 1 to 32.
    • You can set whether you want to enable Flow Metrics Scaling.
  3. Select storage type.
    Select whether you want your deployment to use storage optimized for cost or for performance.
    • Standard: 512 GB Content Repo Size, 512 GB Provenance Repo Size, 256 GB Flow File Repo Size, 2300 IOPS, 150 MB/s Max Throughput
    • Performance: 1024 GB Content Repo Size, 1024 GB Provenance Repo Size, 256 GB Flow File Repo Size, 5000 IOPS, 200 MB/s Max Throughput
  4. Click Next.
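To see what a sizing choice means in aggregate, you can multiply the per-node figures above by the node count. This Python sketch uses the node sizes listed in this step and the 1 to 32 node range; `cluster_capacity` is a hypothetical helper, and it assumes, as described above, that with auto-scaling enabled the deployment starts at the minimum node count and can grow to the maximum. It does not model Flow Metrics Scaling or storage.

```python
from __future__ import annotations

# Per-node resources for each NiFi node size, as listed in this step.
NODE_SIZES = {
    "Extra Small": (2, 4),   # (vCores per node, memory GB per node)
    "Small": (3, 6),
    "Medium": (6, 12),
    "Large": (12, 24),
}

MIN_NODES, MAX_NODES = 1, 32


def cluster_capacity(size: str, min_nodes: int, max_nodes: int | None = None) -> dict[str, tuple[int, int]]:
    """Estimate total (vCores, memory GB) at the initial and the maximum node count."""
    if max_nodes is None:
        max_nodes = min_nodes  # auto-scaling disabled: fixed node count
    if not MIN_NODES <= min_nodes <= max_nodes <= MAX_NODES:
        raise ValueError(f"node counts must satisfy 1 <= min <= max <= 32, got {min_nodes}..{max_nodes}")
    vcores, mem_gb = NODE_SIZES[size]  # unknown size names raise KeyError
    return {
        "initial": (vcores * min_nodes, mem_gb * min_nodes),
        "maximum": (vcores * max_nodes, mem_gb * max_nodes),
    }


print(cluster_capacity("Medium", 2, 5))  # {'initial': (12, 24), 'maximum': (30, 60)}
```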

Set key performance indicators

Optionally, add key performance indicators to help you track the performance of your flow deployment before you review your settings and launch the deployment process.

  1. In the KPIs step, you can optionally identify key performance indicators (KPIs), the metrics used to track them, and when and how you receive alerts about those metrics.

    See Working with KPIs for complete information about the KPIs available to you and how to monitor them.

  2. Click Next.

Verify your settings and initiate deployment

Review deployment settings, make any necessary changes, and start deployment.

  1. Review a summary of the information provided and make any necessary edits by clicking Previous.
  2. When you are finished, complete your flow deployment by clicking Deploy.

After you click Deploy, you are redirected to the Alerts tab in Flow Details, where you can track the progress of the deployment.