Enable Cloudera Data Flow for an environment

Before you can deploy flow definitions, you must enable Cloudera Data Flow for a Cloudera on cloud environment. Enabling it prepares an active, healthy Cloudera on cloud environment for use with Cloudera Data Flow.

  • You must have a cloud provider account and meet the infrastructure and network requirements.
  • You must have a healthy Cloudera on cloud environment, with FreeIPA and the data lake running and healthy.
  • You must have the DFAdmin role for the Cloudera on cloud environment for which you want to enable Cloudera Data Flow.

  1. Navigate to Cloudera Data Flow by selecting Data Flow from the Cloudera on cloud Home Page or from the navigation pane.
  2. Go to Environments and, for the environment you want to enable, click Enable to open the Enable Cloudera Data Flow Service pane.
  3. In the Enable Cloudera Data Flow Service form, provide the following information:
    • Instance Type – From the available instance types, select the one appropriate for your use case. For testing and evaluation, the smallest instance type, c5.xlarge (4 vCPUs, 8 GB RAM), is sufficient.
    • Cloudera Data Flow Capacity – Specify a minimum and a maximum size for the Kubernetes cluster. You can keep the default settings.
    • Networking
  4. Click Enable. Enabling Cloudera Data Flow can take up to one hour.
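
If you want to record or review the form values before clicking Enable, the following Python sketch may help. It is illustrative only and does not call any Cloudera API: the `EnableRequest` class, its field names, the `validate` function, and the sample values are all hypothetical, and it assumes that capacity is expressed as a node count.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnableRequest:
    """Hypothetical record of the values entered in the enablement form."""
    environment_name: str  # name of the environment (placeholder value below)
    instance_type: str     # for example "c5.xlarge" for testing and evaluation
    min_nodes: int         # assumed: minimum Kubernetes cluster size, in nodes
    max_nodes: int         # assumed: maximum Kubernetes cluster size, in nodes


def validate(request: EnableRequest) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if not request.environment_name.strip():
        problems.append("environment name is empty")
    if not request.instance_type.strip():
        problems.append("instance type is empty")
    if request.min_nodes < 1:
        problems.append("minimum capacity must be at least 1")
    if request.max_nodes < request.min_nodes:
        problems.append("maximum capacity is smaller than minimum capacity")
    return problems


if __name__ == "__main__":
    # Sample values are placeholders, not product defaults.
    request = EnableRequest("my-environment", "c5.xlarge", min_nodes=3, max_nodes=5)
    issues = validate(request)
    print("ready to enable" if not issues else "; ".join(issues))
```

A check like this only catches inconsistent input, such as a maximum capacity below the minimum. The form itself remains the authority on which instance types and capacity values are accepted.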

After Cloudera Data Flow is enabled for the environment, give users permission to import and deploy flow definitions.