Enabling DataFlow for an environment

Enabling DataFlow for an environment is your first step in getting started with Cloudera DataFlow. To do this, ensure that you have met the prerequisites, and then launch the Enable Environment window, which walks you through the process.

  1. From the CDP Public Cloud home page, click Cloudera DataFlow, then click Environments.
  2. Find the environment you want to enable, and click Enable to launch the Enable Environment window.

    If the Enable button is greyed out, hover over the Not Enabled icon for more details about the problem.

  3. Configure DataFlow Capacity.
    This defines the minimum and maximum size of the Kubernetes cluster. Your DataFlow cluster automatically scales between the minimum and maximum cluster size that you specify here.
  4. Configure Networking.
    1. Specify DataFlow Subnet Use.

      If you do not make a subnet selection, DataFlow considers all available subnets that have been registered with the environment for worker placement. Worker nodes are only placed in public subnets if no private subnets are available.

    2. Specify whether to use a Public Endpoint.

      Select this option when you want to allow users to connect to workload-side UIs, such as the DataFlow Deployment Manager or the NiFi UI, through the public Internet.

      • If checked, this option provisions an endpoint (load balancer) in a public subnet.

      • If unchecked, CDP creates an endpoint in a private subnet, and you must set up access to the endpoint manually in your cloud account to allow user access to workload-side UIs.

    3. Specify Load Balancer Subnet Use.
      • Select Available Subnets – Explicitly specifying subnets overrides the automatic, tag-based subnet selection process and ensures that the load balancer is provisioned in the specified subnets. If no subnets are specified, DataFlow provisions a load balancer according to how the subnets have been tagged. For more information, see VPC and subnets in the Related information section below.
      • Specify Load Balancer Endpoint Access – Specify a set of IP address ranges that will be allowed to access the DataFlow load balancer endpoint. Providing no IP address ranges makes the load balancer endpoint open to all traffic.
  5. Configure the Kubernetes API Server Endpoint Access.

    Specify a set of IP address ranges that will be allowed to access the Kubernetes API Server Endpoint. Providing no IP address ranges makes the Kubernetes API Server Endpoint open to all traffic.

    In either case, any user who needs access to the Kubernetes API Server must be granted remote access to the underlying Kubernetes cluster. This can be configured after DataFlow has been enabled successfully.

  6. Configure Tags.

    Tags are added to DataFlow resources at the time of enablement. These tags are in addition to those set by the CDP service.

  7. Click Enable. Enablement may take up to 45 minutes.
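
The networking rules in steps 4 and 5 can be summarized in a short Python sketch. This is an illustration of the documented behavior only, not DataFlow's implementation: the function names, the subnet IDs, and the "private"/"public" labels are hypothetical, and the sketch assumes that an explicit subnet selection is used as given.

```python
import ipaddress


def worker_subnets(selected, registered):
    """Return the subnets considered for worker placement.

    selected:   subnet IDs chosen explicitly (may be empty).
    registered: dict of subnet ID -> "private" or "public"
                (hypothetical labels for subnets registered with the environment).
    """
    if selected:
        # Assumption: an explicit selection is used as given.
        return sorted(selected)
    private = sorted(s for s, kind in registered.items() if kind == "private")
    if private:
        return private
    # Public subnets are only used when no private subnets are available.
    return sorted(s for s, kind in registered.items() if kind == "public")


def endpoint_allows(client_ip, allowed_ranges):
    """Check a client IP against a list of CIDR ranges.

    An empty list models "no IP address ranges provided",
    which leaves the endpoint open to all traffic.
    """
    if not allowed_ranges:
        return True
    addr = ipaddress.ip_address(client_ip)
    # ip_network raises ValueError for malformed ranges such as "10.0.0.1/8".
    return any(addr in ipaddress.ip_network(r) for r in allowed_ranges)
```

A check such as endpoint_allows("203.0.113.7", ["203.0.113.0/24"]) can help confirm that the ranges you plan to enter for the load balancer endpoint or the Kubernetes API Server Endpoint cover the clients that need access before you click Enable.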

Your cluster status changes from Not Enabled to Enabling.

  • Hover over Enabling to display environment enablement event messages.

  • Click the Alerts tab to see environment enablement event messages.

  • Click anywhere in your environment row to see your environment details.

Once you have enabled your DataFlow environment, you are ready to deploy your first flow definition from the catalog.