Enabling Cloudera DataFlow for an environment
Enabling Cloudera DataFlow for an environment is your first step in getting started with Cloudera DataFlow. To do this, ensure that you have met the prerequisites, and then launch the Enable Environment window to walk you through the process.
-
You have an AWS or an Azure account.
-
You have prepared your infrastructure and network. For more information on requirements, see the AWS Resource Planning or the Azure Resource Planning documentation respectively.
-
You have created and registered a Cloudera Public Cloud environment. For more information, see the Data Hub documentation.
-
FreeIPA is running and healthy.
-
Your Data Lake is started and healthy.
- You have the DFAdmin role for the environment you want to enable. For more information, see the Cloudera DataFlow Security documentation
Steps
- From the Cloudera Public Cloud home page, click Cloudera DataFlow, then click Environments.
- Find the environment you want to enable, and click
Enable to launch the Environments
/ Enable DataFlow window.
If the Enable button is grayed out, hover over the Not Enabled icon for more details about the problem.
- Configure DataFlow Capacity.
This defines the minimum and maximum size of the Kubernetes (K8s) cluster. Your Cloudera DataFlow cluster automatically scales between the minimum and maximum cluster size that you specify here.
- Configure Networking.
- Specify whether to use a Public
Endpoint.
Select this option when you want to allow users to connect to workload side UIs like the Cloudera DataFlow Deployment Manager or the actual NiFi UI through the public Internet.
-
If checked when enabling an AWS environment, this option provisions an endpoint (load balancer) in a public subnet.
If checked when enabling an Azure environment, this option provisions an endpoint (load balancer) in a public subnet and you cannot explicitly specify subnets. The Load Balancer Subnet Use option is not available.
-
If unchecked, Cloudera Public Cloud creates an endpoint in a private subnet, and you must set up access to the endpoint manually in your cloud account to allow user access to workload side UIs.
-
- Specify Load Balancer
Subnets.
Select from Available Subnets. Explicitly specifying subnets overrides the automatic, tag-based subnet selection process and ensures load balancer provisioning in the specified subnets. If no subnets are specified, Cloudera DataFlow provisions a load balancer according to how the subnets have been tagged. For more information, see VPC and subnets in the Related information section below.
- Specify Worker Node Subnets.
Select one of the Available Subnets. If you do not make a subnet selection, Cloudera DataFlow considers any available subnet that has been registered with the environment for worker placement. Worker nodes are only placed in public subnets if no private subnets are available.
- Specify Load Balancer Endpoint
Access.
Specify a set of IP address ranges that will be allowed to access the Cloudera DataFlow load balancer endpoint. Providing no IP address ranges makes the load balancer endpoint open to all traffic.
- Specify whether to use a Public
Endpoint.
- Configure Kubernetes.
- Configure API Server Endpoint
Access.
Cloudera Public Cloud environments with Cluster Connectivity Manager enabled support the ability to create a fully private cluster, which disallows all access to the Kubernetes API Server Endpoint.
- Specify whether to use a Private
Cluster.
If you select Private Cluster IP based access is not applicable, and the option to specify a set of IP address ranges is not available.
If you do not select Private Cluster you can specify a set of IP address ranges that are allowed to access the Kubernetes API Server Endpoint. Providing no Classless Inter-Domain Routing (CIDR) makes the Kubernetes API Server Endpoint open to all traffic.
In either case, any user who needs access to the Kubernetes API Server must be granted remote access to the underlying Kubernetes cluster. This can be configured after Cloudera DataFlow has been enabled successfully.
- Specify whether to use a Private
Cluster.
- Configure Pod CIDR Range.
The CIDR notation IP range from which to assign IPs to pods in your kubernetes cluster. This address should be a large address space that is not in use elsewhere in your network environment, accommodating the number of nodes that you expect to scale up to. The default value is
10.244.0.0/16
. The value is used to assign a/24
address space to each node in the cluster. For example, the first node would be assigned10.244.0.0/24
, the second node10.244.1.0/24,
the third node10.244.2.0/24
. As the cluster scales or upgrades, the platform continues to assign a pod IP address range to each new node. - Configure Service CIDR Range.
The CIDR notation IP range from which to assign IPs to internal services in your kubernetes cluster. This IP address range should be an address space that is not in use elsewhere in your network environment, accommodating the amount of kubernetes services that you intend to use. The default value is
10.0.0.0/16
.
- Configure API Server Endpoint
Access.
- Configure Tags as keys and their values.
Tags are added to Cloudera DataFlow resources at the time of enablement. These tags are included in addition to those set by the Cloudera Public Cloud service.
- Configure Notifications.
Select the types of events for which you want to receive notifications on the Cloudera Management Console.
- DataFlow Service
-
If you enable this option, you will receive notifications about enablement progress, scaling operations, and health for this Cloudera DataFlow service. Select the event severity levels you want to receive notifications for (Info, Warning, Error). Cloudera recommends you keep notifications enabled for all event severities.
- All Deployments
- If you enable this option, you will receive notifications for deployment progress, scaling operations, and health for all deployments in this workspace. Select the event severity levels you want to receive notifications for (Info, Warning, Error). Cloudera recommends you keep notifications enabled for all event severities.
You can configure email notifications for the events you have subscribed to after the environment is enabled for Cloudera DataFlow.
- Click Enable. This may take up to 45 minutes.
Result
Your cluster status changes from Not Enabled to Enabling.
-
Hover over Enabling for environment enablement event messages to display.
-
Click the Alerts tab to see environment enablement event messages.
-
Click anywhere in your environment row to see your environment details.
Before you begin
- You have installed the CDP CLI.
- You have run
cdp environments list-environments #
to obtain the environment-crn.
Steps
- To enable Cloudera DataFlow for an environment,
enter:
cdp df enable-service --environment-crn [***ENVIRONMENT_CRN***] --min-k8s-node-count [***MIN_K8S_NODE_COUNT***] --max-k8s-node-count [***MAX_K8S_NODE_COUNT***] [--use-public-load-balancer] [--no-use-public-load-balancer] [--private-cluster] [--no-private-cluster] [--kube-api-authorized-ip-ranges [[***KUBE_API_AUTHORIZED_IP_RANGES***[ [[***KUBE_API_AUTHORIZED_IP_RANGES***[ ...]]] [--tags [***TAGS***]] [--load-balancer-authorized-ip-ranges [[***LOAD_BALANCER_AUTHORIZED_IP_RANGES***] [[***LOAD_BALANCER_AUTHORIZED_IP_RANGES***] ...]]] [--cluster-subnets [[***CLUSTER_SUBNETS***] [[***CLUSTER_SUBNETS***] ...]]] [--load-balancer-subnets [[***LOAD_BALANCER_SUBNETS***] [[***LOAD_BALANCER_SUBNETS***] ...]]] [--send-events-to-notification] [--no-send-events-to-notification] [ --notification-severity-levels [[***NOTIFICATION_SEVERITY_LEVELS***] [[***NOTIFICATION_SEVERITY_LEVELS***] ...]]] [--send-collective-flows-events-to-notification][--no-send-collective-flows-events-to-notification] [--collective-flows-notification-severity-levels [[***COLLECTIVE_FLOWS_NOTIFICATION_SEVERITY_LEVELS***] [[***COLLECTIVE_FLOWS_NOTIFICATION_SEVERITY_LEVELS***] ...]]]
Where:
- --environment-crn specifies the environment-crn you obtained while completing the pre-requisites.
- --min-k8s-node-count and --max-k8s-node-count specify your Cloudera DataFlow capacity. Your Cloudera DataFlow cluster automatically scales between the minimum and maximum cluster size that you specify here.
- [--use-public-load-balancer] [--no-use-public-load-balancer]
- [--private-cluster] [--no-private-cluster]
- --kube-api-authorized-ip-ranges specifies a set of IP address ranges that will be allowed to access the Kubernetes API Server Endpoint. Providing no IP address ranges makes the Kubernetes API Server Endpoint open to all traffic.
- --tags specifies any tags you want to add to Cloudera DataFlow resources at the time of enablement. These tags are in addition to those set by the Cloudera service.
- --load-balancer-authorized-ip-ranges
- --cluster-subnets
- --load-balancer-subnets
- [--send-events-to-notification], [--no-send-events-to-notification] define whether you want to receive notifications on Cloudera Management Console about environment level events and --notification-severity-levels defines the types of events you want to receive.
- [--send-collective-flows-events-to-notification], [--no-send-collective-flows-events-to-notification] define whether you want to receive event notifications about all deployments in the environment and --collective-flows-notification-severity-levels defines the types of events you want to receive.
Example
Successfully enabling Cloudera DataFlow for an AWS environment results in output similar to:
{
"service": {
"kubeApiAuthorizedIpRanges": [],
"validActions": [
"ENABLE"
],
"loadBalancerAuthorizedIpRanges": [],
"clusterSubnets": [],
"loadBalancerSubnets": [],
"tags": {},
"crn": "crn:cdp:df:us-west-1:CLOUDERA:service:96827a6b-bd8b-42fb-85ad-bdb4fc7ca43d",
"environmentCrn": "crn:cdp:environments:us-west-1:CLOUDERA:environment:dev-east-local-k8s",
"name": "dev-east-local-k8s",
"cloudPlatform": "LOCAL_K8S",
"region": "local-k8s",
"deploymentCount": 0,
"minK8sNodeCount": 3,
"maxK8sNodeCount": 4,
"status": {
"state": "ENABLING",
"message": "Enabling DataFlow crn:cdp:environments:us-west-1:CLOUDERA:environment:dev-east-local-k8s",
"detailedState": "NEW"
},
"runningK8sNodeCount": 0,
"instanceType": "m5.xlarge",
"dfLocalUrl": "",
"activeWarningAlertCount": "0",
"activeErrorAlertCount": "0",
"clusterId": "",
"clusterUsable": false,
"usePublicLoadBalancer": false,
"usePrivateCluster": false,
"creatingK8sNodeCount": 0,
"terminatingK8sNodeCount": 0
}
}
After the environment is enabled for Cloudera DataFlow, you can subscribe to receiving email notifications on events you have configured during enablement. For more information, see the Cloudera Management Console documentation.
Once you have enabled your Cloudera DataFlow environment, you are ready to deploy your first flow definition from the Catalog. For instructions, see Deploying a flow definition.
For more information on managing and monitoring Cloudera DataFlow, see Managing Cloudera DataFlow in an environment.