Known issues and limitations
You must be aware of the known issues and limitations, the areas of impact, and workarounds in Cloudera DataFlow.
Known issues
- Enable DataFlow dialog offers incorrect options for Load Balancer subnet selection if Endpoint Gateway Subnets are defined in the CDP Environment
- CDP supports optionally specifying Endpoint Gateway Subnets at the CDP Environment level. The known issue described below is only present if this Endpoint Gateway Subnet feature is used.
When enabling DataFlow, the intended behavior is to use the Endpoint Gateway Subnets from the CDP Environment as the options for Load Balancer subnet selection. The actual behavior is that the Endpoint Gateway Subnets are ignored and the standard Network Subnets of the CDP Environment are offered instead.
As a result, if specific Endpoint Gateway Subnets are defined at the CDP Environment level, enabling DataFlow uses the Network Subnets for Load Balancer placement and may fail or attach the Load Balancer to the wrong subnet. You can verify the actual placement as shown in the sketch below.
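Because the misplacement is not always obvious from the console, one way to verify where the Load Balancer actually landed is to list its subnet attachments directly in AWS. The following is a minimal sketch, assuming boto3 is installed and configured with credentials for the AWS account backing the CDP Environment; the region value is a placeholder.

```python
import boto3

# Placeholder region; use the region of your CDP Environment.
elbv2 = boto3.client("elbv2", region_name="us-east-1")

# List every load balancer together with the subnets it is attached to,
# so the one created for the DataFlow service can be compared against the
# Endpoint Gateway Subnets configured on the CDP Environment.
for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    subnet_ids = [az["SubnetId"] for az in lb["AvailabilityZones"]]
    print(lb["LoadBalancerName"], subnet_ids)
```

If the printed subnet IDs match the environment's standard Network Subnets rather than its Endpoint Gateway Subnets, the Load Balancer was placed according to the incorrect behavior described above.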
- IAM Policy Simulator preflight check fails with resource policy validation
- With all cross account policies in place, the IAM Policy Simulator preflight check still fails with the following error message:
IAM Resource Policy validation failed on AWS. CrossAccount role does not have permissions for these operations : : ssm:GetParameter, ssm:GetParameters, ssm:GetParameterHistory, ssm:GetParametersByPath
This happens because even if a given cross account role is allowed to perform a certain action (granted through IAM policies), an attached Service Control Policy (SCP) can override that capability if it enforces a Deny on that action. SCPs take precedence over IAM policies. They are applied either at the root of an organization or to individual accounts, so a permission can be blocked at any level above the account, either implicitly or explicitly (by including it in a Deny policy statement). As the IAM Policy Simulator SDK does not provide an option to include or exclude an organization's SCPs, the preflight check fails whenever an SCP denies an action, even though the IAM role itself has the necessary permissions. This is a known issue in AWS.
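You can reproduce the evaluation outside the preflight check with the IAM policy simulator API. The following is a minimal sketch using boto3; the role ARN is a placeholder for your cross account role.

```python
import boto3

iam = boto3.client("iam")

# Placeholder ARN; substitute your cross account role.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::111111111111:role/example-crossaccount-role",
    ActionNames=[
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:GetParameterHistory",
        "ssm:GetParametersByPath",
    ],
)

# The simulation factors in any SCPs that apply to the account and offers
# no option to exclude them, so the decision can be "implicitDeny" or
# "explicitDeny" even when the role's own IAM policies allow the action.
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], "=>", result["EvalDecision"])
```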
- Stale Prometheus rules cause false or missing capacity alerts
- If you performed a DataFlow settings update and modified the maximum Kubernetes node capacity value between October 10, 2022 (release 2.2.0-h3-b2) and December 08, 2022 (release 2.3.0-b347), the Prometheus alert rules that raise active alerts when the cluster is nearing or has reached maximum capacity do not reflect the updated minimum and maximum thresholds. As a result, you may observe an active alert indicating that the DataFlow cluster is nearing or has reached capacity when in fact it has not, or you may not receive an active alert when one should be firing. This is because the rules are stale and still reflect the previous maximum capacity value.
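If you want to check whether the rules on a given cluster are stale, you can inspect the alerting rules Prometheus currently has loaded. The sketch below assumes you can reach the workload Prometheus HTTP API (the URL is a placeholder, for example via a port-forward) and that the relevant rules contain "capacity" in their names; both are assumptions.

```python
import requests

# Placeholder URL; point this at the workload Prometheus instance.
PROMETHEUS_URL = "http://localhost:9090"

payload = requests.get(f"{PROMETHEUS_URL}/api/v1/rules", timeout=10).json()

# Print each alerting rule whose name suggests it guards cluster capacity,
# together with its expression, so the thresholds in the expression can be
# compared with the maximum node capacity configured for the service.
for group in payload["data"]["groups"]:
    for rule in group["rules"]:
        if rule["type"] == "alerting" and "capacity" in rule["name"].lower():
            print(rule["name"], "=>", rule["query"])
```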
- Role assignment processing
- Role assignment changes for DataFlow workload applications are only retrieved when the user logs into the workload application. Note that a user can remain logged in for up to eight hours after their last interaction with the workload application.
Limitations
- Reporting Tasks are not supported as part of a flow deployment
- When you download a flow definition from NiFi, existing Reporting Tasks are not exported to the resulting JSON file. As a result, deployments created in DataFlow do not have any pre-configured Reporting Tasks. You can confirm the absence for a given export as shown below.
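A minimal check, assuming the flow definition was saved as flow-definition.json (a placeholder file name):

```python
import json

# Placeholder path; use the flow definition you downloaded from NiFi.
with open("flow-definition.json") as f:
    flow = json.load(f)

# Reporting Tasks are simply absent from the export, so no
# reporting-task keys appear anywhere in the document.
print("reportingTask" in json.dumps(flow))  # expected: False
```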
- Data Lineage information is not automatically reported to Atlas in the Data Catalog
- Flow deployments created by DataFlow do not come with a pre-configured ReportLineageToAtlas Reporting Task.
- PowerUsers are not able to create flow deployments without additional DataFlow roles
- While the PowerUser role allows you to view flow deployments in the Dashboard, view flow definitions in the Catalog, and initiate flow deployments, the Deployment Wizard fails after you select an environment for which you do not have the DFFlowAdmin resource role assigned.
- CDF reports "Bad Health" during Data Lake upgrade
- CDF monitors the state of the associated CDP environment to decide which actions CDF users can take. CDF detects Data Lake upgrades of the associated CDP environment and puts the CDF service into Bad Health for the duration of the upgrade, blocking new deployments.
- Deployments and CDF Services are no longer visible in the DataFlow Dashboard or Environments page when the associated CDP Environment has been deleted
- If the associated CDP Environment is deleted while a DataFlow service is enabled, the service becomes orphaned. Orphaned resources are no longer visible to users without the PowerUser role.
- Non-transparent proxies are not supported on Azure
- There is no workaround for this issue.
- Disk-related metrics have been removed from deployments and DataFlow service details
- In previous versions of CDF, deployments and enabled DataFlow services showed disk capacity and disk usage metrics as part of their system metrics. You were also able to define KPIs and alerts on these metrics. Due to issues with the underlying metrics collection framework, the following metrics have been removed starting with DFX-1.1.0-h2-b1:
  - Disk Capacity (DF Service Metric)
  - Disk Capacity (Deployment System Metric)
  - Disk Usage (Deployment System Metric)