Known issues and limitations

You must be aware of the known issues and limitations, the areas of impact, and the available workarounds in Cloudera Data Flow.

Known issues

CDPDFX-11302: Kafka to Snowflake and Confluent Cloud to Snowflake ReadyFlows must be adjusted for Snowflake's multi-factor authentication requirement

Snowflake now requires multi-factor authentication, causing the Snowflake ReadyFlows to fail because their components authenticate with a username and password only.

For the Snowflake ReadyFlows, in the SnowflakeComputingConnectionPool controller service, configure the Private Key Service property to use the existing StandardPrivateKeyService controller service.
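
The StandardPrivateKeyService supplies a private key so the connection pool can use Snowflake key-pair authentication instead of a password. As background, the following minimal Python sketch shows the same key-pair pattern with the Snowflake Python connector; the key path, account identifier, and user name are placeholders.

from cryptography.hazmat.primitives import serialization
import snowflake.connector

# Load the PEM-encoded RSA private key registered with the Snowflake user.
with open("/path/to/rsa_key.p8", "rb") as key_file:  # placeholder path
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

# Serialize the key to DER, the format the connector expects.
private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# Authenticate with the key instead of a password, so no MFA prompt is triggered.
conn = snowflake.connector.connect(
    account="<account_identifier>",  # placeholder
    user="<user>",                   # placeholder
    private_key=private_key_der,
)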

CDPDFX-11288: HuggingFace to S3/ADLS ReadyFlow references outdated dataset

The default value of the Dataset Name configuration parameter (wikitext) is outdated. The correct value is Salesforce/wikitext.

Replace wikitext with Salesforce/wikitext during flow deployment.
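
If you want to verify the new dataset name outside the flow, a minimal sketch with the Hugging Face datasets library follows; the configuration name wikitext-103-raw-v1 is one of several published configurations, so adjust it to the one your flow expects.

from datasets import load_dataset

# The bare "wikitext" identifier is outdated; the dataset now lives
# under the Salesforce namespace on the Hugging Face Hub.
ds = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1", split="train")
print(ds[0])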

CDPDFX-11266: Selecting View in NiFi in Deployment Manager navigates to the root process group, not to the flow process group

If you select a flow in the Deployment & Flows view and then click Actions > View in NiFi, the Apache NiFi canvas is supposed to show the flow process group (PG), but it shows the root PG instead.

On the NiFi canvas, select and open the required flow PG manually.

ENGESC-28680: Flow missing on Cloudera Data Flow canvas

In rare cases, a single-node deployment may disappear from Cloudera Data Flow after a deployment restart. The root cause is a leader election timeout in NiFi cluster deployments that scale up from a single node: the cluster restarts with a single node and quickly scales to two nodes. The race condition between leader election and the single-node restart leads to an unhealthy deployment state and causes the NiFi deployment to disappear from Cloudera Data Flow. There is currently no fix that allows single-node deployments to avoid this race condition.

One way to prevent this situation from occurring is to configure the NiFi cluster deployment with a minimum node count of two nodes instead of one. This setup improves initial cluster stability and reduces the chance of electing a leader node without any flow, preventing the loss of flow definitions.

An alternative workaround for regularly inactive jobs is suspending a deployment when flows are not running, instead of stopping it completely. This keeps the deployment definition intact while reducing operational overhead.

CDPDFX-7121: Environment unavailable for new deployments while diagnostics collection is running

During deployment creation, you cannot select Cloudera Data Flow environments where diagnostic bundle collection is ongoing.

Wait for diagnostic bundle collection to finish before proceeding with flow deployment.

CDPDFX-10009: Pinecone processors require value for obsolete environment parameter

Pinecone no longer requires an environment to connect, but you still need to provide a value for the Pinecone Environment parameter for deployments to succeed, because the corresponding property of the Pinecone processors is required.

Provide an arbitrary string as the parameter value.

Cloudera Data Flow Functions: DEFAULT_PARAM_CONTEXT variable no longer works alone

This issue occurs only in AWS environments and affects configurations where no PARAM_CONTEXT_ variables are defined, only a DEFAULT_PARAM_CONTEXT.

The DEFAULT_PARAM_CONTEXT configuration variable tells Cloudera Data Flow Functions which default secret to use when no secret matches the parameter contexts in the flow. This variable is now ignored.

Create an environment variable in Configuration called PARAM_CONTEXT_[***NAME***], where [***NAME***] is the user-defined name of the parameter context. Specify the name of the AWS Secret you want to use as the value of this variable.
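
As an illustration, the following boto3 sketch sets such a variable on the AWS Lambda function that backs a Cloudera Data Flow Functions deployment. The function name, parameter context name, and secret name are placeholders; note that Lambda environment variable keys may contain only letters, digits, and underscores.

import boto3

lambda_client = boto3.client("lambda")
function_name = "my-dataflow-function"  # placeholder

# update_function_configuration replaces the entire variable map,
# so merge the new entry with the existing variables first.
config = lambda_client.get_function_configuration(FunctionName=function_name)
variables = config.get("Environment", {}).get("Variables", {})
variables["PARAM_CONTEXT_my_context"] = "my-aws-secret-name"  # placeholders

lambda_client.update_function_configuration(
    FunctionName=function_name,
    Environment={"Variables": variables},
)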

IAM Policy Simulator preflight check fails with a resource policy validation error

With all cross-account policies in place, the IAM Policy Simulator preflight check still fails with the following error message:

IAM Resource Policy validation failed on AWS. CrossAccount role does not have permissions for these operations : : ssm:GetParameter, ssm:GetParameters, ssm:GetParameterHistory, ssm:GetParametersByPath

This happens because even if a given cross-account role is allowed to perform a certain action (granted through IAM policies), an attached Service Control Policy (SCP) may override that capability if it enforces a Deny on that action. SCPs take precedence over IAM policies. SCPs are applied either at the root of an organization or to individual accounts, so a permission can be blocked at any level above the account, either implicitly or explicitly (by including it in a Deny policy statement).

Because the IAM Policy Simulator SDK does not have an option to include or exclude an organization's SCPs, the preflight check fails if an SCP denies an action, even though the IAM role has the necessary permissions.
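
You can reproduce the behavior outside the preflight check. The following minimal boto3 sketch runs the same simulation (the role ARN is a placeholder); SimulatePrincipalPolicy evaluates identity-based IAM policies only and has no way to factor in SCPs:

import boto3

iam = boto3.client("iam")

# Simulate the SSM actions that the preflight check verifies.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::111122223333:role/cross-account-role",  # placeholder
    ActionNames=[
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:GetParameterHistory",
        "ssm:GetParametersByPath",
    ],
)
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])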

This is a known issue in AWS.

Do not work around this issue by selecting the Skip Validations option when enabling Cloudera Data Flow, as that bypasses all preflight validation checks. Instead, submit a request to add the LIFTIE_DISABLE_IAM_PREFLIGHT_CHECK entitlement to your account, which ensures that only the IAM Policy preflight validation check is skipped.

Limitations

NiFi 2.x runtime downgrades

Apache NiFi 2 introduced breaking changes and backward compatibility cannot be guaranteed; therefore, NiFi 2 downgrades are not permitted. This restriction does not apply to hotfix version downgrades within the same NiFi runtime version (for example, from 2.6.0.4.3.4.0-166 to 2.6.0.4.3.4.0-123).
NiFi 2 upgrades

Apache NiFi version 2.3 became generally available (GA) in Cloudera Data Flow release 2.10.0. Previous NiFi 2 versions in Cloudera Data Flow 2.9.0 or lower were provided as Technical Preview and therefore are not supported for upgrade to 2.10.0 or higher. Before performing a service upgrade from release 2.9.0 or lower to release 2.10.0 or higher, you must remove all NiFi 2 flow deployments.

Reassigning resources
  • Parameter groups that have referencing flow drafts cannot be reassigned to another project.
  • Flow drafts referencing a parameter group cannot be reassigned to another project.
  • Project reassignment does not move assets. When you reassign a parameter group that includes a FILE type parameter (asset) reference to another project, you must re-upload that asset to the new project.
  • Flow deployments referencing a parameter group cannot be reassigned to another project.

Parameter group duplication

Duplication of parameter groups referencing assets does not duplicate referenced assets.

After duplicating a parameter group, you must re-upload assets manually. Whether you reassign a parameter group to another project, duplicate it, or export it, referenced assets are not moved or duplicated with it.

Duplication is only possible within a given project; another project cannot be targeted as the destination of a duplication. If you want to assign the newly created group to another project, reassign it after the duplication completes.

Diagnostic Bundle collection through the Management Console is available on the US Control Plane only
There is no workaround for this issue.
Data Lineage information is not automatically reported to Atlas in the Data Catalog
Flow deployments created by Cloudera Data Flow do not come with a pre-configured ReportLineageToAtlas Reporting Task.
If you have been assigned the DFFlowAdmin role, you can manually create and configure the ReportLineageToAtlas Reporting Task on the NiFi canvas after the deployment completes.
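
If you prefer to script this step, the reporting task can in principle be created through the NiFi REST API. The following is a hedged sketch only: the NiFi URL and token are placeholders, and the properties the task requires vary by NiFi and Atlas configuration, so treat it as an outline rather than a definitive recipe.

import requests

nifi_api = "https://<deployment-nifi-host>/nifi-api"  # placeholder
headers = {"Authorization": "Bearer <token>"}          # placeholder

# Create the reporting task; configure its Atlas connection properties
# afterwards (through the UI or a follow-up PUT) before starting it.
payload = {
    "revision": {"version": 0},
    "component": {
        "type": "org.apache.nifi.atlas.reporting.ReportLineageToAtlas",
        "name": "ReportLineageToAtlas",
    },
}
response = requests.post(f"{nifi_api}/controller/reporting-tasks", json=payload, headers=headers)
response.raise_for_status()
print(response.json()["id"])
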
PowerUsers are not able to create flow deployments without additional Cloudera Data Flow roles
While the PowerUser role allows you to view flow deployments in the Dashboard, view flow definitions in the Catalog, and initiate flow deployments, the Deployment Wizard fails after you select an environment for which you do not have the DFFlowAdmin resource role assigned.
Assign the DFFlowAdmin role to the user for the environment to which they want to deploy flow definitions.
Cloudera Data Flow reports "Bad Health" during Data Lake upgrade
Cloudera Data Flow monitors the state of the associated Cloudera on cloud environment to decide which actions Cloudera Data Flow users can take. Cloudera Data Flow detects Data Lake upgrades of the associated Cloudera on cloud environment and puts the Cloudera Data Flow service into Bad Health for the duration of the upgrade, blocking new deployments.
To work around this issue, wait for the Data Lake upgrade to complete before creating new flow deployments.
Deployments and Cloudera Data Flow Services are no longer visible in the Cloudera Data Flow Dashboard or Environments page when the associated Cloudera on cloud Environment has been deleted
If the associated Cloudera on cloud Environment is deleted while a Cloudera Data Flow Service is enabled, the Cloudera Data Flow Service becomes orphaned. Orphaned resources are no longer visible to users without the PowerUser role.
To work around this issue, open the Environments or Dashboard page as a user who has been assigned the PowerUser role. PowerUsers can view orphaned deployments and Cloudera Data Flow services.
Non-transparent proxies are not supported on Azure
There is no workaround for this issue.