Known issues and limitations
This section provides a summary of the known issues and limitations for Cloudera Data Flow in Data Hub.
- Atlas does not connect lineage when NiFi is writing files to S3 or ADLS Gen2
When running a flow that writes data to S3 using the PutHDFS or PutS3Object processors, or to ADLS Gen2 using PutHDFS or PutAzureDataLakeStorage, the Atlas type reported by NiFi is "nifi_dataset", while Atlas expects it to be of type "aws_s3_pseudo_dir" or "adls_gen2_directory". As a result, Atlas can show the lineage of the NiFi flow itself, but that lineage is not connected to subsequent processes that consume these S3 or ADLS files.
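For reference, a minimal sketch (in Python, using the requests library) of querying Atlas's basic search REST endpoint to compare the two types; the host, port, and credentials are placeholders:
import requests

atlas = "https://<atlas-host>:31443/api/atlas/v2"  # placeholder host and port
auth = ("<workload-user>", "<workload-password>")

# Compare the type NiFi reports with the type Atlas expects for the same files.
for type_name in ("nifi_dataset", "aws_s3_pseudo_dir"):
    resp = requests.get(f"{atlas}/search/basic", params={"typeName": type_name}, auth=auth)
    resp.raise_for_status()
    print(type_name, len(resp.json().get("entities", [])), "entities")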
- PutAzureDataLakeStorage has several limitations
- You can add files to read-only buckets.
- There is no check for file overwriting; it is possible to overwrite existing data. A possible out-of-band check is sketched below.
- To add files at the root level of a bucket, set the destination to an empty string rather than "/".
PutAzureDataLakeStorage was introduced in CFM 2.0.0 for inclusion in Flow Management clusters in CDP Public Cloud. It is not available in HDF 3.5.x or CFM 1.1.x.
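Since the processor itself does not guard against overwrites, one workaround is to check for the target file out of band before the flow runs. A minimal sketch, assuming the azure-storage-file-datalake Python package; the account, container, and path are placeholders:
from azure.core.exceptions import ResourceNotFoundError
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key>",
)
file_client = service.get_file_system_client("<container>").get_file_client("landing/data.csv")

try:
    file_client.get_file_properties()  # raises ResourceNotFoundError if the file is absent
    print("File already exists; PutAzureDataLakeStorage would overwrite it")
except ResourceNotFoundError:
    print("Safe to write")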
- Adjust the PublishKafkaRecord processor default timeout value for cloud
Depending on your network setup and the workload on the Kafka cluster you are connecting to, the default 5000 ms timeout of the PublishKafkaRecord processor might not be sufficient when "Delivery Guarantee" is set to "all". If you hit this timeout, increase the processor's Max Metadata Wait Time property.
The error message may look similar to:
2020-01-22 09:50:12,854 ERROR org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_2_0: PublishKafkaRecord_2_0[id=cca729a5-016f-1000-ffff-ffffa3429f0c] Failed to send StandardFlowFileRecord[uuid=03d8a3ab-e1a3-41a4-9fa1-07af176aeb56, claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1579683456791-21, container=default, section=21], offset=844488, length=20],offset=0, name=NLgsvfQuAi2aad67b4-b053-4e42-b9e4-a10153941229,size=20] to Kafka: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
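The NiFi property in question maps to the Kafka producer client's max.block.ms setting. As an illustration of the same timeout at the client level, a minimal sketch using the kafka-python package; the broker address and topic are placeholders:
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.example.com:9092",
    acks="all",          # equivalent of Delivery Guarantee = "all"
    max_block_ms=30000,  # explicit timeout well above the 5000 ms default seen in the error above
)
producer.send("my-topic", b"payload")
producer.flush()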
- Terminating a Flow Management cluster does not delete the cluster-specific NiFi and NiFi Registry repositories in Ranger
When a new Flow Management cluster is created, the setup process creates new repository entries in Ranger to allow cluster-specific Ranger policies. These cluster-specific Ranger repositories are not deleted when the cluster is terminated. This can lead to issues when a cluster is terminated and another cluster with the same name is created afterwards: the new cluster reuses the existing Ranger repositories, but the NiFi component UUIDs in the existing policies no longer match the UUIDs of the new cluster. One way to remove the stale repositories by hand is sketched below.
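A minimal sketch, assuming Ranger's public REST API (DELETE /service/public/v2/api/service/name/{name}); the Ranger host, credentials, and repository names are placeholders:
import requests

ranger = "https://<ranger-host>:6182"
auth = ("<admin-user>", "<password>")

# Delete the stale cluster-specific repositories left behind by the old cluster.
for repo in ("<cluster-name>_nifi", "<cluster-name>_nifiregistry"):
    resp = requests.delete(f"{ranger}/service/public/v2/api/service/name/{repo}", auth=auth)
    resp.raise_for_status()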
- ReportLineageToAtlas reporting task throws errors on a new Flow Management cluster
Ranger policies that allow NiFi to publish metadata to Atlas are not created automatically, which prevents NiFi from writing to Atlas.
Attempting to write to Atlas may result in an error message similar to:
"ReportLineageToAtlas[id=843ce571-0171-1000-ffff-ffffefdc49dd] Error running task ReportLineageToAtlas[id=843ce571-0171-1000-ffff-ffffefdc49dd] due to java.lang.RuntimeException: Failed to check and create NiFi flow type definitions in Atlas due to org.apache.atlas.AtlasServiceException: Metadata service API org.apache.atlas.AtlasClientV2$API_V2@3ea5b832 failed with status 403 (Forbidden) Response Body ({"errorCode":"ATLAS-403-00-001","errorMessage":"nifi is not authorized to perform create entity-def nifi_output_port"})"
- The FQDNs of the NiFi nodes in a Flow Management cluster are not registered with public DNS
Some NiFi use cases require inbound connectivity to NiFi from external systems using hostnames. Currently, the FQDNs of NiFi nodes cannot be resolved over the public internet.
- Terminating a Streams Messaging cluster does not delete the cluster-specific Kafka repositories in Ranger
When a new Streams Messaging cluster is created, the setup process creates new repository entries in Ranger to allow cluster-specific Ranger policies. These cluster-specific Ranger repositories are not deleted when the cluster is terminated.
- Scaling Kafka brokers or NiFi nodes up or down is not possible
Data Hub does not allow users to resize Kafka broker or NiFi node groups.
- NiFi Registry API endpoint is not displayed in the list of exposed cluster endpoints in the Data Hub UI
For all services running within a Flow Management cluster, Knox is set up to proxy requests to the service endpoints. While the NiFi Registry API is configured to be proxied by Knox, its endpoint URI is not exposed in the Data Hub cluster management UI.
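The proxied API can still be reached directly. A minimal sketch using the NiFi Registry REST API's /buckets resource; the gateway host and the Knox topology path are placeholders, so check your gateway configuration for the actual path:
import requests

gateway = "https://<knox-gateway-host>"
path = "/<cluster-name>/cdp-proxy-api/nifi-registry-api/buckets"  # placeholder topology path

resp = requests.get(gateway + path, auth=("<workload-user>", "<workload-password>"))
resp.raise_for_status()
print(resp.json())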