Known issues and limitations
This topic summarizes the known issues for Cloudera Data Flow in Data Hub.
- ReplaceText with multiple Concurrent Tasks can result in data corruption
- ReplaceText can corrupt content when it is scheduled to run with multiple Concurrent Tasks and uses a Replacement Strategy of "Regular Expression" or "Literal Replace".
The issue is far more likely to occur with multiple Concurrent Tasks, but it may also be possible to trigger it with a single Concurrent Task.
- Schema Registry is unavailable if Knox has failed
-
Schema Registry depends on Knox, but Knox is not highly available. If Knox fails, your clients cannot reach Schema Registry.
- Atlas does not connect lineage when NiFi is writing files to S3 or ADLS Gen2
-
When a flow writes data to S3 using the PutHDFS or PutS3Object processors, or writes files to ADLS Gen2 using PutHDFS or PutAzureDataLakeStorage, NiFi reports the Atlas type as "nifi_dataset", while Atlas expects the type "aws_s3_pseudo_dir" or "adls_gen2_directory". As a result, Atlas shows the lineage of the NiFi flow itself, but that lineage is not connected to subsequent processes that use these S3 or ADLS files.
- PutAzureDataLakeStorage has several limitations
-
- Files can be added to read-only buckets.
- There is no check for existing files, so it is possible to overwrite data.
- To add files at the root level of a bucket, set the destination to an empty string rather than "/".
PutAzureDataLakeStorage was introduced in CFM 2.0.0, for inclusion in Flow Management clusters in CDP Public Cloud. It is not available in HDF 3.5.x or CFM 1.1.x.
- Adjust the default timeout value of the PublishKafkaRecord processor for cloud
-
When "Delivery Guarantee" is set to "all", the 5000 ms timeout of the PublishKafkaRecord processor might not be sufficient, depending on your network setup and the workload on the Kafka cluster you are connecting to.
The error message may look similar to:
2020-01-22 09:50:12,854 ERROR org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_2_0: PublishKafkaRecord_2_0[id=cca729a5-016f-1000-ffff-ffffa3429f0c] Failed to send StandardFlowFileRecord[uuid=03d8a3ab-e1a3-41a4-9fa1-07af176aeb56, claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1579683456791-21, container=default, section=21], offset=844488, length=20],offset=0, name=NLgsvfQuAi2aad67b4-b053-4e42-b9e4-a10153941229,size=20] to Kafka: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
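To check whether a NiFi log contains this timeout, the log lines can be scanned for the message shown above. The following is a minimal sketch, not a Cloudera-provided tool: the function name `find_metadata_timeouts` and the sample line are illustrative assumptions, and the pattern only matches the message wording shown in this example.

```python
import re

# Matches the Kafka client message shown above and captures the timeout in ms.
# Optional whitespace after "errors." is tolerated in case the log was reflowed.
_TIMEOUT_RE = re.compile(
    r"org\.apache\.kafka\.common\.errors\.\s*TimeoutException: "
    r"Failed to update metadata after (\d+) ms"
)


def find_metadata_timeouts(lines):
    """Return the timeout value (in ms) from every matching log line.

    Hypothetical helper for illustration; it is not part of NiFi or Kafka.
    """
    timeouts = []
    for line in lines:
        match = _TIMEOUT_RE.search(line)
        if match:
            timeouts.append(int(match.group(1)))
    return timeouts


if __name__ == "__main__":
    # Shortened sample line (assumption: real lines carry more detail).
    sample = [
        "2020-01-22 09:50:12,854 ERROR ... to Kafka: "
        "org.apache.kafka.common.errors.TimeoutException: "
        "Failed to update metadata after 5000 ms.",
        "2020-01-22 09:50:13,001 INFO unrelated message",
    ]
    print(find_metadata_timeouts(sample))
```

If the helper returns any values, the configured timeout is being hit and a higher value may be appropriate for your environment.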
- Terminating a Flow Management cluster does not delete the cluster-specific NiFi and NiFi Registry repositories in Ranger
-
When a new Flow Management cluster is created, the setup process creates new repository entries in Ranger to allow cluster-specific Ranger policies. These cluster-specific Ranger repositories are not deleted when the cluster is terminated. This can cause issues when a cluster is terminated and another cluster with the same name is created afterwards: the new cluster reuses the existing Ranger repository, but the NiFi component UUIDs in the existing policies do not match the UUIDs of the new cluster.
- Terminating a Streams Messaging cluster does not delete the cluster-specific Kafka repositories in Ranger
-
When a new Streams Messaging cluster is created, the setup process creates new repository entries in Ranger to allow cluster-specific Ranger policies. These cluster-specific Ranger repositories are not deleted when the cluster is terminated.
- Scaling Kafka Brokers or NiFi Nodes up/down is not possible
- Data Hub does not allow users to resize Kafka broker or NiFi node groups.
Technical Service Bulletins
- TSB 2022-580: NiFi Processors cannot write to content repository
- If the content repository disk is more than 50% full (or above any other value set in nifi.properties for nifi.content.repository.archive.max.usage.percentage), and there is no data in the content repository archive, the following warning message can be found in the logs: "Unable to write flowfile content to content repository container default due to archive file size constraints; waiting for archive cleanup". This blocks the processors, and no more data is processed. This appears to happen only if there is already data in the content repository on startup that needs to be archived, or if the following message is logged: "Found unknown file XYZ in the File System Repository; archiving file".
- Upstream JIRA
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-580: NiFi Processors cannot write to content repository
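The disk-usage condition described in this TSB can be checked with a short script. The sketch below is an illustration under stated assumptions, not a diagnostic supplied by Cloudera: the helper names are hypothetical, the property value is assumed to be written as a percentage such as "50%", and the path passed to `shutil.disk_usage` must be replaced with the actual content repository location. It only reports whether usage is above the threshold; it does not confirm that processors are blocked.

```python
import shutil

# Property named in the TSB description above.
PROPERTY_KEY = "nifi.content.repository.archive.max.usage.percentage"


def parse_max_usage_percentage(properties_text, default=50.0):
    """Read the archive usage threshold from nifi.properties text.

    Hypothetical helper: assumes values such as "50%"; falls back to the
    50% mentioned in the TSB when the property is missing or empty.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == PROPERTY_KEY:
            value = value.strip().rstrip("%").strip()
            if value:
                return float(value)
    return float(default)


def exceeds_threshold(total_bytes, used_bytes, threshold_percent):
    """Return True when disk usage is above the threshold percentage."""
    return 100.0 * used_bytes / total_bytes > threshold_percent


if __name__ == "__main__":
    props = PROPERTY_KEY + "=50%\n"  # sample content, not a real file
    threshold = parse_max_usage_percentage(props)
    # Assumption: "." stands in for the content repository directory.
    usage = shutil.disk_usage(".")
    print(exceeds_threshold(usage.total, usage.used, threshold))
```

A result of True means the first condition of the TSB is met; the log messages quoted above still need to be checked to confirm the issue.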