What's new in Flow Management with NiFi 2

Learn about the new Flow Management features using NiFi 2 available in Cloudera Data Hub clusters in Cloudera on cloud 7.3.1.

Cloudera 7.3.1.0 and 7.3.1.400 are platform-level releases defining the base environment used to create and run different Data Hub cluster types including Flow Management. Flow Management Data Hub clusters are compatible with both NiFi 1 and NiFi 2.

7.3.1.400

The following sections provide details about Cloudera Flow Management 4.2.1.400 based on Apache NiFi 2.3.0, with information on the most important new features, improvements, and fixes included in this release. This release includes a range of Cloudera-specific capabilities and improvements, providing an enterprise-ready foundation for building, operating, and scaling your data pipelines. Switching to NiFi 2 for data flow development and operation is not just a technical update; it modernizes the whole data flow experience, offering greater performance, flexibility, and security.

Native Python 3 support

NiFi 2 introduces a powerful Python API, enabling you to develop processors, controller services, and reporting tasks directly in Python. This allows you to embed Python-based orchestration and data manipulation into your flows without relying on external scripting.

AI & GenAI components

NiFi 2 provides a framework for building AI-enabled processors. Cloudera extends this with production-grade GenAI components, supporting advanced AI-driven workflows.

Flow analysis rules

A new, built-in rules engine enables real-time data quality validation within your flows. You can define rules to detect schema violations, missing fields, or implement custom business logic, all without additional coding.

Enhanced authentication

NiFi 2 removes legacy Kerberos configuration methods and enforces the use of a KerberosUserService controller service. Authentication and encryption are now stricter and more secure by default.

Parameterization enhancement

Global variables have been replaced by a granular parameter framework, improving flow modularity and manageability.

Redesigned user interface

The new NiFi 2 UI is faster, cleaner, and more intuitive. It supports modular flow design with reusable components, which speeds up development and lets you centralize design patterns in large projects.

Migration tool

Cloudera provides a Flow Management Migration Tool that helps you replace deprecated processors, update configurations, and handle breaking changes automatically in your data flows. For more information, see the Cloudera Flow Management Migration Tool documentation.

Other improvements

NiFi 2 brings significant performance enhancements, which are especially important for high-volume workloads. It expands connectivity options for both source and target systems, making it simpler to connect your data pipelines in hybrid environments. With richer native integration features, you can reduce reliance on custom processors. NiFi 2 also improves developer productivity with enhanced SDKs for Python and Java, enabling faster development and more efficient flow management.

New NiFi components

Cloudera Flow Management 4.2.1.400 introduces several new NiFi components to support broader integration and processing capabilities.

New processors:
  • GetBoxFileCollaborators
  • ExecuteSparkInteractive
  • GetBoxGroupMembers
  • ConsumeBoxEnterpriseEvents
  • SawmillTransformRecord
  • PutSolrRecord
  • CaptureChangeDebeziumMongoDB
  • SawmillTransformJSON
  • GetSolr
  • GetS3ObjectTags
  • FetchBoxFileRepresentation
  • PutSolrContentStream
  • ListBoxFileInfo
  • QuerySolr
  • ListenBeats
  • ConsumeBoxEvents
  • FetchBoxFileInfo
Python processors:
  • New
    • PromptClaude
    • TokenCount
    • PromptAzureOpenAI
    • PromptOpenAI
  • Renamed
    • Bedrock renamed to PromptBedrock
New controller services:
  • ClouderaEncodedSchemaReferenceReader
  • ClouderaEncodedSchemaReferenceWriter
  • PhoenixThickConnectionPool
  • PhoenixThinConnectionPool
  • ClouderaAttributeSchemaReferenceWriter
  • DeveloperBoxClientService
  • RESTCatalogService
  • ClouderaAttributeSchemaReferenceReader
  • LivySessionController
  • PEMEncodedSSLContextProvider
  • StandardDatabaseDialectService
New parameter provider:
  • PropertiesFileParameterProvider
Flow Analysis Rules:
  • New
    • RequireMergeBeforePutIceberg
    • RestrictFlowFileExpiration
  • Renamed
    • RestrictBackpressure renamed to RestrictBackpressureSettings

For a comprehensive list of supported NiFi components in Cloudera 7.3.1.400 Flow Management Data Hub clusters, see Supported NiFi extensions.

Removed components

Over 140 components have been removed and must be replaced in NiFi 2. For more information on breaking changes between NiFi 1 and 2, see Behavioral changes in Flow Management.

Readded components

Cloudera Flow Management 4.2.1.400 restores support for several NiFi components that were deprecated in Apache NiFi 2. These components have been re-added to maintain compatibility and support key use cases.

Processors:
  • PutSolrContentStream
  • PutSolrRecord
  • QuerySolr
  • ExecuteSparkInteractive
Controller service:
  • LivySessionController

Upgrade and migration options

There is no supported in-place upgrade path from Flow Management Data Hub clusters powered by NiFi 1 to Flow Management Data Hub clusters powered by NiFi 2.

Cloudera provides the Cloudera Flow Management Migration Tool, which automates complex, repetitive, and error-prone tasks involved in updating flow configurations, reducing manual effort and helping ensure compatibility with NiFi 2 features. The tool simplifies the transition by:

  • Replacing removed processors
  • Converting variable-based configurations to parameters
  • Reconfiguring flows to use new controller services
  • Converting older templates into flow definitions
  • Adapting to security, data type, and API changes introduced in NiFi 2
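
To illustrate the variable-to-parameter conversion listed above: NiFi references variables with the Expression Language syntax ${name} and parameters with #{name}. The sketch below is a minimal approximation of that single step under stated assumptions, not the Migration Tool's actual implementation. The function name variables_to_parameters and the sample names are hypothetical, and only bare references are rewritten, so expressions that call functions are left untouched for manual review.

```python
import re

# Matches a bare Expression Language reference such as ${db.host}.
# References that call functions, for example ${db.host:toUpper()},
# do not match and are therefore left unchanged.
_BARE_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def variables_to_parameters(value, variable_names):
    """Rewrite ${name} to #{name} for every name that used to be a flow variable.

    `variable_names` is the set of names known to be variables; anything else
    (for example a FlowFile attribute such as ${filename}) is kept as-is.
    """
    def swap(match):
        name = match.group(1)
        if name in variable_names:
            return "#{" + name + "}"
        return match.group(0)

    return _BARE_REFERENCE.sub(swap, value)


# Hypothetical property value mixing two former variables and one attribute.
old_value = "jdbc:${db.host}:${port}/${filename}"
print(variables_to_parameters(old_value, {"db.host", "port"}))
# prints: jdbc:#{db.host}:#{port}/${filename}
```

Restricting the rewrite to an explicit set of names avoids converting FlowFile attribute references, which share the ${name} syntax with variables.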

7.3.1.0

The following sections provide details about Cloudera Flow Management 4.2.1.0 based on Apache NiFi 2.0.0, with information on the most important new features, improvements, and fixes included in this release. NiFi 2 introduces numerous changes compared to NiFi 1, including several breaking changes. See Behavioral changes in Flow Management for more information about the differences. Additionally, expect further breaking changes in future releases, particularly as components are removed in favor of more efficient alternatives.

Rebase on NiFi 2.0.0 M2

This upgrade offers access to the newest NiFi features and enhancements on the 2.x branch.

Python processors

One of the key features introduced in Apache NiFi 2 is native support for Python processors. This capability allows you to create custom processors using Python, enabling seamless integration of Python scripts into your dataflows. With each milestone release of NiFi 2, Python integration continues to evolve, providing developers with enhanced functionality, greater flexibility, and more powerful tools for building robust dataflows.
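
As a rough illustration of what a custom Python processor can look like, the sketch below follows the FlowFileTransform pattern described in the Apache NiFi Python developer guide. The processor name UppercaseContent, the shout helper, and the uppercased attribute are hypothetical. The fallback classes in the except branch are stand-ins that exist only so the logic can be exercised outside NiFi; inside a NiFi 2 runtime the real nifiapi module is used instead.

```python
try:
    # Provided by the NiFi 2 Python runtime; not installable as a normal package.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical stand-ins so this sketch can run outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


def shout(text):
    """Pure helper that holds the transformation logic, kept separate for testing."""
    return text.upper()


class UppercaseContent(FlowFileTransform):
    # Tells NiFi which Java interface this Python class implements.
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    # Metadata shown in the NiFi UI.
    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        description = 'Illustrative processor that upper-cases FlowFile text content.'

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, transform it, and route it to "success".
        text = flowfile.getContentsAsBytes().decode('utf-8')
        return FlowFileTransformResult(
            relationship='success',
            contents=shout(text),
            attributes={'uppercased': 'true'},
        )
```

Keeping the transformation in a plain function such as shout makes it possible to unit test the logic without a running NiFi instance.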

The following list shows the Python processors available in Flow Management 7.3.1.0 clusters using NiFi 2.

  • Bedrock
  • ChunkData
  • ChunkDocument
  • EmbedData
  • InsertToMilvus
  • LexicalQueryMilvus
  • ParseDocument
  • PartitionCsv
  • PartitionDocx
  • PartitionHtml
  • PartitionPdf
  • PartitionText
  • PromptChatGPT
  • PutChroma
  • PutOpenSearchVector
  • PutPinecone
  • PutQdrant
  • QueryChroma
  • QueryOpenSearchVector
  • QueryPinecone
  • QueryQdrant
  • VectorQueryMilvus

For a comprehensive list of supported NiFi components in Cloudera 7.3.1.0 Flow Management Data Hub clusters, see Supported NiFi extensions.

For more information about the latest updates in Flow Management Data Hub clusters using NiFi 1, see What's new in Flow Management with NiFi 1.