Behavioral Changes in Flow Management

Review the list of Flow Management behavioral changes in Cloudera DataFlow for Data Hub 7.3.1.

Flow Management with NiFi 1

Secure communication between NiFi and ZooKeeper configured by default

If both ZooKeeper and NiFi services are secured, NiFi communication with ZooKeeper will be automatically configured as secured (TLS) using a new port, 2182. If you enforce TCP communication through a firewall and explicitly allow certain ports, you need to open them for port 2182.

If you do not want to use secure communication between ZooKeeper and NiFi, follow these steps to configure unsecured communication on port 2181:

  1. Update the ZooKeeper connection string:

    1. In Cloudera Manager, navigate to NiFi > Configuration.
    2. Set nifi.zookeeper.connect.string by replacing ${ZK_QUORUM} with the unsecure ZK QUORUM string, which has port 2181.
    To find your ZooKeeper quorum string from a NiFi node, run the following command as root:
    NIFI_PROC=$(ls -td /var/run/cloudera-scm-agent/process/NIFI/ | head -1); grep "Connect String" $NIFI_PROC/state-management.xml | cut -d\> -f2 | cut -d\< -f1; unset NIFI_PROC
    This command will provide your connect string. For example:
    host1:2181,host2:2181,host3:2181
  2. Add a safety valve for staging/state-management.xml in Cloudera Manager with the following property:

    • Name: xml.state-management.cluster-provider.zk-provider.property.Connect String
    • Value: <YOUR ZOOKEEPER CONNECT STRING>
  3. After upgrading to version 2.1.7, uncheck the nifi.zookeeper.client.secure option in Cloudera Manager.

ScriptedTransformRecord processor requires proper schema name attribute for record writer

NiFi-11523 introduced a fix that ensures the ScriptedTransformRecord processor uses the correct schema defined for the record writer. Previously, if the schema name attribute was set in the writer but not in the flow, it was ignored, defaulting to the reader schema. This behavior has been corrected, which may cause the processor to fail after upgrading if the schema name attribute is not set in the flow.

The failure is typically logged as:
org.apache.nifi.schema.access.SchemaNotFoundexception: ${schema.name} did not provide appropriate Schema Name

To prevent failures, ensure that the schema name attribute is properly configured in the flow or match it to the schema defined for the record reader for identical behavior.

Flow Management with NiFi 2

NiFi 2.0 introduces a lot of significant changes and enhancements, including some breaking changes for Flow Management clusters based on NiFi 2.X. It is important to familiarize yourself with the following points before migrating your existing flows.

If you want to migrate a data flow, you need to export the process group as a JSON file from your NiFi 1.x cluster and import this JSON file into your NiFi 2.X cluster. Tooling to help with upgrades and automatically manage the breaking changes will be provided in an upcoming Flow Management release.

Java 21

Java 21 is the minimum Java version required with NiFi 2.0. This version is automatically installed and configured on new Data Hub clusters using NiFi 2.0.

Templates and XML flow definitions
The concept of templates in NiFi has been deprecated. Instead, versioning flows should be managed using the DataFlow Catalog and/or the NiFi Registry. It is highly recommended to handle any existing templates in your NiFi 1.x clusters by:
  • Versioning the templates into the desired registry (DataFlow Catalog, NiFi Registry)
  • Deleting the templates from NiFi process groups

Additionally, flow.xml.gz no longer exists, only flow.json.gz can be used in NiFi clusters for defining flows in the canvas.

Custom components / NARs

Although not certain, it is very likely that a custom NAR designed for NiFi 1 will not be successfully loaded into NiFi 2. If your NiFi setup includes custom components or NARs, it is a requirement to update your dependencies to align with NiFi 2. This entails making the necessary adjustments and rebuilding your NARs using Java 21.

Variables are removed in favor of parameters

Variables and the variable registry have been removed from NiFi. Only Parameter Contexts and parameters should be used going forward. In future releases, tools will be provided to help with the conversion of variables to parameters. In the meantime, this conversion should be done manually when migrating flows to NiFi 2. Any variables left will simply be ignored when loading the flow definition.

Event driven thread pool no longer exists

The event driven thread pool has been removed, leaving only the time driven thread pool available. Any components previously configured using the event driven scheduling strategy should be switched to the time driven scheduling strategy.

Removed languages in scripted components

In NiFi 2.0, support for certain languages in scripted components has been removed. The affected languages are: ECMAScript, Lua, Ruby, and Python. It is recommended to switch to Groovy or to leverage the new Python API feature for developing processors.

Removed components and replacement options

The following list contains the list of the components that have been removed between clusters based on NiFi 1.28 and clusters based on NiFi 2.0, along with the recommended alternatives where available.

  • Processors
    • Base64EncodeContent => EncodeContent
    • CompareFuzzyHash => no replacement
    • ConsumeEWS => no replacement
    • ConsumeKafka_1_0 => ConsumeKafka_2_6
    • ConsumeKafka_2_0 => ConsumeKafka_2_6
    • ConsumeKafkaRecord_1_0 => ConsumeKafkaRecord_2_6
    • ConsumeKafkaRecord_2_0 => ConsumeKafkaRecord_2_6
    • ConvertAvroSchema => ConvertRecord
    • ConvertAvroToORC => no replacement
    • ConvertCSVToAvro => ConvertRecord
    • ConvertExcelToCSVProcessor => ConvertRecord with ExcelReader
    • ConvertJSONToAvro => ConvertRecord
    • CryptographicHashAttribute => UpdateAttribute
    • DeleteAzureBlobStorage => DeleteAzureBlobStorage_v12
    • DeleteRethinkDB => no replacement
    • EncryptContent => EncryptContentAge or EncryptContentPGP
    • ExecuteInfluxDBQuery => use Influx Data NARs for NiFi
    • ExtractCCDAAttributes => no replacement
    • FetchAzureBlobStorage => FetchAzureBlobStorage_v12
    • FetchElasticsearchHttp => GetElasticsearch
    • FuzzyHashContent => no replacement
    • GetAzureQueueStorage => GetAzureQueueStorage_v12
    • GetHTMLElement => no replacement
    • GetHTTP => InvokeHTTP
    • GetIgniteCache => no replacement
    • GetJMSQueue => ConsumeJMS
    • GetJMSTopic => ConsumeJMS
    • GetRethinkDB => no replacement
    • GetTCP => no replacement
    • GetTwitter => ConsumeTwitter
    • HashAttribute => CryptographicHashAttribute
    • HashContent => CryptographicHashContent
    • InferAvroSchema => ExtractRecordSchema
    • ListAzureBlobStorage => ListAzureBlobStorage_v12
    • ModifyHTMLElement => no replacement
    • PostHTTP => InvokeHTTP
    • PostSlack => PublishSlack
    • PublishKafka_1_0 => PublishKafka_2_6
    • PublishKafka_2_0 => PublishKafka_2_6
    • PublishKafkaRecord_1_0 => PublishKafkaRecord_2_6
    • PublishKafkaRecord_2_0 => PublishKafkaRecord_2_6
    • PutAzureBlobStorage => PutAzureBlobStorage_v12
    • PutAzureQueueStorage => PutAzureQueueStorage_v12
    • PutBigQueryBatch => PutBigQuery
    • PutBigQueryStreaming => PutBigQuery
    • PutElasticsearchHttp => PutElasticsearchJson
    • PutElasticsearchHttpRecord => PutElasticsearchRecord
    • PutHiveQL => PutClouderaHiveQL
    • PutHiveStreaming => PutClouderaHiveStreaming
    • PutHTMLElement => no replacement
    • PutIgniteCache => no replacement
    • PutInfluxDB => use Influx Data NARs for NiFi
    • PutJMS => PublishJMS
    • PutRethinkDB => no replacement
    • PutRiemann => no replacement
    • PutSlack => PublishSlack
    • QueryElasticsearchHttp => PaginatedJsonQueryElasticsearch
    • ScrollElasticsearchHttp => SearchElasticsearch
    • SelectHiveQL => SelectClouderaHiveQL
    • SpringContextProcessor => no replacement
    • StoreInKiteDataset => no replacement
    • UpdateHiveTable => UpdateClouderaHiveTable
  • Controller services
    • ActionHandlerLookup => no replacement
    • AlertHandler => no replacement
    • AzureStorageCredentialsControllerService => AzureStorageCredentialsControllerService_v12
    • AzureStorageCredentialsControllerServiceLookup => AzureStorageCredentialsControllerServiceLookup_v12
    • AzureStorageEmulatorCredentialsControllerService => no replacement
    • EasyRulesEngineProvider => no replacement
    • EasyRulesEngineService => no replacement
    • ExpressionHandler => no replacement
    • GraphiteMetricReporterService => no replacement
    • GremlinClientService => no replacement
    • HBase_1_1_2_ClientMapCacheService => HBase_2_ClientMapCacheService
    • HBase_1_1_2_ClientService => HBase_2_ClientService
    • HBase_1_1_2_ListLookupService => no replacement
    • HBase_1_1_2_RecordLookupService => HBase_2_RecordLookupService
    • HiveConnectionPool => ClouderaHiveConnectionPool
    • HortonworksSchemaRegistry => ClouderaSchemaRegistry
    • KafkaRecordSink_1_0 => KafkaRecordSink_2_6
    • KafkaRecordSink_2_0 => KafkaRecordSink_2_6
    • LogHandler => no replacement
    • OAuth2TokenProviderImpl => StandardOauth2AccessTokenProvider
    • OpenCypherClientService => no replacement
    • RecordSinkHandler => no replacement
    • ScriptedActionHandler => no replacement
    • ScriptedRulesEngine => no replacement
  • Reporting tasks
    • AmbariReportingTask => no replacement
    • MetricsEventReportingTask => no replacement
    • MetricsReportingTask => no replacement
  • Components with new coordinates
    • InvokeGRPC => moved into nifi-cdf-grpc-nar
    • ListenGRPC => moved into nifi-cdf-grpc-nar
    • KerberosKeytabUserService => moved into nifi-kerberos-user-service-nar
    • KerberosPasswordUserService => moved into nifi-kerberos-user-service-nar
    • KerberosTicketCacheUserService => moved into nifi-kerberos-user-service-nar
    Tooling will be provided in upcoming releases to automatically handle these changes. Currently, two options are available:
    • Manually edit the flow.json.gz file to update the coordinates of the impacted components.
    • Make the changes after the flow is imported in NiFi 2.0 by replacing the ghost components with the new implementations for each instance of the components listed above.
  • Pulsar components
    All Pulsar components have been temporarily removed. They will be reintroduced in an upcoming release. In the meantime, you can download the NARs from a public Maven repository and deploy them as custom NARs.