General

Developer

Processors
- AttributeRollingWindow 2.3.0.4.10.0.0-147
- AttributesToCSV 2.3.0.4.10.0.0-147
- AttributesToJSON 2.3.0.4.10.0.0-147
- CalculateParquetOffsets 2.3.0.4.10.0.0-147
- CalculateParquetRowGroupOffsets 2.3.0.4.10.0.0-147
- CalculateRecordStats 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumDB2 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumMongoDB 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumMySQL 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumOracle 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumPostgreSQL 2.3.0.4.10.0.0-147
- CaptureChangeDebeziumSQLServer 2.3.0.4.10.0.0-147
- CaptureChangeMySQL 2.3.0.4.10.0.0-147
- CompressContent 2.3.0.4.10.0.0-147
- ConnectWebSocket 2.3.0.4.10.0.0-147
- ConsumeAMQP 2.3.0.4.10.0.0-147
- ConsumeAzureEventHub 2.3.0.4.10.0.0-147
- ConsumeBoxEnterpriseEvents 2.3.0.4.10.0.0-147
- ConsumeBoxEvents 2.3.0.4.10.0.0-147
- ConsumeElasticsearch 2.3.0.4.10.0.0-147
- ConsumeGCPubSub 2.3.0.4.10.0.0-147
- ConsumeIMAP 2.3.0.4.10.0.0-147
- ConsumeJMS 2.3.0.4.10.0.0-147
- ConsumeKafka 2.3.0.4.10.0.0-147
- ConsumeKafka_2_6 2.3.0.4.10.0.0-147
- ConsumeKafka2CDP 2.3.0.4.10.0.0-147
- ConsumeKafka2RecordCDP 2.3.0.4.10.0.0-147
- ConsumeKafkaRecord_2_6 2.3.0.4.10.0.0-147
- ConsumeKinesisStream 2.3.0.4.10.0.0-147
- ConsumeMQTT 2.3.0.4.10.0.0-147
- ConsumePLC 2.3.0.4.10.0.0-147
- ConsumePOP3 2.3.0.4.10.0.0-147
- ConsumeSlack 2.3.0.4.10.0.0-147
- ConsumeTwitter 2.3.0.4.10.0.0-147
- ConsumeWindowsEventLog 2.3.0.4.10.0.0-147
- ControlRate 2.3.0.4.10.0.0-147
- ConvertAvroToParquet 2.3.0.4.10.0.0-147
- ConvertCharacterSet 2.3.0.4.10.0.0-147
- ConvertProtobuf 2.3.0.4.10.0.0-147
- ConvertRecord 2.3.0.4.10.0.0-147
- CopyAzureBlobStorage_v12 2.3.0.4.10.0.0-147
- CopyS3Object 2.3.0.4.10.0.0-147
- CountText 2.3.0.4.10.0.0-147
- CreateHadoopSequenceFile 2.3.0.4.10.0.0-147
- CryptographicHashContent 2.3.0.4.10.0.0-147
- DebugFlow 2.3.0.4.10.0.0-147
- DecryptContentAge 2.3.0.4.10.0.0-147
- DecryptContentPGP 2.3.0.4.10.0.0-147
- DeduplicateRecord 2.3.0.4.10.0.0-147
- DeleteAzureBlobStorage_v12 2.3.0.4.10.0.0-147
- DeleteAzureDataLakeStorage 2.3.0.4.10.0.0-147
- DeleteByQueryElasticsearch 2.3.0.4.10.0.0-147
- DeleteCDPObjectStore 2.3.0.4.10.0.0-147
- DeleteDynamoDB 2.3.0.4.10.0.0-147
- DeleteFile 2.3.0.4.10.0.0-147
- DeleteGCSObject 2.3.0.4.10.0.0-147
- DeleteGridFS 2.3.0.4.10.0.0-147
- DeleteHBaseCells 2.3.0.4.10.0.0-147
- DeleteHBaseRow 2.3.0.4.10.0.0-147
- DeleteHDFS 2.3.0.4.10.0.0-147
- DeleteMongo 2.3.0.4.10.0.0-147
- DeleteS3Object 2.3.0.4.10.0.0-147
- DeleteSFTP 2.3.0.4.10.0.0-147
- DeleteSQS 2.3.0.4.10.0.0-147
- DetectDuplicate 2.3.0.4.10.0.0-147
- DistributeLoad 2.3.0.4.10.0.0-147
- DuplicateFlowFile 2.3.0.4.10.0.0-147
- EncodeContent 2.3.0.4.10.0.0-147
- EncryptContentAge 2.3.0.4.10.0.0-147
- EncryptContentPGP 2.3.0.4.10.0.0-147
- EnforceOrder 2.3.0.4.10.0.0-147
- EvaluateJsonPath 2.3.0.4.10.0.0-147
- EvaluateXPath 2.3.0.4.10.0.0-147
- EvaluateXQuery 2.3.0.4.10.0.0-147
- ExecuteGraphQuery 2.3.0.4.10.0.0-147
- ExecuteGraphQueryRecord 2.3.0.4.10.0.0-147
- ExecuteGroovyScript 2.3.0.4.10.0.0-147
- ExecuteProcess 2.3.0.4.10.0.0-147
- ExecuteScript 2.3.0.4.10.0.0-147
- ExecuteSparkInteractive 2.3.0.4.10.0.0-147
- ExecuteSQL 2.3.0.4.10.0.0-147
- ExecuteSQLRecord 2.3.0.4.10.0.0-147
- ExecuteStreamCommand 2.3.0.4.10.0.0-147
- ExtractAvroMetadata 2.3.0.4.10.0.0-147
- ExtractDocumentText 2.3.0.4.10.0.0-147
- ExtractEmailAttachments 2.3.0.4.10.0.0-147
- ExtractEmailHeaders 2.3.0.4.10.0.0-147
- ExtractGrok 2.3.0.4.10.0.0-147
- ExtractHL7Attributes 2.3.0.4.10.0.0-147
- ExtractImageMetadata 2.3.0.4.10.0.0-147
- ExtractMediaMetadata 2.3.0.4.10.0.0-147
- ExtractRecordSchema 2.3.0.4.10.0.0-147
- ExtractText 2.3.0.4.10.0.0-147
- FetchAzureBlobStorage_v12 2.3.0.4.10.0.0-147
- FetchAzureDataLakeStorage 2.3.0.4.10.0.0-147
- FetchBoxFile 2.3.0.4.10.0.0-147
- FetchBoxFileInfo 2.3.0.4.10.0.0-147
- FetchBoxFileRepresentation 2.3.0.4.10.0.0-147
- FetchCDPObjectStore 2.3.0.4.10.0.0-147
- FetchDistributedMapCache 2.3.0.4.10.0.0-147
- FetchDropbox 2.3.0.4.10.0.0-147
- FetchFile 2.3.0.4.10.0.0-147
- FetchFTP 2.3.0.4.10.0.0-147
- FetchGCSObject 2.3.0.4.10.0.0-147
- FetchGoogleDrive 2.3.0.4.10.0.0-147
- FetchGridFS 2.3.0.4.10.0.0-147
- FetchHBaseRow 2.3.0.4.10.0.0-147
- FetchHDFS 2.3.0.4.10.0.0-147
- FetchParquet 2.3.0.4.10.0.0-147
- FetchPLC 2.3.0.4.10.0.0-147
- FetchS3Object 2.3.0.4.10.0.0-147
- FetchSFTP 2.3.0.4.10.0.0-147
- FetchSmb 2.3.0.4.10.0.0-147
- FilterAttribute 2.3.0.4.10.0.0-147
- FlattenJson 2.3.0.4.10.0.0-147
- ForkEnrichment 2.3.0.4.10.0.0-147
- ForkRecord 2.3.0.4.10.0.0-147
- GenerateFlowFile 2.3.0.4.10.0.0-147
- GenerateRecord 2.3.0.4.10.0.0-147
- GenerateTableFetch 2.3.0.4.10.0.0-147
- GeoEnrichIP 2.3.0.4.10.0.0-147
- GeoEnrichIPRecord 2.3.0.4.10.0.0-147
- GeohashRecord 2.3.0.4.10.0.0-147
- GetAsanaObject 2.3.0.4.10.0.0-147
- GetAwsPollyJobStatus 2.3.0.4.10.0.0-147
- GetAwsTextractJobStatus 2.3.0.4.10.0.0-147
- GetAwsTranscribeJobStatus 2.3.0.4.10.0.0-147
- GetAwsTranslateJobStatus 2.3.0.4.10.0.0-147
- GetAzureEventHub 2.3.0.4.10.0.0-147
- GetAzureQueueStorage_v12 2.3.0.4.10.0.0-147
- GetBoxFileCollaborators 2.3.0.4.10.0.0-147
- GetBoxGroupMembers 2.3.0.4.10.0.0-147
- GetCouchbaseKey 2.3.0.4.10.0.0-147
- GetDynamoDB 2.3.0.4.10.0.0-147
- GetElasticsearch 2.3.0.4.10.0.0-147
- GetFile 2.3.0.4.10.0.0-147
- GetFileResource 2.3.0.4.10.0.0-147
- GetFTP 2.3.0.4.10.0.0-147
- GetGcpVisionAnnotateFilesOperationStatus 2.3.0.4.10.0.0-147
- GetGcpVisionAnnotateImagesOperationStatus 2.3.0.4.10.0.0-147
- GetHBase 2.3.0.4.10.0.0-147
- GetHDFS 2.3.0.4.10.0.0-147
- GetHDFSEvents 2.3.0.4.10.0.0-147
- GetHDFSFileInfo 2.3.0.4.10.0.0-147
- GetHDFSSequenceFile 2.3.0.4.10.0.0-147
- GetHubSpot 2.3.0.4.10.0.0-147
- GetJiraIssue 2.3.0.4.10.0.0-147
- GetMongo 2.3.0.4.10.0.0-147
- GetMongoRecord 2.3.0.4.10.0.0-147
- GetS3ObjectMetadata 2.3.0.4.10.0.0-147
- GetS3ObjectTags 2.3.0.4.10.0.0-147
- GetSFTP 2.3.0.4.10.0.0-147
- GetShopify 2.3.0.4.10.0.0-147
- GetSlackReaction 2.3.0.4.10.0.0-147
- GetSmbFile 2.3.0.4.10.0.0-147
- GetSNMP 2.3.0.4.10.0.0-147
- GetSnowflakeIngestStatus 2.3.0.4.10.0.0-147
- GetSolr 2.3.0.4.10.0.0-147
- GetSplunk 2.3.0.4.10.0.0-147
- GetSQS 2.3.0.4.10.0.0-147
- GetTCP 2.3.0.4.10.0.0-147
- GetWorkdayReport 2.3.0.4.10.0.0-147
- GetZendesk 2.3.0.4.10.0.0-147
- HandleHttpRequest 2.3.0.4.10.0.0-147
- HandleHttpResponse 2.3.0.4.10.0.0-147
- IdentifyMimeType 2.3.0.4.10.0.0-147
- InvokeGRPC 2.3.0.4.10.0.0-147
- InvokeHTTP 2.3.0.4.10.0.0-147
- InvokeScriptedProcessor 2.3.0.4.10.0.0-147
- ISPEnrichIP 2.3.0.4.10.0.0-147
- JoinEnrichment 2.3.0.4.10.0.0-147
- JoltTransformJSON 2.3.0.4.10.0.0-147
- JoltTransformRecord 2.3.0.4.10.0.0-147
- JSLTTransformJSON 2.3.0.4.10.0.0-147
- JsonQueryElasticsearch 2.3.0.4.10.0.0-147
- ListAzureBlobStorage_v12 2.3.0.4.10.0.0-147
- ListAzureDataLakeStorage 2.3.0.4.10.0.0-147
- ListBoxFile 2.3.0.4.10.0.0-147
- ListBoxFileInfo 2.3.0.4.10.0.0-147
- ListCDPObjectStore 2.3.0.4.10.0.0-147
- ListDatabaseTables 2.3.0.4.10.0.0-147
- ListDropbox 2.3.0.4.10.0.0-147
- ListenBeats 2.3.0.4.10.0.0-147
- ListenFTP 2.3.0.4.10.0.0-147
- ListenGRPC 2.3.0.4.10.0.0-147
- ListenHTTP 2.3.0.4.10.0.0-147
- ListenNetFlow 2.3.0.4.10.0.0-147
- ListenOTLP 2.3.0.4.10.0.0-147
- ListenSlack 2.3.0.4.10.0.0-147
- ListenSyslog 2.3.0.4.10.0.0-147
- ListenTCP 2.3.0.4.10.0.0-147
- ListenTrapSNMP 2.3.0.4.10.0.0-147
- ListenUDP 2.3.0.4.10.0.0-147
- ListenUDPRecord 2.3.0.4.10.0.0-147
- ListenWebSocket 2.3.0.4.10.0.0-147
- ListFile 2.3.0.4.10.0.0-147
- ListFTP 2.3.0.4.10.0.0-147
- ListGCSBucket 2.3.0.4.10.0.0-147
- ListGoogleDrive 2.3.0.4.10.0.0-147
- ListHBaseRegions 2.3.0.4.10.0.0-147
- ListHDFS 2.3.0.4.10.0.0-147
- ListS3 2.3.0.4.10.0.0-147
- ListSFTP 2.3.0.4.10.0.0-147
- ListSmb 2.3.0.4.10.0.0-147
- LogAttribute 2.3.0.4.10.0.0-147
- LogMessage 2.3.0.4.10.0.0-147
- LookupAttribute 2.3.0.4.10.0.0-147
- LookupRecord 2.3.0.4.10.0.0-147
- MergeContent 2.3.0.4.10.0.0-147
- MergeRecord 2.3.0.4.10.0.0-147
- ModifyBytes 2.3.0.4.10.0.0-147
- ModifyCompression 2.3.0.4.10.0.0-147
- MonitorActivity 2.3.0.4.10.0.0-147
- MoveAzureDataLakeStorage 2.3.0.4.10.0.0-147
- MoveHDFS 2.3.0.4.10.0.0-147
- Notify 2.3.0.4.10.0.0-147
- PackageFlowFile 2.3.0.4.10.0.0-147
- PaginatedJsonQueryElasticsearch 2.3.0.4.10.0.0-147
- ParseEvtx 2.3.0.4.10.0.0-147
- ParseNetflowv5 2.3.0.4.10.0.0-147
- ParseSyslog 2.3.0.4.10.0.0-147
- ParseSyslog5424 2.3.0.4.10.0.0-147
- PartitionRecord 2.3.0.4.10.0.0-147
- PublishAMQP 2.3.0.4.10.0.0-147
- PublishGCPubSub 2.3.0.4.10.0.0-147
- PublishJMS 2.3.0.4.10.0.0-147
- PublishKafka 2.3.0.4.10.0.0-147
- PublishKafka_2_6 2.3.0.4.10.0.0-147
- PublishKafka2CDP 2.3.0.4.10.0.0-147
- PublishKafka2RecordCDP 2.3.0.4.10.0.0-147
- PublishKafkaRecord_2_6 2.3.0.4.10.0.0-147
- PublishMQTT 2.3.0.4.10.0.0-147
- PublishSlack 2.3.0.4.10.0.0-147
- PutAccumuloRecord 2.3.0.4.10.0.0-147
- PutAzureBlobStorage_v12 2.3.0.4.10.0.0-147
- PutAzureCosmosDBRecord 2.3.0.4.10.0.0-147
- PutAzureDataExplorer 2.3.0.4.10.0.0-147
- PutAzureDataLakeStorage 2.3.0.4.10.0.0-147
- PutAzureEventHub 2.3.0.4.10.0.0-147
- PutAzureQueueStorage_v12 2.3.0.4.10.0.0-147
- PutBigQuery 2.3.0.4.10.0.0-147
- PutBoxFile 2.3.0.4.10.0.0-147
- PutCassandraQL 2.3.0.4.10.0.0-147
- PutCassandraRecord 2.3.0.4.10.0.0-147
- PutCDPObjectStore 2.3.0.4.10.0.0-147
- PutClouderaHiveQL 2.3.0.4.10.0.0-147
- PutClouderaHiveStreaming 2.3.0.4.10.0.0-147
- PutClouderaORC 2.3.0.4.10.0.0-147
- PutCloudWatchMetric 2.3.0.4.10.0.0-147
- PutCouchbaseKey 2.3.0.4.10.0.0-147
- PutDatabaseRecord 2.3.0.4.10.0.0-147
- PutDistributedMapCache 2.3.0.4.10.0.0-147
- PutDropbox 2.3.0.4.10.0.0-147
- PutDynamoDB 2.3.0.4.10.0.0-147
- PutDynamoDBRecord 2.3.0.4.10.0.0-147
- PutElasticsearchJson 2.3.0.4.10.0.0-147
- PutElasticsearchRecord 2.3.0.4.10.0.0-147
- PutEmail 2.3.0.4.10.0.0-147
- PutFile 2.3.0.4.10.0.0-147
- PutFTP 2.3.0.4.10.0.0-147
- PutGCSObject 2.3.0.4.10.0.0-147
- PutGoogleDrive 2.3.0.4.10.0.0-147
- PutGridFS 2.3.0.4.10.0.0-147
- PutHBaseCell 2.3.0.4.10.0.0-147
- PutHBaseJSON 2.3.0.4.10.0.0-147
- PutHBaseRecord 2.3.0.4.10.0.0-147
- PutHDFS 2.3.0.4.10.0.0-147
- PutIceberg 2.3.0.4.10.0.0-147
- PutIcebergCDC 2.3.0.4.10.0.0-147
- PutIoTDBRecord 2.3.0.4.10.0.0-147
- PutJiraIssue 2.3.0.4.10.0.0-147
- PutKinesisFirehose 2.3.0.4.10.0.0-147
- PutKinesisStream 2.3.0.4.10.0.0-147
- PutKudu 2.3.0.4.10.0.0-147
- PutLambda 2.3.0.4.10.0.0-147
- PutMongo 2.3.0.4.10.0.0-147
- PutMongoBulkOperations 2.3.0.4.10.0.0-147
- PutMongoRecord 2.3.0.4.10.0.0-147
- PutParquet 2.3.0.4.10.0.0-147
- PutPLC 2.3.0.4.10.0.0-147
- PutRecord 2.3.0.4.10.0.0-147
- PutRedisHashRecord 2.3.0.4.10.0.0-147
- PutS3Object 2.3.0.4.10.0.0-147
- PutSalesforceObject 2.3.0.4.10.0.0-147
- PutSFTP 2.3.0.4.10.0.0-147
- PutSmbFile 2.3.0.4.10.0.0-147
- PutSnowflakeInternalStage 2.3.0.4.10.0.0-147
- PutSNS 2.3.0.4.10.0.0-147
- PutSolrContentStream 2.3.0.4.10.0.0-147
- PutSolrRecord 2.3.0.4.10.0.0-147
- PutSplunk 2.3.0.4.10.0.0-147
- PutSplunkHTTP 2.3.0.4.10.0.0-147
- PutSQL 2.3.0.4.10.0.0-147
- PutSQS 2.3.0.4.10.0.0-147
- PutSyslog 2.3.0.4.10.0.0-147
- PutTCP 2.3.0.4.10.0.0-147
- PutUDP 2.3.0.4.10.0.0-147
- PutWebSocket 2.3.0.4.10.0.0-147
- PutZendeskTicket 2.3.0.4.10.0.0-147
- QueryAirtableTable 2.3.0.4.10.0.0-147
- QueryAzureDataExplorer 2.3.0.4.10.0.0-147
- QueryCassandra 2.3.0.4.10.0.0-147
- QueryDatabaseTable 2.3.0.4.10.0.0-147
- QueryDatabaseTableRecord 2.3.0.4.10.0.0-147
- QueryIoTDBRecord 2.3.0.4.10.0.0-147
- QueryRecord 2.3.0.4.10.0.0-147
- QuerySalesforceObject 2.3.0.4.10.0.0-147
- QuerySolr 2.3.0.4.10.0.0-147
- QuerySplunkIndexingStatus 2.3.0.4.10.0.0-147
- RemoveRecordField 2.3.0.4.10.0.0-147
- RenameRecordField 2.3.0.4.10.0.0-147
- ReplaceText 2.3.0.4.10.0.0-147
- ReplaceTextWithMapping 2.3.0.4.10.0.0-147
- ResizeImage 2.3.0.4.10.0.0-147
- RetryFlowFile 2.3.0.4.10.0.0-147
- RouteHL7 2.3.0.4.10.0.0-147
- RouteOnAttribute 2.3.0.4.10.0.0-147
- RouteOnContent 2.3.0.4.10.0.0-147
- RouteText 2.3.0.4.10.0.0-147
- RunMongoAggregation 2.3.0.4.10.0.0-147
- SampleRecord 2.3.0.4.10.0.0-147
- SawmillTransformJSON 2.3.0.4.10.0.0-147
- SawmillTransformRecord 2.3.0.4.10.0.0-147
- ScanAccumulo 2.3.0.4.10.0.0-147
- ScanAttribute 2.3.0.4.10.0.0-147
- ScanContent 2.3.0.4.10.0.0-147
- ScanHBase 2.3.0.4.10.0.0-147
- ScriptedFilterRecord 2.3.0.4.10.0.0-147
- ScriptedPartitionRecord 2.3.0.4.10.0.0-147
- ScriptedTransformRecord 2.3.0.4.10.0.0-147
- ScriptedValidateRecord 2.3.0.4.10.0.0-147
- SearchElasticsearch 2.3.0.4.10.0.0-147
- SegmentContent 2.3.0.4.10.0.0-147
- SelectClouderaHiveQL 2.3.0.4.10.0.0-147
- SendTrapSNMP 2.3.0.4.10.0.0-147
- SetSNMP 2.3.0.4.10.0.0-147
- SignContentPGP 2.3.0.4.10.0.0-147
- SplitAvro 2.3.0.4.10.0.0-147
- SplitContent 2.3.0.4.10.0.0-147
- SplitExcel 2.3.0.4.10.0.0-147
- SplitJson 2.3.0.4.10.0.0-147
- SplitPCAP 2.3.0.4.10.0.0-147
- SplitRecord 2.3.0.4.10.0.0-147
- SplitText 2.3.0.4.10.0.0-147
- SplitXml 2.3.0.4.10.0.0-147
- StartAwsPollyJob 2.3.0.4.10.0.0-147
- StartAwsTextractJob 2.3.0.4.10.0.0-147
- StartAwsTranscribeJob 2.3.0.4.10.0.0-147
- StartAwsTranslateJob 2.3.0.4.10.0.0-147
- StartGcpVisionAnnotateFilesOperation 2.3.0.4.10.0.0-147
- StartGcpVisionAnnotateImagesOperation 2.3.0.4.10.0.0-147
- StartSnowflakeIngest 2.3.0.4.10.0.0-147
- TagS3Object 2.3.0.4.10.0.0-147
- TailFile 2.3.0.4.10.0.0-147
- TransformXml 2.3.0.4.10.0.0-147
- TriggerClouderaHiveMetaStoreEvent 2.3.0.4.10.0.0-147
- UnpackContent 2.3.0.4.10.0.0-147
- UpdateAttribute 2.3.0.4.10.0.0-147
- UpdateByQueryElasticsearch 2.3.0.4.10.0.0-147
- UpdateClouderaHiveTable 2.3.0.4.10.0.0-147
- UpdateCounter 2.3.0.4.10.0.0-147
- UpdateDatabaseTable 2.3.0.4.10.0.0-147
- UpdateDeltaLakeTable 2.3.0.4.10.0.0-147
- UpdateJiraIssue 2.3.0.4.10.0.0-147
- UpdateRecord 2.3.0.4.10.0.0-147
- ValidateCsv 2.3.0.4.10.0.0-147
- ValidateJson 2.3.0.4.10.0.0-147
- ValidateRecord 2.3.0.4.10.0.0-147
- ValidateXml 2.3.0.4.10.0.0-147
- VerifyContentMAC 2.3.0.4.10.0.0-147
- VerifyContentPGP 2.3.0.4.10.0.0-147
- Wait 2.3.0.4.10.0.0-147

Controller Services
- AccumuloService 2.3.0.4.10.0.0-147
- ActiveMQJMSConnectionFactoryProvider 2.3.0.4.10.0.0-147
- ADLSCredentialsControllerService 2.3.0.4.10.0.0-147
- ADLSCredentialsControllerServiceLookup 2.3.0.4.10.0.0-147
- ADLSIDBrokerCloudCredentialsProviderControllerService 2.3.0.4.10.0.0-147
- AmazonGlueSchemaRegistry 2.3.0.4.10.0.0-147
- ApicurioSchemaRegistry 2.3.0.4.10.0.0-147
- AvroReader 2.3.0.4.10.0.0-147
- AvroRecordSetWriter 2.3.0.4.10.0.0-147
- AvroSchemaRegistry 2.3.0.4.10.0.0-147
- AWSCredentialsProviderControllerService 2.3.0.4.10.0.0-147
- AWSIDBrokerCloudCredentialsProviderControllerService 2.3.0.4.10.0.0-147
- AzureBlobIDBrokerCloudCredentialsProviderControllerService 2.3.0.4.10.0.0-147
- AzureBlobStorageFileResourceService 2.3.0.4.10.0.0-147
- AzureCosmosDBClientService 2.3.0.4.10.0.0-147
- AzureDataLakeStorageFileResourceService 2.3.0.4.10.0.0-147
- AzureEventHubRecordSink 2.3.0.4.10.0.0-147
- AzureServiceBusJMSConnectionFactoryProvider 2.3.0.4.10.0.0-147
- AzureStorageCredentialsControllerService_v12 2.3.0.4.10.0.0-147
- AzureStorageCredentialsControllerServiceLookup_v12 2.3.0.4.10.0.0-147
- CassandraDistributedMapCache 2.3.0.4.10.0.0-147
- CassandraSessionProvider 2.3.0.4.10.0.0-147
- CdpCredentialsProviderControllerService 2.3.0.4.10.0.0-147
- CdpOauth2AccessTokenProviderControllerService 2.3.0.4.10.0.0-147
- CEFReader 2.3.0.4.10.0.0-147
- CiscoEmblemSyslogMessageReader 2.3.0.4.10.0.0-147
- ClouderaAttributeSchemaReferenceReader 2.3.0.4.10.0.0-147
- ClouderaAttributeSchemaReferenceWriter 2.3.0.4.10.0.0-147
- ClouderaEncodedSchemaReferenceReader 2.3.0.4.10.0.0-147
- ClouderaEncodedSchemaReferenceWriter 2.3.0.4.10.0.0-147
- ClouderaHiveConnectionPool 2.3.0.4.10.0.0-147
- ClouderaSchemaRegistry 2.3.0.4.10.0.0-147
- CMLLookupService 2.3.0.4.10.0.0-147
- ConfluentEncodedSchemaReferenceReader 2.3.0.4.10.0.0-147
- ConfluentEncodedSchemaReferenceWriter 2.3.0.4.10.0.0-147
- ConfluentSchemaRegistry 2.3.0.4.10.0.0-147
- CouchbaseClusterService 2.3.0.4.10.0.0-147
- CouchbaseKeyValueLookupService 2.3.0.4.10.0.0-147
- CouchbaseMapCacheClient 2.3.0.4.10.0.0-147
- CouchbaseRecordLookupService 2.3.0.4.10.0.0-147
- CSVReader 2.3.0.4.10.0.0-147
- CSVRecordLookupService 2.3.0.4.10.0.0-147
- CSVRecordSetWriter 2.3.0.4.10.0.0-147
- DatabaseRecordLookupService 2.3.0.4.10.0.0-147
- DatabaseRecordSink 2.3.0.4.10.0.0-147
- DatabaseTableSchemaRegistry 2.3.0.4.10.0.0-147
- DBCPConnectionPool 2.3.0.4.10.0.0-147
- DBCPConnectionPoolLookup 2.3.0.4.10.0.0-147
- DeveloperBoxClientService 2.3.0.4.10.0.0-147
- DistributedMapCacheLookupService 2.3.0.4.10.0.0-147
- EBCDICRecordReader 2.3.0.4.10.0.0-147
- ElasticSearchClientServiceImpl 2.3.0.4.10.0.0-147
- ElasticSearchLookupService 2.3.0.4.10.0.0-147
- ElasticSearchStringLookupService 2.3.0.4.10.0.0-147
- EmailRecordSink 2.3.0.4.10.0.0-147
- EmbeddedHazelcastCacheManager 2.3.0.4.10.0.0-147
- ExcelReader 2.3.0.4.10.0.0-147
- ExternalHazelcastCacheManager 2.3.0.4.10.0.0-147
- FreeFormTextRecordSetWriter 2.3.0.4.10.0.0-147
- GCPCredentialsControllerService 2.3.0.4.10.0.0-147
- GCSFileResourceService 2.3.0.4.10.0.0-147
- GenericPLC4XConnectionPool 2.3.0.4.10.0.0-147
- GrokReader 2.3.0.4.10.0.0-147
- HadoopCatalogService 2.3.0.4.10.0.0-147
- HadoopDBCPConnectionPool 2.3.0.4.10.0.0-147
- HazelcastMapCacheClient 2.3.0.4.10.0.0-147
- HBase_2_ClientMapCacheService 2.3.0.4.10.0.0-147
- HBase_2_ClientService 2.3.0.4.10.0.0-147
- HBase_2_RecordLookupService 2.3.0.4.10.0.0-147
- HikariCPConnectionPool 2.3.0.4.10.0.0-147
- HiveCatalogService 2.3.0.4.10.0.0-147
- HttpRecordSink 2.3.0.4.10.0.0-147
- ImpalaConnectionPool 2.3.0.4.10.0.0-147
- IPFIXReader 2.3.0.4.10.0.0-147
- IPLookupService 2.3.0.4.10.0.0-147
- JASN1Reader 2.3.0.4.10.0.0-147
- JdbcCatalogService 2.3.0.4.10.0.0-147
- JettyWebSocketClient 2.3.0.4.10.0.0-147
- JettyWebSocketServer 2.3.0.4.10.0.0-147
- JiraRecordSink 2.3.0.4.10.0.0-147
- JMSConnectionFactoryProvider 2.3.0.4.10.0.0-147
- JndiJmsConnectionFactoryProvider 2.3.0.4.10.0.0-147
- JsonConfigBasedBoxClientService 2.3.0.4.10.0.0-147
- JsonPathReader 2.3.0.4.10.0.0-147
- JsonRecordSetWriter 2.3.0.4.10.0.0-147
- JsonTreeReader 2.3.0.4.10.0.0-147
- Kafka3ConnectionService 2.3.0.4.10.0.0-147
- KafkaRecordSink_2_6 2.3.0.4.10.0.0-147
- KerberosKeytabUserService 2.3.0.4.10.0.0-147
- KerberosPasswordUserService 2.3.0.4.10.0.0-147
- KerberosTicketCacheUserService 2.3.0.4.10.0.0-147
- KuduLookupService 2.3.0.4.10.0.0-147
- LivySessionController 2.3.0.4.10.0.0-147
- LoggingRecordSink 2.3.0.4.10.0.0-147
- MapCacheClientService 2.3.0.4.10.0.0-147
- MapCacheServer 2.3.0.4.10.0.0-147
- MongoDBControllerService 2.3.0.4.10.0.0-147
- MongoDBLookupService 2.3.0.4.10.0.0-147
- Neo4JCypherClientService 2.3.0.4.10.0.0-147
- ParquetReader 2.3.0.4.10.0.0-147
- ParquetRecordSetWriter 2.3.0.4.10.0.0-147
- PEMEncodedSSLContextProvider 2.3.0.4.10.0.0-147
- PhoenixThickConnectionPool 2.3.0.4.10.0.0-147
- PhoenixThinConnectionPool 2.3.0.4.10.0.0-147
- PostgreSQLConnectionPool 2.3.0.4.10.0.0-147
- PropertiesFileLookupService 2.3.0.4.10.0.0-147
- ProtobufReader 2.3.0.4.10.0.0-147
- ProxyPLC4XConnectionPool 2.3.0.4.10.0.0-147
- RabbitMQJMSConnectionFactoryProvider 2.3.0.4.10.0.0-147
- ReaderLookup 2.3.0.4.10.0.0-147
- RecordSetWriterLookup 2.3.0.4.10.0.0-147
- RecordSinkServiceLookup 2.3.0.4.10.0.0-147
- RedisConnectionPoolService 2.3.0.4.10.0.0-147
- RedisDistributedMapCacheClientService 2.3.0.4.10.0.0-147
- RedshiftConnectionPool 2.3.0.4.10.0.0-147
- RESTCatalogService 2.3.0.4.10.0.0-147
- RestLookupService 2.3.0.4.10.0.0-147
- S3FileResourceService 2.3.0.4.10.0.0-147
- ScriptedLookupService 2.3.0.4.10.0.0-147
- ScriptedReader 2.3.0.4.10.0.0-147
- ScriptedRecordSetWriter 2.3.0.4.10.0.0-147
- ScriptedRecordSink 2.3.0.4.10.0.0-147
- SetCacheClientService 2.3.0.4.10.0.0-147
- SetCacheServer 2.3.0.4.10.0.0-147
- SimpleCsvFileLookupService 2.3.0.4.10.0.0-147
- SimpleDatabaseLookupService 2.3.0.4.10.0.0-147
- SimpleKeyValueLookupService 2.3.0.4.10.0.0-147
- SimpleRedisDistributedMapCacheClientService 2.3.0.4.10.0.0-147
- SimpleScriptedLookupService 2.3.0.4.10.0.0-147
- SiteToSiteReportingRecordSink 2.3.0.4.10.0.0-147
- SlackRecordSink 2.3.0.4.10.0.0-147
- SmbjClientProviderService 2.3.0.4.10.0.0-147
- SnowflakeComputingConnectionPool 2.3.0.4.10.0.0-147
- StandardAsanaClientProviderService 2.3.0.4.10.0.0-147
- StandardAzureCredentialsControllerService 2.3.0.4.10.0.0-147
- StandardDatabaseDialectService 2.3.0.4.10.0.0-147
- StandardDropboxCredentialService 2.3.0.4.10.0.0-147
- StandardFileResourceService 2.3.0.4.10.0.0-147
- StandardHashiCorpVaultClientService 2.3.0.4.10.0.0-147
- StandardHttpContextMap 2.3.0.4.10.0.0-147
- StandardJiraCredentialService 2.3.0.4.10.0.0-147
- StandardJsonSchemaRegistry 2.3.0.4.10.0.0-147
- StandardKustoIngestService 2.3.0.4.10.0.0-147
- StandardKustoQueryService 2.3.0.4.10.0.0-147
- StandardOauth2AccessTokenProvider 2.3.0.4.10.0.0-147
- StandardPGPPrivateKeyService 2.3.0.4.10.0.0-147
- StandardPGPPublicKeyService 2.3.0.4.10.0.0-147
- StandardPLC4XConnectionPool 2.3.0.4.10.0.0-147
- StandardPrivateKeyService 2.3.0.4.10.0.0-147
- StandardProxyConfigurationService 2.3.0.4.10.0.0-147
- StandardRestrictedSSLContextService 2.3.0.4.10.0.0-147
- StandardS3EncryptionService 2.3.0.4.10.0.0-147
- StandardSnowflakeIngestManagerProviderService 2.3.0.4.10.0.0-147
- StandardSSLContextService 2.3.0.4.10.0.0-147
- StandardWebClientServiceProvider 2.3.0.4.10.0.0-147
- Syslog5424Reader 2.3.0.4.10.0.0-147
- SyslogReader 2.3.0.4.10.0.0-147
- TinkerpopClientService 2.3.0.4.10.0.0-147
- UDPEventRecordSink 2.3.0.4.10.0.0-147
- VolatileSchemaCache 2.3.0.4.10.0.0-147
- WindowsEventLogReader 2.3.0.4.10.0.0-147
- XMLFileLookupService 2.3.0.4.10.0.0-147
- XMLReader 2.3.0.4.10.0.0-147
- XMLRecordSetWriter 2.3.0.4.10.0.0-147
- YamlTreeReader 2.3.0.4.10.0.0-147
- ZendeskRecordSink 2.3.0.4.10.0.0-147

Reporting Tasks
- AzureLogAnalyticsProvenanceReportingTask 2.3.0.4.10.0.0-147
- AzureLogAnalyticsReportingTask 2.3.0.4.10.0.0-147
- ControllerStatusReportingTask 2.3.0.4.10.0.0-147
- MonitorDiskUsage 2.3.0.4.10.0.0-147
- MonitorMemory 2.3.0.4.10.0.0-147
- QueryNiFiReportingTask 2.3.0.4.10.0.0-147
- ReportLineageToAtlas 2.3.0.4.10.0.0-147
- ScriptedReportingTask 2.3.0.4.10.0.0-147
- SiteToSiteBulletinReportingTask 2.3.0.4.10.0.0-147
- SiteToSiteMetricsReportingTask 2.3.0.4.10.0.0-147
- SiteToSiteProvenanceReportingTask 2.3.0.4.10.0.0-147
- SiteToSiteStatusReportingTask 2.3.0.4.10.0.0-147

Parameter Providers
- AwsSecretsManagerParameterProvider 2.3.0.4.10.0.0-147
- AzureKeyVaultSecretsParameterProvider 2.3.0.4.10.0.0-147
- CyberArkConjurParameterProvider 2.3.0.4.10.0.0-147
- DatabaseParameterProvider 2.3.0.4.10.0.0-147
- EnvironmentVariableParameterProvider 2.3.0.4.10.0.0-147
- GcpSecretManagerParameterProvider 2.3.0.4.10.0.0-147
- HashiCorpVaultParameterProvider 2.3.0.4.10.0.0-147
- KubernetesSecretParameterProvider 2.3.0.4.10.0.0-147
- OnePasswordParameterProvider 2.3.0.4.10.0.0-147
- PropertiesFileParameterProvider 2.3.0.4.10.0.0-147

Flow Analysis Rules
- DisallowComponentType 2.3.0.4.10.0.0-147
- DisallowConsecutiveConnectionsWithRoundRobinLB 2.3.0.4.10.0.0-147
- DisallowDeadEnd 2.3.0.4.10.0.0-147
- DisallowDeprecatedProcessor 2.3.0.4.10.0.0-147
- DisallowExtractTextForFullContent 2.3.0.4.10.0.0-147
- RecommendRecordProcessor 2.3.0.4.10.0.0-147
- RequireHandleHttpResponseAfterHandleHttpRequest 2.3.0.4.10.0.0-147
- RequireMergeBeforePutIceberg 2.3.0.4.10.0.0-147
- RestrictBackpressureSettings 2.3.0.4.10.0.0-147
- RestrictComponentNaming 2.3.0.4.10.0.0-147
- RestrictConcurrentTasksVsThreadPoolSizeInProcessors 2.3.0.4.10.0.0-147
- RestrictFlowFileExpiration 2.3.0.4.10.0.0-147
- RestrictProcessorConcurrency 2.3.0.4.10.0.0-147
- RestrictSchedulingForListProcessors 2.3.0.4.10.0.0-147
- RestrictThreadPoolSize 2.3.0.4.10.0.0-147
- RestrictYieldDurationForConsumeKafkaProcessors 2.3.0.4.10.0.0-147

ReportLineageToAtlas 2.3.0.4.10.0.0-147

Bundle: org.apache.nifi | nifi-atlas-nar
Description: Report NiFi flow data set level lineage to Apache Atlas. End-to-end lineages across NiFi environments and other systems can be reported if those are connected by different protocols and data set, such as NiFi Site-to-Site, Kafka topic or Hive tables ... etc. Atlas lineage reported by this reporting task can be useful to grasp the high level relationships between processes and data sets, in addition to NiFi provenance events providing detailed event level lineage. See 'Additional Details' for further description and limitations.
Tags: atlas, lineage
Input Requirement
Supports Sensitive Dynamic Properties: false

Additional Details for ReportLineageToAtlas 2.3.0.4.10.0.0-147

ReportLineageToAtlas

Table of contents:

Information reported to Atlas
NiFi Atlas Types
Namespaces (formerly Cluster Name Resolution)
NiFi flow structure
- Path Separation Logic
NiFi data lineage
How it runs in NiFi cluster
Limitations
Atlas Server Configurations
Atlas Server Emulator

Information reported to Atlas

This reporting task stores two types of NiFi flow information, ‘NiFi flow structure’ and ‘NiFi data lineage’.

‘NiFi flow structure’ tells what components are running within a NiFi flow and how these are connected. It is reported by analyzing current NiFi flow structure, specifically NiFi component relationships.

‘NiFi data lineage’ tells what part of NiFi flow interacts with different DataSets such as HDFS files or Hive tables … etc. It is reported by analyzing NiFi provenance events.

Technically each information is sent using Atlas REST API v2 as shown in above image.

As both information types use the same NiFi Atlas Types and Namespaces concepts, it is recommended to start reading those sections first.

NiFi Atlas Types

This reporting task creates following NiFi specific types in Atlas Type system when it runs if these type definitions are not found.

Green boxes represent sub-types of DataSet and blue ones are sub-types of Process. Gray lines represent entity ownership. Red lines represent lineage.

nifi_flow Represents a NiFi data flow.

As shown in the above diagram, nifi_flow owns other nifi_component types. This owning relationship is defined by Atlas ‘owned’ constraint so that when a ’nifi_flow’ entity is removed, all owned NiFi component entities are removed in cascading manner.

When this reporting task runs, it analyzes and traverse the entire flow structure, and create NiFi component entities in Atlas. At later runs, it compares the current flow structure with the one stored in Atlas to figure out if any changes have been made since the last time the flow was reported. The reporting task updates NiFi component entities in Atlas if needed. NiFi components those are removed from a NiFi flow also get deleted from Atlas. However, those entities can still be seen in Atlas search results or lineage graphs since Atlas uses ‘Soft Delete’ by default. See Atlas Delete Handler for further detail.

Attributes:

qualifiedName: Root ProcessGroup ID@namespace (e.g. 86420a14-2fab-3e1e-4331-fb6ab42f58e0@ns1)
name: Name of the Root ProcessGroup.
url: URL of the NiFi instance. This can be specified via reporting task ‘NiFi URL for Atlas’ property.

nifi_flow_path Part of a NiFi data flow containing one or more processing NiFi components such as Processors and Remote Ports. The reporting task divides a NiFi flow into multiple flow paths. See Path Separation Logic for details.

Attributes:

qualifiedName: The first NiFi component Id in a path@namespace (e.g. 529e6722-9b49-3b66-9c94-00da9863ca2d@ns1)
name: NiFi component names within a path are concatenated (e.g. GenerateFlowFile, PutFile, LogAttribute)
url: A deep link to the first NiFi component in corresponding NiFi UI

nifi_input/output_port Represents a Remote Port which can be accessed by RemoteProcessGroup via Site-to-Site protocol.

Attributes:

qualifiedName: Port ID@namespace (e.g. 3f6d405e-6e3d-38c9-c5af-ce158f8e593d@ns1)
name: Name of the Port.

nifi_data Represents Unknown DataSets created by CREATE/SEND/RECEIVE NiFi provenance events those do not have particular provenance event analyzer.

Attributes:

qualifiedName: ID of a Processor which generated the provenance event@namespace (e.g. db8bb12c-5cd3-3011-c971-579f460ebedf@ns1)
name: Name of the Processor.

nifi_queue A internal DataSet of NiFi flows which connects nifi_flow_paths. Atlas lineage graph requires a DataSet in between Process entities.

Attributes:

qualifiedName: ID of the first Processor in the destination nifi_flow_path.
name: Name of the Processor.

Namespaces

An entity in Atlas can be identified by its GUID for any existing objects, or type name and unique attribute can be used if GUID is not known. Qualified name is commonly used as the unique attribute.

One Atlas instance can be used to manage multiple environments and objects in different environments may have the same name. For example, a Hive table ‘request_logs’ in two different clusters, ‘cluster-A’ and ‘cluster-B’. For this reason the qualified names contain a so-called metadata namespace.

It is common practice to provide the cluster name as the namespace, but it can be any arbitrary string.

With this, a qualified name has ‘componentId@namespace’ format. E.g. A Hive table qualified name would be dbName.tableName@namespace (default.request_logs@cluster-A).

From this NiFi reporting task standpoint, a namespace is needed to be resolved at following situations:

To register NiFi component entities. Which namespace should be used to represent the current NiFi environment?
To create lineages from NiFi component to other DataSets. Which environment does the DataSet resides in?

To answer such questions, ReportLineageToAtlas reporting task provides a way to define mappings from IP address or hostname to a namespace. The mapping can be defined by Dynamic Properties with a name in ‘hostnamePattern.namespace’ format, having its value as a set of Regular Expression Patterns to match IP addresses or host names to a particular namespace.

As an example, following mapping definition would resolve namespace ’namespace-A’ for IP address such as ‘192.168.30.123’ or hostname ’namenode1.a.example.com’, and ’namespace-B’ for ‘192.168.40.223’ or ’nifi3.b.example.com’.


# Dynamic Property Name for namespace-A
hostnamePattern.namespace-A
# Value can have multiple Regular Expression patterns separated by new line
192\.168\.30\.\d+
[^\.]+\.a\.example\.com

# Dynamic Property Name for namespace-B
hostnamePattern.namespace-B
# Values
192\.168\.40\.\d+
[^\.]+\.b\.example\.com

If no namespace mapping matches, then a name defined at ‘Atlas Default Metadata Namespace’ is used.

NiFi flow structure

This section describes how a structure of NiFi flow is reported to Atlas.

Path Separation Logic

To provide a meaningful lineage granularity in Atlas, this reporting task divide a NiFi flow into paths. The logic has the following concepts:

Focuses only on Processors and Remote Ports. Local Input/Output Ports, Process Group hierarchy or Funnels do not contribute path separation.

For example, following two flows are identical in path separation logic:

+ ```
Remote Input Port -> Processor 0 -> Funnel -> Processor 1 -> Local Input Port -> Processor 2
```
+ ```
Remote Input Port -> Processor 0 -> Processor 1 -> Processor 2
```Both flows will be treated as a single path that consists of Remote Input Port, Processor 0, 1 and 2\.

Any Processor with multiple incoming relationships from other Processors is treated like a ‘Common route’ or ‘Functional route’, and is managed as a separate path.

For example, following flow:

Processor 0 -> Processor 1 -> Processor 2
Processor 3 -> Processor2

Will produce following paths as result:

Processor 0, 1
Processor 2
Processor 3

Self cyclic relationships are ignored.

Based on these concepts, path separation is done by following steps:

Select starting components (Processors and Remote Ports) those do not have any input relationship from other Processors.
For each starting component, create a ’nifi_flow_path’. The same path may already exist if other path arrived here before.
Traverse outgoing relationships.
If any Processor with more than 1 incoming Processor relationships is found, then split the component as new ’nifi_flow_path’. When starting as a new path, a ’nifi-queue’ is created. The queue is added to the current path outputs, and the new path inputs. Back to step 2.
Traverse outgoing paths as long as there is one.

NiFi data lineage

This section describes how NiFi data lineage is reported to Atlas.

NiFi Lineage Strategy

To meet different use-cases, this reporting task provides ‘NiFi Lineage Strategy’ property to control how to report Atlas the DataSet and Process lineage tracked by NiFi flow.

NOTE: It is recommended to try possible options to see which strategy meets your use-case before running the reporting task at a production environment. Different strategies create entities differently, and if multiple strategies are used (or switched from one another), Atlas lineage graph would be noisy. As many entities will be created by this reporting task over time, it might be troublesome to clean entities to change strategy afterward, especially Atlas manages data reported by not only NiFi.

Simple Path Maps data I/O provenance events such as SEND/RECEIVE to ’nifi_flow_path’ created by NiFi flow structure analysis.

It tracks DataSet lineage at ’nifi_flow_path’ process level, instead of event level, to report a simple data lineage graph in Atlas. If different DataSets go through the same ’nifi_flow_path’, all of those input DataSets are shown as if it is impacting every output DataSets. For example, if there are A.txt and B.txt processed by the same GetFile processor then eventually ingested to HDFS path-A and path-B respectively by PutHDFS using NiFi Expression Language to decide where to store FlowFiles. Then Atlas lineage graph will show as if both A.txt and B.txt are ingested to HDFS path-A, when you pick path-A to see which DataSets are ingested into it, because both A.txt and B.txt went through the same GetFile and PutHDFS processors.

This strategy generates the least amount of data in Atlas. It might be useful when you prefer a big picture in Atlas that can summarize how each DataSets and Processes are connected among NiFi and other software. NiFi provenance events can be used to investigate more details if needed as it stores event (FlowFile) level complete lineage.

Complete Path Focuses on DROP provenance event type. Because it represents the end of a particular FlowFile lifecycle. By traversing provenance events backward from a DROP event, the entire lineage can be reported for a given FlowFile including where it is created, then where it goes.

However, reporting complete flow path for every single FlowFile will produce too many entities in Atlas. Also, it may not be the best approach for Atlas as it is designed to manage DataSet level lineage rather than event level as of today. In order to keep the amount of data at minimum, this strategy calculates hash from Input and Output DataSets of a lineage path, so that the same complete path routes will become the same Atlas entity.

If different FlowFiles went through the exact same route, then those provenance data only create a single ’nifi_flow_path’ Atlas entity. On the other hand, a single part of NiFi flow can generate different FlowFile lineage paths, those will be reported as different ’nifi_flow_path’ entities. Typically when NiFi Expression Language is used for NiFi Processor configuration to connect DataSets.

NOTE: While Simple Path strategy can report lineage by looking at each individual NiFi provenance event record, Complete Path strategy has to query parent events. It needs more computing resource (CPU and I/O) when NiFi provenance event queries are performed.

To illustrate the difference between lineage strategies, let’s look at a sample NiFi flow as shown in the screenshots below.

With ‘Simple Path’, Atlas lineage is reported like below when ‘/tmp/input/A1.csv’ is selected. Since ‘Simple Path’ simply maps I/O events to a ’nifi_flow_path’, ‘/tmp/output/B1.csv’ is shown in the lineage graph because that file is written by the ‘GetFile, PutFile…’ process.

With ‘Complete Path’, Atlas lineage is reported like below. This time, ‘GetFile, PutFile…’ process is not linked to ‘/tmp/output/B1.csv’ because ‘Complete Path’ strategy created two different ’nifi_flow_path’ entities one for ‘/tmp/input/A1.csv -> /tmp/output/A1.csv’ and another for ‘/tmp/input/B1.csv -> /tmp/output/B1.csv’.

However, once the data records ingested from A.csv and B.csv got into a bigger DataSet, ’nifi-test’ Kafka topic in this example (or whatever DataSet such as a database table or a concatenated file … etc), record level lineage telling where it came from is no longer able to be tracked. So the resulting ‘/tmp/consumed/B_2..’ is shown in the same lineage graph, although file does not contain any data came from ‘/tmp/input/A1.csv’.

NiFi Provenance Event Analysis

To create lineage describing which NiFi component interacts with what DataSets, DataSet entity and Process entity need to be created in Atlas. Specifically, at least 3 entities are required to draw a lineage graph on Atlas UI. A Process entity, and a DataSet which is referred by a Process ‘inputs’ attribute, and a DataSet referred from ‘outputs’ attribute. For example:


            # With following entities
            guid: 1
            typeName: fs_path (extends DataSet)
            qualifiedName: /data/A1.csv@BranchOffice1

            guid: 2
            typeName: nifi_flow_path (extends Process)
            name: GetFile, PutHDFS
            qualifiedName: 529e6722-9b49-3b66-9c94-00da9863ca2d@BranchOffice1
            inputs: refer guid(1)
            outputs: refer guid(3)

            guid: 3
            typeName: hdfs_path (extends DataSet)
            qualifiedName: /data/input/A1.csv@Analytics

            # Atlas draws lineage graph
            /data/A1.csv -> GetFile, PutHDFS -> /data/input/A1.csv

To identify such Process and DataSet Atlas entities, this reporting task uses NiFi Provenance Events. At least, the reporting task needs to derive following information from a NiFi Provenance event record:

typeName (e.g. fs_path, hive_table)
qualifiedName in uniqueId@namespace (e.g. /data/A1.csv@ns1)

’namespace’ in ‘qualifiedName’ attribute is resolved by mapping IP address or hostname available at NiFi Provenance event ’transitUri’ to a namespace. See Namespaces for detail.

For ’typeName’ and ‘qualifiedName’, different analysis rules are needed for different DataSet. ReportLineageToAtlas provides an extension point called ‘NiFiProvenanceEventAnalyzer’ to implement such analysis logic for particular DataSets.

When a Provenance event is analyzed, registered NiFiProvenanceEventAnalyzer implementations are searched in following order to find a best matching analyzer implementation:

By component type (e.g. KafkaTopic)
By transit URI protocol (e.g. HDFSPath)
By event type, if none of above analyzers matches (e.g. Create)

Supported DataSets and Processors

Currently, following NiFi components are supported by this reporting task:

Analyzer	Covered NiFi components			Atlas DataSet		Description
	name	eventType	transitUri example	typeName	qualifiedName
NiFiRemotePortClient	Remote Process Group Input Port	SEND	* http://nifi1.example.com:8080/nifi-api/data-transfer/input-ports/35dbc0ab-015e-1000-144c-a8d71255027d/transactions/89335043-f105-4de7-a0ef-46f2ef0c7c51/flow-files * nifi://nifi1.example.com:8081/cb729f05-b2ee-4488-909d-c6696cc34588	nifi_input_port	remotePortID@namespace (e.g. 35dbc0ab-015e-1000-144c-a8d71255027d@ns1)
With ‘Simple Path’ strategy intermediate ’nifi_queue’ and ’nifi_flow_path’ are created as well (marked with + in the following example) `upstream (nifi_flow_path) -> + queue (nifi_queue) -> + Remote Input Port (nifi_flowPath) -> remote target port (nifi_input_port)`	* For ’nifi_flow_path’: remoteProcessGroupInputPortID@namespace(e.g. f31a6b53-3077-4c59-144c-a8d71255027d@ns1) NOTE: The remoteProcessGroupInputPortID is the client side component ID and different from the remote target port ID. Multiple Remote Process Group Input Ports can send to the same target remote input port. * For ’nifi_queue’: remoteProcessGroupInputPortID@namespace(e.g. f31a6b53-3077-4c59-144c-a8d71255027d@ns1)
Remote Process Group Output Port	RECEIVE	* http://nifi1.example.com:8080/nifi-api/data-transfer/output-ports/45dbc0ab-015e-1000-144c-a8d71255027d/transactions/99335043-f105-4de7-a0ef-46f2ef0c7c51/flow-files * nifi://nifi1.example.com:8081/db729f05-b2ee-4488-909d-c6696cc34588	nifi_output_port	remotePortID@namespace (e.g. 45dbc0ab-015e-1000-144c-a8d71255027d@ns1)
With ‘Simple Path’ strategy intermediate ’nifi_flow_path’ and ’nifi_queue’ are created as well (marked with + in the following example) `remote target port (nifi_output_port) -> + Remote Output Port (nifi_flow_path) -> + queue (nifi_queue) -> downstream (nifi_flow_path)`	* For ’nifi_flow_path’: remoteProcessGroupOutputPortID@namespace(e.g. 7375f8f6-4604-468d-144c-a8d71255027d@ns1) NOTE: The remoteProcessGroupOutputPortID is the client side component ID and different from the remote target port ID. Multiple Remote Process Group Output Ports can pull from the same target remote output port. * For ’nifi_queue’: downstreamPathGUID@namespace(e.g. bb530e58-ee14-3cac-144c-a8d71255027d@ns1)
NiFiRemotePortServer	Remote Input Port Remote Output Port	RECEIVE SEND	* http://nifi1.example.com:8080/nifi-api/data-transfer/input-ports/35dbc0ab-015e-1000-144c-a8d71255027d/transactions/89335043-f105-4de7-a0ef-46f2ef0c7c51/flow-files * nifi://nifi1.example.com:8081/cb729f05-b2ee-4488-909d-c6696cc34588	nifi_input_port nifi_output_port	remotePortID@namespace(e.g. 35dbc0ab-015e-1000-144c-a8d71255027d@ns1)
KafkaTopic	PublishKafka ConsumeKafka PublishKafka_0_10 ConsumeKafka_0_10 PublishKafkaRecord_0_10 ConsumeKafkaRecord_0_10	SEND RECEIVE SEND RECEIVE SEND RECEIVE	PLAINTEXT://kafka1.example.com:9092/sample-topic (Protocol can be either PLAINTEXT, SSL, SASL_PLAINTEXT or SASL_SSL)	kafka_topic	topicName@namespace(e.g. testTopic@ns1)	NOTE:With Atlas earlier than 0.8.2, the same topic name in different clusters can not be created using the pre-built ‘kafka_topic’. See ATLAS-2286.
PutHiveStreaming	PutHiveStreaming	SEND	thrift://hive.example.com:9083	hive_table	tableName@namespace(e.g. myTable@ns1)
Hive2JDBC	PutHiveQL SelectHiveQL	SEND RECEIVE, FETCH	jdbc:hive2://hive.example.com:10000/default	hive_table	tableName@namespace(e.g. myTable@ns1)	The corresponding Processors parse Hive QL to set ‘query.input.tables’ and ‘query.output.tables’ FlowFile attributes. These attribute values are used to create qualified name.
HDFSPath	DeleteHDFS FetchHDFS FetchParquet GetHDFS GetHDFSSequenceFile PutHDFS PutORC PutParquet	REMOTE_INVOCATION FETCH FETCH RECEIVE RECEIVE SEND SEND SEND	hdfs://nn.example.com:8020/user/nifi/5262553828219	hdfs_path	/path/fileName@namespace(e.g. /app/warehouse/hive/db/default@ns1)
AwsS3Directory	DeleteHDFS FetchHDFS FetchParquet GetHDFS GetHDFSSequenceFile PutHDFS PutORC PutParquet	REMOTE_INVOCATION FETCH FETCH RECEIVE RECEIVE SEND SEND SEND	s3a://mybucket/mydir	aws_s3_pseudo_dir	s3UrlWithoutObjectName@namespace(e.g. s3a://mybucket/mydir@ns1)
HBaseTable	FetchHBaseRow GetHBase PutHBaseCell PutHBaseJSON PutHBaseRecord ScanHBase	FETCH RECEIVE SEND SEND SEND RECEIVE	hbase://hmaster.example.com:16000/tableA/rowX	hbase_table	tableName@namespace(e.g. myTable@ns1)
FilePath	PutFile GetFile … etc	SEND RECEIVE … etc	file:///tmp/a.txt	fs_path	/path/fileName@hostname(e.g. /tmp/dir/filename.txt@host.example.com)
unknown.CreateReceive, FetchSend, RemoteInvocation	Other Processors those generates listed event types	CREATE RECEIVE FETCH SEND REMOTE_INVOCATION		nifi_data	processorGuid@namespacedb8bb12c-5cd3-3011-c971-579f460ebedf@ns1

How it runs in NiFi cluster

When this reporting task runs in a NiFi cluster, following tasks are executed only by the primary node:

Create NiFi Atlas Types in Atlas type system
Maintain NiFi flow structure and metadata in Atlas which consists of NiFi component entities such as ’nifi_flow’, ’nifi_flow_path’ and ’nifi_input(output)_port’.

While every node (including primary node) performs following:

Analyzes NiFi provenance events stored in a provenance event repository on it, to create lineage between ’nifi_flow_path’ and other DataSet (e.g. Hive tables or HDFS path).

Limitations

Requires Atlas 0.8 incubating or later: This reporting task requires Atlas REST API version 2, which is introduced at Atlas 0.8-incubating. Older versions of Atlas are not supported.
Limited DataSets and Processors support: In order to report lineage to Atlas, this reporting task must know what a given processor does with a certain DataSet. Then create an ‘Atlas Object Id’ for a DataSet which uniquely identifies an entity in Atlas. Atlas Object Id has unique properties map, and mostly ‘qualifiedName’ is set in the unique properties map to identify an entity. The format of a qualifiedName depends on each DataSet.

To create this Atlas Object ID, we have to implement Processor-specific code that analyzes configured properties. See Supported DataSets and Processors for details.

Restart NiFi is required to update some ReportingTask properties As underlying Atlas client library caches configurations when it runs the first time, some properties of this reporting task can not be updated by stopping, configure and restarting the reporting task.

NiFi process needs to be restarted in such case.

Atlas Server Configurations

Delete Handler: Atlas uses ‘SoftDeleteHandler’ by default which mark relationships deleted, but still can be seen in Atlas UI. Soft delete model is useful if you would like to capture every lineage ever defined, but if you prefer seeing current state of a NiFi flow, Hard delete would be more appropriate.

To change this behavior, set following in ‘atlas-application.properties’ on Atlas server, then restart Atlas. HardDeleteHandlerV1 physically removes lineage:

atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1

Properties

Atlas Authentication Method
Specify how to authenticate this reporting task to Atlas server.
Display Name

Atlas Authentication Method

Description

Specify how to authenticate this reporting task to Atlas server.

API Name

atlas-authentication-method

Default Value

basic

Allowable Values
- Basic
- Kerberos
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Create Atlas Configuration File
If enabled, 'atlas-application.properties' file will be created in 'Atlas Configuration Directory' automatically when this Reporting Task starts. Note that the existing configuration file will be overwritten.
Display Name

Create Atlas Configuration File

Description

If enabled, 'atlas-application.properties' file will be created in 'Atlas Configuration Directory' automatically when this Reporting Task starts. Note that the existing configuration file will be overwritten.

API Name

atlas-conf-create

Default Value

false

Allowable Values
- true
- false
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Atlas Configuration Directory
Directory path that contains 'atlas-application.properties' file. If not specified and 'Create Atlas Configuration File' is disabled, then, 'atlas-application.properties' file under root classpath is used.

Display Name

Atlas Configuration Directory

Description

Directory path that contains 'atlas-application.properties' file. If not specified and 'Create Atlas Configuration File' is disabled, then, 'atlas-application.properties' file under root classpath is used.

API Name

atlas-conf-dir

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

false
Atlas Connect Timeout
Max wait time for connection to Atlas.

Display Name

Atlas Connect Timeout

Description

Max wait time for connection to Atlas.

API Name

atlas-connect-timeout

Default Value

60 sec

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Atlas Default Metadata Namespace
Namespace for Atlas entities reported by this ReportingTask. If not specified, 'atlas.metadata.namespace' or 'atlas.cluster.name' (the former having priority) in Atlas Configuration File is used. Multiple mappings can be configured by user defined properties. See 'Additional Details...' for more.

Display Name

Atlas Default Metadata Namespace

Description

Namespace for Atlas entities reported by this ReportingTask. If not specified, 'atlas.metadata.namespace' or 'atlas.cluster.name' (the former having priority) in Atlas Configuration File is used. Multiple mappings can be configured by user defined properties. See 'Additional Details...' for more.

API Name

atlas-default-cluster-name

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

false
NiFi URL for Atlas
NiFi URL is used in Atlas to represent this NiFi cluster (or standalone instance). It is recommended to use one that can be accessible remotely instead of using 'localhost'.

Display Name

NiFi URL for Atlas

Description

NiFi URL is used in Atlas to represent this NiFi cluster (or standalone instance). It is recommended to use one that can be accessible remotely instead of using 'localhost'.

API Name

atlas-nifi-url

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true
Atlas Password
Password to communicate with Atlas.
Display Name

Atlas Password

Description

Password to communicate with Atlas.

API Name

atlas-password

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

true

Required

true

Dependencies
- Atlas Authentication Method is set to any of [basic]
Atlas Read Timeout
Max wait time for response from Atlas.

Display Name

Atlas Read Timeout

Description

Max wait time for response from Atlas.

API Name

atlas-read-timeout

Default Value

60 sec

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Atlas URLs
Comma separated URL of Atlas Servers (e.g. http://atlas-server-hostname:21000 or https://atlas-server-hostname:21443). For accessing Atlas behind Knox gateway, specify Knox gateway URL (e.g. https://knox-hostname:8443/gateway/{topology-name}/atlas). If not specified, 'atlas.rest.address' in Atlas Configuration File is used.

Display Name

Atlas URLs

Description

Comma separated URL of Atlas Servers (e.g. http://atlas-server-hostname:21000 or https://atlas-server-hostname:21443). For accessing Atlas behind Knox gateway, specify Knox gateway URL (e.g. https://knox-hostname:8443/gateway/{topology-name}/atlas). If not specified, 'atlas.rest.address' in Atlas Configuration File is used.

API Name

atlas-urls

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

false
Atlas Username
User name to communicate with Atlas.
Display Name

Atlas Username

Description

User name to communicate with Atlas.

API Name

atlas-username

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true

Dependencies
- Atlas Authentication Method is set to any of [basic]
AWS S3 Model Version
Specifies what type of AWS S3 directory entities will be created in Atlas for s3a:// transit URIs (eg. PutHDFS with S3 integration). NOTE: It is strongly recommended to keep using the same AWS S3 entity model version once this reporting task started to keep Atlas data clean. Switching versions will not delete existing Atlas entities created by the old version, nor migrate them to the new version.
Display Name

AWS S3 Model Version

Description

Specifies what type of AWS S3 directory entities will be created in Atlas for s3a:// transit URIs (eg. PutHDFS with S3 integration). NOTE: It is strongly recommended to keep using the same AWS S3 entity model version once this reporting task started to keep Atlas data clean. Switching versions will not delete existing Atlas entities created by the old version, nor migrate them to the new version.

API Name

aws-s3-model-version

Default Value

v2

Allowable Values
- v1
- v2
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Filesystem Path Entities Level
Specifies how the filesystem path entities (fs_path and hdfs_path) will be logged in Atlas: File or Directory level. In case of File level, each individual file entity will be sent to Atlas as a separate entity with the full path including the filename. Directory level only logs the path of the parent directory without the filename. This setting affects processors working with files, like GetFile or PutHDFS. NOTE: Although the default value is File level for backward compatibility reasons, it is highly recommended to set it to Directory level because File level logging can generate a huge number of entities in Atlas.
Display Name

Filesystem Path Entities Level

Description

Specifies how the filesystem path entities (fs_path and hdfs_path) will be logged in Atlas: File or Directory level. In case of File level, each individual file entity will be sent to Atlas as a separate entity with the full path including the filename. Directory level only logs the path of the parent directory without the filename. This setting affects processors working with files, like GetFile or PutHDFS. NOTE: Although the default value is File level for backward compatibility reasons, it is highly recommended to set it to Directory level because File level logging can generate a huge number of entities in Atlas.

API Name

filesystem-paths-level

Default Value

FILE

Allowable Values
- File
- Directory
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Kerberos User Service
Specifies the Kerberos Credentials to use for authentication
Display Name

Kerberos User Service

Description

Specifies the Kerberos Credentials to use for authentication

API Name

kerberos-user-service

Service Interface

org.apache.nifi.kerberos.KerberosUserService

Service Implementations

org.apache.nifi.kerberos.KerberosKeytabUserService

org.apache.nifi.kerberos.KerberosPasswordUserService

org.apache.nifi.kerberos.KerberosTicketCacheUserService

Expression Language Scope

Not Supported

Sensitive

false

Required

true

Dependencies
- Atlas Authentication Method is set to any of [kerberos]
Lineage Strategy
Specifies granularity on how NiFi data flow should be reported to Atlas. NOTE: It is strongly recommended to keep using the same strategy once this reporting task started to keep Atlas data clean. Switching strategies will not delete Atlas entities created by the old strategy. Having mixed entities created by different strategies makes Atlas lineage graph noisy. For more detailed description on each strategy and differences, refer 'NiFi Lineage Strategy' section in Additional Details.
Display Name

Lineage Strategy

Description

Specifies granularity on how NiFi data flow should be reported to Atlas. NOTE: It is strongly recommended to keep using the same strategy once this reporting task started to keep Atlas data clean. Switching strategies will not delete Atlas entities created by the old strategy. Having mixed entities created by different strategies makes Atlas lineage graph noisy. For more detailed description on each strategy and differences, refer 'NiFi Lineage Strategy' section in Additional Details.

API Name

nifi-lineage-strategy

Default Value

SimplePath

Allowable Values
- Simple Path
- Complete Path
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Provenance Record Batch Size
Specifies how many records to send in a single batch, at most.

Display Name

Provenance Record Batch Size

Description

Specifies how many records to send in a single batch, at most.

API Name

provenance-batch-size

Default Value

1000

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Provenance Record Start Position
If the Reporting Task has never been run, or if its state has been reset by a user, specifies where in the stream of Provenance Events the Reporting Task should start
Display Name

Provenance Record Start Position

Description

If the Reporting Task has never been run, or if its state has been reset by a user, specifies where in the stream of Provenance Events the Reporting Task should start

API Name

provenance-start-position

Default Value

beginning-of-stream

Allowable Values
- Beginning of Stream
- End of Stream
Expression Language Scope

Not Supported

Sensitive

false

Required

true
SSL Context Service
Specifies the SSL Context Service to use for communicating with Atlas.

Display Name

SSL Context Service

Description

Specifies the SSL Context Service to use for communicating with Atlas.

API Name

ssl-context-service

Service Interface

org.apache.nifi.ssl.SSLContextService

Service Implementations

org.apache.nifi.ssl.StandardRestrictedSSLContextService

org.apache.nifi.ssl.StandardSSLContextService

Expression Language Scope

Not Supported

Sensitive

false

Required

false

Dynamic Properties

hostnamePattern.<namespace>
White space delimited (including new line) Regular Expressions to resolve a namespace from a hostname or IP address of a transit URI of NiFi provenance record.

Name

hostnamePattern.<namespace>

Description

White space delimited (including new line) Regular Expressions to resolve a namespace from a hostname or IP address of a transit URI of NiFi provenance record.

Value

hostname Regex patterns

Expression Language Scope

ENVIRONMENT

State Management

Scopes	Description
LOCAL	Stores the Reporting Task's last event Id so that on restart the task knows where it left off.