Cloudera DataFlow for Data Hub 7.2.12 (Public Cloud)

.NET client
1. Overview
1.1. Version information
1.2. Contact information
1.3. License information
1.4. URI scheme
1.5. Tags
2. Security
2.1. Authorization
2.2. BasicAuth
3. Resources
3.1. Access
3.1.1. Get access status
3.1.10. Get identity provider usage
3.1.11. Create token using kerberos
3.1.12. Create token using basic auth
3.1.2. Performs a logout for other providers that have been issued a JWT.
3.1.3. Redirect/callback URI for processing the result of the OpenId Connect login sequence.
3.1.4. Retrieves a JWT following a successful login sequence using the configured OpenId Connect provider.
3.1.5. Performs a logout in the OpenId Provider.
3.1.6. Initiates a request to authenticate through the configured OpenId Connect provider.
3.1.7. Create token trying all providers
3.1.8. Create token using identity provider
3.1.9. Test identity provider
3.10. Items
3.10.1. Get all items
3.10.2. Get item fields
3.10.3. Get bucket items
3.11. Policies
3.11.1. Create access policy
3.11.2. Get all access policies
3.11.3. Get available resources
3.11.4. Get access policy for resource
3.11.5. Get access policy
3.11.6. Update access policy
3.11.7. Delete access policy
3.12. Tenants
3.12.1. Create user group
3.12.10. Delete user
3.12.2. Get user groups
3.12.3. Get user group
3.12.4. Update user group
3.12.5. Delete user group
3.12.6. Create user
3.12.7. Get all users
3.12.8. Get user
3.12.9. Update user
3.2. Bucket Bundles
3.2.1. Get extension bundles by bucket
3.2.2. Create extension bundle version
3.3. Bucket Flows
3.3.1. Create flow
3.3.10. Get latest bucket flow version metadata
3.3.11. Get bucket flow version
3.3.2. Get bucket flows
3.3.3. Get bucket flow
3.3.4. Update bucket flow
3.3.5. Delete bucket flow
3.3.6. Get bucket flow diff
3.3.7. Create flow version
3.3.8. Get bucket flow versions
3.3.9. Get latest bucket flow version content
3.4. Buckets
3.4.1. Create bucket
3.4.2. Get all buckets
3.4.3. Get bucket fields
3.4.4. Get bucket
3.4.5. Update bucket
3.4.6. Delete bucket
3.5. Bundles
3.5.1. Get all bundles
3.5.10. Get bundle version extension
3.5.11. Get bundle version extension docs
3.5.12. Get bundle version extension docs details
3.5.2. Get all bundle versions
3.5.3. Get bundle
3.5.4. Delete bundle
3.5.5. Get bundle versions
3.5.6. Get bundle version
3.5.7. Delete bundle version
3.5.8. Get bundle version content
3.5.9. Get bundle version extensions
3.6. Config
3.6.1. Get configuration
3.7. Extension Repository
3.7.1. Get extension repo buckets
3.7.10. Get extension repo extension details
3.7.11. Get extension repo version checksum
3.7.12. Get global extension repo version checksum
3.7.2. Get extension repo groups
3.7.3. Get extension repo artifacts
3.7.4. Get extension repo versions
3.7.5. Get extension repo version
3.7.6. Get extension repo version content
3.7.7. Get extension repo extensions
3.7.8. Get extension repo extension
3.7.9. Get extension repo extension docs
3.8. Extensions
3.8.1. Get all extensions
3.8.2. Get extensions providing service API
3.8.3. Get extension tags
3.9. Flows
3.9.1. Get flow fields
3.9.2. Get flow
3.9.3. Get flow versions
3.9.4. Get latest flow version
3.9.5. Get latest flow version metadata
3.9.6. Get flow version
4. Definitions
4.1. AccessPolicy
4.10. BundleInfo
4.11. BundleVersion
4.12. BundleVersionDependency
4.13. BundleVersionMetadata
4.14. ComponentDifference
4.15. ComponentDifferenceGroup
4.16. ConnectableComponent
4.17. ControllerServiceAPI
4.18. ControllerServiceDefinition
4.19. CurrentUser
4.2. AccessPolicySummary
4.20. DeprecationNotice
4.21. DynamicProperty
4.22. DynamicRelationship
4.23. Extension
4.24. ExtensionBundle
4.25. ExtensionFilterParams
4.26. ExtensionMetadata
4.27. ExtensionMetadataContainer
4.28. ExtensionRepoArtifact
4.29. ExtensionRepoBucket
4.3. AllowableValue
4.30. ExtensionRepoGroup
4.31. ExtensionRepoVersion
4.32. ExtensionRepoVersionSummary
4.33. ExternalControllerServiceReference
4.34. Fields
4.35. JaxbLink
4.36. Permissions
4.37. Position
4.38. Property
4.39. ProvidedServiceAPI
4.4. Attribute
4.40. RegistryConfiguration
4.41. Relationship
4.42. Resource
4.43. ResourcePermissions
4.44. Restricted
4.45. Restriction
4.46. RevisionInfo
4.47. Stateful
4.48. SystemResourceConsideration
4.49. TagCount
4.5. BatchSize
4.50. Tenant
4.51. User
4.52. UserGroup
4.53. VersionedConnection
4.54. VersionedControllerService
4.55. VersionedFlow
4.56. VersionedFlowCoordinates
4.57. VersionedFlowDifference
4.58. VersionedFlowSnapshot
4.59. VersionedFlowSnapshotMetadata
4.6. Bucket
4.60. VersionedFunnel
4.61. VersionedLabel
4.62. VersionedParameter
4.63. VersionedParameterContext
4.64. VersionedPort
4.65. VersionedProcessGroup
4.66. VersionedProcessor
4.67. VersionedPropertyDescriptor
4.68. VersionedRemoteGroupPort
4.69. VersionedRemoteProcessGroup
4.7. BucketItem
4.8. BuildInfo
4.9. Bundle
@OnAdded
@OnEnabled
@OnPrimaryNodeStateChange
@OnRemoved
@OnScheduled
@OnShutdown
@OnStopped
@OnUnscheduled
AbstractProcessor API
Access Policies
Access Policies
Access Policy Configuration Examples
Accessing Parameters
Accessing the UI with Multi-Tenant Authorization
AccessPolicyProvider
Active / Active Architecture
Active / Stand-by Architecture
Add a User
Add an Empty Group
Add ControllerServices
Add Ranger policies
Add Ranger policies
Add the user or group to a pre-defined access policy
Add the user or group to a pre-defined access policy
Add the user to predefined Ranger access policies
Add User to a Group
Adding a new schema
Adding a Parameter to a Parameter Context
Adding and Configuring Record Reader and Writer Controller Services
Adding catalogs as Data Provider
Adding clusters to SRM's configuration
Adding Commands
Adding Components to the Canvas
Adding Controller Services for Dataflows
Adding Controller Services for Reporting Tasks
Adding Custom Catalogs
Adding Functionality to Apache NiFi
Adding Hive as Catalog
Adding Java to the Functions language option
Adding Kafka as Data Provider
Adding Kudu as Catalog
Adding Schema Registry as Catalog
Adding self-healing goals to Cruise Control in Cloudera Manager
Adding Snowflake CA certificates to NiFi truststore
Adding Snowflake CA certificates to NiFi truststore
Additional Actions
Additional Certificate Commands
Additional Help
Additional Testing Capabilities
Advanced Documentation
Aggregation for Analytics
Alias Properties
Align Horizontally
Align Vertically
Allow Bundles in a Bucket to be Overwritten
Allow Insecure Cryptographic Modes
Analytics Framework
Analytics Properties
Analyzing data with Apache Flink
Analyzing your data with HBase
Analyzing your data with Kafka
Analyzing your data with Kudu
Anatomy of a Process Group
Anatomy of a Processor
Anatomy of a Remote Process Group
Apache Flink
Apache Kafka
Apache Kafka Overview
Apache Knox
Apache NiFi
Apache NiFi Admin Guide
Apache NiFi Expression Language Guide
Apache NiFi Expression Language Overview
Apache NiFi Overview
Apache NiFi Record Path Reference
Apache NiFi RecordPath Overview
Apache NiFi Registry
Apache NiFi Registry Admin Guide
Apache NiFi Registry REST API Reference
Apache NiFi REST API Reference
Apache NiFi REST API Reference
Appendix - Schema example
Application development
Argon2
Arrays
Assign resource roles
Assign the EnvironmentUser role
Assigning a Parameter Context to a Process Group
Assigning administrator level permissions
Assigning resource roles
Assigning selective permissions to user
Atlas entities in Flink metadata collection
Authentication
Authorization
Authorization
Authorization example
Authorization workflow
Authorizer Configuration
Authorizer Configuration
Authorizers.xml Setup
Authorizers.xml Setup
Authorizing Flow Management Cluster Access in CDP Public Cloud
Autoloading Custom Processors
Available Configuration Options
Back-Referencing
Backup
Backup & Recovery
Backwards Compatibility
base64Decode
base64Encode
Basic Cluster Setup
Basics
Bcrypt
Bcrypt, Scrypt, PBKDF2, Argon2
Before you begin
Before you begin
Bending Connections
Bidirectional replication example of two active clusters
Bidirectional Replication Flows
Boolean Logic
Bootstrap Properties
Bootstrap Properties
Bootstrap.conf
Broker garbage log collection and log rotation
Broker log management
Broker migration
Broker Tuning
Brokers
Browser Support
Browser Support
Bucket Policies
Bucket Policies
Build the data flow
Building a DataFlow
Building Cloudera Manager charts with Kafka metrics
Building your dataflow
Building your dataflow
Bundle Coordinates
Bundle Id
Bundle Persistence
Bundle Persistence Providers
CDF for Data Hub
Change Version
Changing Component Versions
Changing Configuration and Context Menu Options
Channel encryption
Checking producer activity
Checking schema registration
Child Operator
Choosing the number of partitions for a topic
Clear Activity and Shutdown Existing NiFi
Client and broker compatibility across Kafka versions
Client authentication using delegation tokens
Client examples
Client examples
Client/Server
ClouderaRegistryKafkaDeserializationSchema
ClouderaRegistryKafkaSerializationSchema
Cluster Common Properties
Cluster Migration Architectures
Cluster Node Identities
Cluster Node Properties
Cluster sizing
Clustering Configuration
coalesce
Cohesion and Reusability
Command and Control of the DataFlow
Command Line Tools
Command Line Tools
Comments Tab
Commit Local Changes
Common Processor Patterns
Communication within the Cluster
Compatibility Policies
Component Alignment
Component Lifecycle
Component Linking
Component Notification
Component Support in Cloudera DataFlow for Data Hub 7.2.12
Component types and metrics for alert policies
Component Versions
ComponentLog
Compose Tab
Composite Implementations
concat
Concept of tables in SSB
Configuration Best Practices
Configuration examples
Configuration Files
Configuration Properties Reference for Properties not Available in Cloudera Manager
Configure clients on a producer or consumer level
Configure clients on an application level
Configure data directories for clusters with custom disk configurations
Configure each object store processor
Configure JMX ephemeral ports
Configure Kafka brokers
Configure Kafka clients
Configure Kafka MirrorMaker
Configure Ranger policies for site-to-site communication
Configure Site-to-Site client NiFi instance
Configure Site-to-Site Server NiFi Instance
Configure SRM for Failover and Failback
Configure the Controller Service
Configure the controller services
Configure the HBase client service
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for your data source
Configure the processor for your data target
Configure the resource-based Ranger service used for authorization
Configure the service account
Configure TLS Encryption Manually for Schema Registry
Configure your source processor
Configure your truststores
Configure Zookeeper TLS/SSL support for Kafka
Configuring a Process Group
Configuring a Processor
Configuring Apache Kafka
Configuring automatic group offset synchronization
Configuring capacity estimations and goals
Configuring Cruise Control
Configuring Flink application resources
Configuring Flink applications
Configuring Flow Management clusters to hot load custom NARs
Configuring Kafka tables
Configuring Kafka ZooKeeper chroot
Configuring Kerberos properties
Configuring LDAP authentication
Configuring log levels for command line tools
Configuring Materialized View database information
Configuring multiple listeners
Configuring properties for non-Kerberos authentication mechanisms
Configuring properties not exposed in Cloudera Manager
Configuring Ranger policies for SSB
Configuring Remote Querying [Technical Preview]
Configuring replication specific REST servers
Configuring replications
Configuring RocksDB state backend
Configuring SMM for monitoring Kafka cluster replications
Configuring SQL job settings
Configuring srm-control
Configuring State Providers
Configuring Streams Replication Manager
Configuring the advertised information of the SRM Service role [Technical Preview]
Configuring the Atlas hook in Kafka
Configuring the driver role target clusters
Configuring the service role target cluster
Configuring the SRM client's secure storage
Configuring TLS/SSL properties
Configuring Users & Access Policies
Configuring your Controller Services
Configuring your source processor
Configuring your target processor
Configuring your target processor
Confirming your dataflow success
Confirming your dataflow success
Connect
Connect NiFi to the Registry
Connect workers
Connecting Components
Connecting Kafka Clients to CDP Public Cloud Clusters
Connecting Kafka clients to Data Hub provisioned clusters
Connecting Kafka clients to Data Hub provisioned clusters
Connecting to a NiFi Registry
Connecting to Kafka host
Connector support in SSB
Connectors
Consider the User
Console Page
Consuming data from Kafka topic
Consuming data from Kafka topics using stored schemas
Contact Us
contains
Content Repository
Content Viewers
Controller Services
Controller Services
Controller Services
Core Features of Flink
Core Properties
Create a Bucket
Create a Bucket
Create a Bucket Policy
Create a custom access policy
Create a Custom Access Policy
Create a Custom Access Policy
Create a New Group with Selected Users
Create Atlas entity type definitions
Create consumer group policy
Create controller services for your data flow
Create IDBroker mapping
Create Ranger policies for Machine User account
Create Solr target collection
Create the HBase target table
Create the Hive target table
Create the Kudu target table
Create topic policy
Create your cluster
Create your cluster
Create your cluster
Create your streaming clusters
Creating a Kafka topic
Creating a notifier
Creating a Template
Creating an alert policy
Creating and configuring the HBaseSinkFunction
Creating Controller Services for your dataflow
Creating Flink tables using Templates
Creating IDBroker mapping
Creating Input Transforms
Creating Kafka tables
Creating Kafka tables in SSB
Creating Kafka tables using Console wizard
Creating Kafka tables using Templates
Creating Kafka topic
Creating Machine User
Creating Materialized Views
Creating Streaming Analytics cluster
Creating tables with Flink SQL in SSB
Creating TLS truststore
Creating User Defined Functions
Creating Users and Groups
Creating Webhook tables
Creating your first Flow Management cluster
Creating your First Flow Management Cluster in CDP Public Cloud
Creating your first Streaming Analytics cluster
Creating your First Streaming Analytics Cluster in CDP Public Cloud
Creating your first Streams Messaging cluster
Creating your First Streams Messaging Cluster in CDP Public Cloud
Cross Data Center Replication
Cross data center replication example of multiple clusters
Cruise Control
Cruise Control Overview
Cruise Control REST API endpoints
Custom Processor UIs
Customizing the Kerberos principal for Schema Registry
Data Buffering
Data Egress
Data Hub cluster definitions
Data Ingress
Data model version of serialized Flow snapshots
Data Provenance
Data Providers Page
Data Types
Database Properties
DatabaseFlowPersistenceProvider
DataStream connectors
Date Manipulation
Decommission Nodes
Define your CDP Private Cloud Base dataflow
Define your CDP Public Cloud dataflow
Defining and adding clusters for replication
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Defining external Kafka clusters
Defining Schema Registry access policies
Delegation token based authentication
Delete a Bucket
Delete a Bucket Policy
Delete a Flow
Delete a User
Delete Multiple Buckets
Delete Multiple Users
Delete Nodes
Deleting a Kafka topic
Deleting a notifier
Deleting a schema
Deleting an alert policy
Deployment scenarios
Deployment scenarios
Deprecating a Component
Descendant Operator
Details of an Event
Details Tab
Developing a ControllerService
Developing a Reporting Task
Developing Apache Kafka Applications
Developing JavaScript functions
Disabling an alert policy
Disconnect
Disconnect Nodes
Disk management
Disk Removal
Disk Replacement
Documenting a Component
Documenting Capability and Keywords
Documenting FlowFile Attribute Interaction
Documenting Properties
Documenting Related Components
Documenting Relationships
Download Bundle
Downloading and Installing NiFi Registry
Downloading the Snowflake JDBC driver jar file
Driver inter-node coordination
Edit a Bucket Name
Edit a User Name
Email Notification Service
Embedded ZooKeeper
Embedded ZooKeeper Server
Embedded ZooKeeper with TLS
Enable authorization in Kafka with Ranger
Enable high availability
Enable Kerberos authentication
Enable or disable authentication with delegation tokens
Enable security for Cruise Control
Enabling an alert policy
Enabling checkpoints for Flink applications
Enabling end-to-end latency monitoring
Enabling Flink DEBUG logging
Enabling interceptors
Enabling Kerberos for the SRM service
Enabling Remote Querying [Technical Preview]
Enabling savepoints for Flink applications
Enabling TLS/SSL for the SRM service
Enabling/Disabling a Component
Enabling/Disabling Controller Services
Encode/Decode Functions
Encrypt-Config Tool
Encrypt-Config Tool
Encrypted Content Repository
Encrypted File System Content Repository Properties
Encrypted FlowFile Repository
Encrypted Passwords in Configuration Files
Encrypted Passwords in Configuration Files
Encrypted Passwords in Flows
Encrypted Provenance Considerations
Encrypted Provenance Repository
Encrypted Write Ahead FlowFile Repository Properties
Encrypted Write Ahead Provenance Repository Properties
Encryption Configuration
End to end latency overview
End to end latency use case
endsWith
Enqueue FlowFiles
Enrich/Modify Content
Enriching streaming data with join
Error Handling
Escaping Expression Language
Essential metrics to monitor
Evaluating Multiple Attributes
Event Hooks
Evolve your schema
Evolving a schema
Example - Secure NiFi Registry with Proxied-Entity
Example - Secure NiFi Registry without Proxied-Entity
Example Dataflow
Example: joining Kafka and Kudu tables
Examples
Examples of Interacting with Schema Registry
Exceptions within a callback: IOException, RuntimeException
Exceptions within the Processor
Exchanging Data with External Systems
Executing SQL jobs in production mode
Expanding an Event
Expected Behavior
Expected Behavior
Experimental Warning
Exporting a Template
Exposing Processor Properties
Exposing Processor's Relationships
Expression Language Editor
Expression Language Hierarchy
Expression Language in the Application
Extension Directories
Fan-in and Fan-out Replication Flows
fieldName
File descriptor limits
File Manager
File System Content Repository Properties
FileAccessPolicyProvider
FileAuthorizer
FileBasedKeyProvider
FileBasedKeyProvider
FileSystemBundlePersistenceProvider
FileSystemFlowPersistenceProvider
Filesystems
FileUserGroupProvider
Filter Functions
Filters
Filters
Find Parents
Finding list of brokers
Finding Schema Registry endpoint
Fixed Issues in Cloudera DataFlow for Data Hub 7.2.12
Fixed Issues in Flow Management
Fixed Issues in Streaming Analytics
Fixed Issues in Streams Messaging
Flink application example
Flink application structure
Flink Dashboard
Flink DDL
Flink DML
Flink metadata collection using Atlas
Flink Project Template
Flink Queries
Flink SQL Overview
Flow Analyzer
Flow Election
Flow Management
Flow Management cluster definitions
Flow Management cluster layout
Flow Persistence
Flow Persistence Providers
FlowFile
FlowFile Repository
For Linux/Unix/Mac OS X users
format
Function Usage
Functions
Functions
Functions Tab
Gather configuration information
General Design Considerations
General Tab
Getting Started with Apache NiFi Registry
Getting Started with Streams Messaging Clusters on CDP Public Cloud
Getting started: running a simple SQL job
GitFlowPersistenceProvider
Give users access to your cluster
Give users access to your cluster
Give users access to your cluster
Governance
Governance
Grant permission for the ATLAS_HOOK topic
Grant Special Privileges to a User
Granting Machine User access to environment
Granularity of metrics for end-to-end latency
Group Window
Groups and fetching
H2
H2 Settings
Handling disk failures
Handling large messages
hash
HBase sink with Flink
High Level Overview of Key NiFi Features
Highly Available Kafka Architectures
Historical Statistics of a Component
History Tab
Hot Loading Custom NARs
How Cruise Control retrieves metrics
How Cruise Control self-healing works
How does it work?
How does it work?
How does it work?
How to contribute to Apache NiFi
How to install and start NiFi
How to install and start NiFi Registry
How to Set up Failover and Failback
HTTP Notification Service
I Started NiFi Registry. Now What?
Identity Mapping Properties
Identity Mapping Properties
Import a Versioned Flow
Import a Versioned Flow
Importing a Template
Importing Confluent Schema Registry schemas into Cloudera Schema Registry
Improve Performance in Schema Registry
Individual Port Transmission
Ingesting data into Amazon S3
Ingesting Data into Amazon S3 Buckets
Ingesting Data into Apache HBase in CDP Cloud
Ingesting Data into Apache HBase in CDP Public Cloud
Ingesting Data into Apache Hive in CDP Public Cloud
Ingesting Data into Apache Hive in CDP Public Cloud
Ingesting data into Apache Kafka
Ingesting Data into Apache Kafka in CDP Public Cloud
Ingesting Data into Apache Kudu in CDP Public Cloud
Ingesting Data into Apache Kudu in CDP Public Cloud
Ingesting data into Apache Solr
Ingesting Data into Apache Solr in CDP Public Cloud
Ingesting Data into Azure Data Lake Storage
Ingesting data into Azure Data Lake Storage
Ingesting data into CDP Object Stores with RAZ authorization
Ingesting Data into CDP Public Cloud
Ingesting Data into Cloud Object Stores with RAZ Authorizations
Ingesting Data into Google Cloud Storage
Ingesting data into Google Cloud Storage
Initial Admin Identity (New NiFi Instance)
Initial Admin Identity (New NiFi Registry Instance)
Install
Install the new NiFi Version
Installing as a Service
Installing Custom Processors
Instantiate TestRunner
Instantiating a Template
Integrate Kafka and Schema Registry
Integrate Kafka and Schema Registry using NiFi Processors
Integrating with Atlas
Integrating with Kafka
Integrating with NiFi
Integrating with Schema Registry
Integration with Kafka
Inter-broker security
Interacting with a ControllerService
Interactive Usage
Introducing streams messaging cluster on CDP Public Cloud
Introduction
Introduction
Introduction to alert policies in Streams Messaging Manager
Introduction to Materialized Views
Introduction to monitoring Kafka cluster replications in SMM
Introduction to SQL Stream Builder
Introduction to Streams Messaging Manager
isBlank
isEmpty
ISR management
Java client
Java Cryptography Extension (JCE) Limited Strength Jurisdiction Policies
JBOD
JBOD Disk migration
JBOD setup
Job lifecycle
Job Lifecycle
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Joining streaming and bounded tables
JVM and garbage collection
Kafka Architecture
Kafka brokers and Zookeeper
Kafka clients and ZooKeeper
Kafka cluster load balancing using Cruise Control
Kafka consumers
Kafka credentials property reference
Kafka FAQ
Kafka Introduction
Kafka Metrics Reporter
Kafka producers
Kafka public APIs
Kafka security hardening with Zookeeper ACLs
Kafka Streams
Kafka with Flink
kafka-*-perf-test
kafka-configs
kafka-console-consumer
kafka-console-producer
kafka-consumer-groups
kafka-delegation-tokens
kafka-log-dirs
kafka-reassign-partitions
kafka-topics
Kafka-ZooKeeper performance tuning
Kerberizing Embedded ZooKeeper Server
Kerberizing NiFi's ZooKeeper Client
Kerberos
Kerberos
Kerberos authentication using a keytab
Kerberos authentication using the ticket cache
Kerberos Properties
Kerberos Properties
Kerberos Service
Kerberos Service
Key Derivation Functions
Key Features
Key features of SSB
Key Rotation
Key Rotation
Keywords
Known Issues In Cloudera DataFlow for Data Hub 7.2.12
Known Issues in Flow Management
Known Issues in Streaming Analytics
Known Issues in Streams Messaging
Kudu with Flink
LDAP authentication
LdapUserGroupProvider
Leader positions and in-sync replicas
Legacy Authorized Users (NiFi Instance Upgrade)
Lightweight Directory Access Protocol (LDAP)
Lightweight Directory Access Protocol (LDAP)
Log cleaner
Logging In
Logging In
LoggingEventHookProvider
Logs and log segments
Main Use Cases
Make a Bucket Publicly Visible
Manage Buckets
Manage Bundles
Manage Flows
Manage Groups
Manage individual delegation tokens
Manage Users & Groups
Management basics
Managing Alert Policies
Managing alert policies and notifiers in SMM
Managing Apache Kafka
Managing Cruise Control
Managing Kafka Topics
Managing Local Changes
Managing Nodes
Managing registered Data Providers
Managing session for SQL jobs
Managing teams in Streaming SQL Console
Managing Templates
Managing time in SSB
Managing topics across multiple Kafka clusters
Maps
matchesRegex
Materialized Views Page
Mathematical Operations and Numeric Manipulation
Meet the prerequisites
Meet the prerequisites to create streams messaging cluster
Metadata Database
Metadata Database
Metadata governance with Atlas
Metadata governance with Atlas
Metadata governance with Atlas
Metrics
Migrate brokers by modifying broker IDs in meta.properties
Migrating a Flow with Sensitive Properties
Migrating Between Source and Destination ZooKeepers
Migrating Consumer Groups Between Clusters
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Mocking External Resources
Modifying a Kafka topic
Monitor end-to-end latency
Monitoring
Monitoring
Monitoring checkpoint latency for cluster replication
Monitoring End to End Latency
Monitoring end to end latency for Kafka topic
Monitoring end to end latency for Kafka topic
Monitoring Kafka activity in Streams Messaging Manager
Monitoring Kafka brokers
Monitoring Kafka cluster replications by quick ranges
Monitoring Kafka Cluster Replications using Streams Messaging Manager
Monitoring Kafka Clusters
Monitoring Kafka clusters
Monitoring Kafka consumers
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring of DataFlow
Monitoring replication latency for cluster replication
Monitoring replication throughput and latency by values
Monitoring Replication with Streams Messaging Manager
Monitoring SQL Stream jobs
Monitoring status of the clusters to be replicated
Monitoring throughput for cluster replication
Monitoring topics to be replicated
Monitoring your data flow
Moving data from CDP Private Cloud Base to Public Cloud with NiFi site-to-site
Moving Data in and Out of Snowflake
Moving data out of Snowflake
Moving Data using NiFi Site-to-Site
Multi-Tenant Authorization
MySQL
Naming Conventions
Navigating within a DataFlow
Nested Versioned Flows
Network and I/O threads
Networking parameters
New topic and consumer group discovery
Next steps
NiFi
NiFi
NiFi
NiFi Architecture
NiFi Archives (NARs)
NiFi CLI
NiFi CLI Node Commands
NiFi Components
NiFi Developer Guide Introduction
NiFi Legacy
NiFi Legacy KDF
NiFi Registry
NiFi Registry
NiFi Registry User Interface
NiFi Toolkit Administrative Tools
NiFi User Interface
nifi-cert.pem
nifi-key.key
Node Manager
None
not
Notes
Notes
Notification Services
Notifiers
Notify
Obtain HBase connection details
Obtain Hive connection details
Offload Nodes
Offsets Subcommand
Older Existing NiFi Version

Ingesting Data into Apache Hive in CDP Public Cloud

Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.
Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.
Configure the service account
Configure the service account that you will use to ingest data into Hive.
Create IDBroker mapping
To let your CDP user use the central authentication features that CDP provides and exchange credentials for AWS or Azure access tokens, you must map this CDP user to the correct IAM role or Azure Managed Service Identity (MSI). You can add or modify these mappings from the Management Console in your CDP environment.
Create the Hive target table
Before you can ingest data into Apache Hive in CDP Public Cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table. Modify these instructions based on your data ingest target table needs.
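As a rough illustration of what a "simple table" might look like, the sketch below builds a HiveQL CREATE TABLE statement as a string. The table name `customer`, its columns, and the helper `build_create_table` are hypothetical and not taken from this guide. The ORC format and the `transactional` table property reflect the usual expectation that Hive streaming ingest targets a transactional ORC table; confirm the exact requirements for your environment before running any DDL.

```python
def build_create_table(table, columns):
    """Return a HiveQL CREATE TABLE statement for a simple transactional ORC table.

    `table` and `columns` are illustrative inputs, not values from the guide.
    `columns` is a list of (column_name, hive_type) pairs.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # One "`name` TYPE" entry per column, indented for readability.
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n{cols}\n)\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical target table with two columns.
ddl = build_create_table("customer", [("id", "INT"), ("name", "STRING")])
print(ddl)
```

You could paste the printed statement into a Hive client such as Beeline and adjust the columns to match the records that your data flow delivers.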
Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.
Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors require these files to parse the configuration values and use those values to communicate with Hive.
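To give a sense of what these client configuration files contain, here is a minimal sketch that reads name/value pairs from a Hadoop-style configuration document such as hive-site.xml. The sample content, including the host name `metastore.example.com`, is made up for illustration, and `read_hadoop_properties` is a hypothetical helper, not part of NiFi or Hive. Your downloaded files will contain many more properties.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt in the Hadoop configuration file layout; real files are larger.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/warehouse/tablespace/managed/hive</value>
  </property>
</configuration>"""


def read_hadoop_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            # A missing <value> element is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


props = read_hadoop_properties(SAMPLE_HIVE_SITE)
print(props["hive.metastore.uris"])
```

A quick check like this can help you confirm that the files you downloaded include the metastore address before you reference them in a processor configuration.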
Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to your NiFi canvas, and connecting the processors.
Configure the controller services
You can add Controller Services to provide shared services to be used by the processors in your data flow. You will use these Controller Services later when you configure your processors.
Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data consumption processor options.
Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data ingest processor options.
Start your data flow
Start your data flow to verify that it works and to begin ingesting data.
Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow.
Next steps
Learn what to do after you have moved data into Hive in CDP Public Cloud.

© 2019, 2021 by Cloudera, Inc. All rights reserved.
