Cloudera DataFlow for Data Hub 7.2.2 (Public Cloud)
CDF for Data Hub
▶︎
Release Notes
What's New in Cloudera DataFlow for Data Hub
Component support
▶︎
Unsupported features
Unsupported Apache NiFi extensions
Unsupported Streams Messaging features
Unsupported Flow Management features
▶︎
Apache patch information
NiFi patches
NiFi Registry patches
Known issues and limitations
Fixed issues
Common vulnerabilities and exposures
▶︎
Concepts
▶︎
Apache NiFi Overview
What is Apache NiFi?
The core concepts of NiFi
NiFi Architecture
Performance Expectations and Characteristics of NiFi
High Level Overview of Key NiFi Features
▶︎
Streams Messaging
▶︎
Apache Kafka Overview
Kafka Introduction
▶︎
Kafka Architecture
Brokers
Topics
Records
Partitions
Record order and assignment
Logs and log segments
Kafka brokers and Zookeeper
Leader positions and in-sync replicas
▶︎
Kafka FAQ
Basics
Use cases
▶︎
Schema Registry Overview
▶︎
Schema Registry Overview
Examples of Interacting with Schema Registry
▶︎
Schema Registry Use Cases
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Schema Registry Component Architecture
▶︎
Schema Registry Concepts
Schema Entities
Compatibility Policies
▶︎
Streams Messaging Manager Overview
Streams Messaging Manager Overview
▶︎
Apache Flink Overview
What is Apache Flink?
Streaming use cases with Flink
▶︎
Flink Streaming Applications
Handling state in Flink
Event-driven applications with Flink
Sophisticated windowing in Flink
Using watermarks in Flink
Creating checkpoints and savepoints in Flink
▶︎
Planning
▶︎
Planning your Flow Management deployment
Deployment scenarios
Flow Management cluster definitions
Flow Management cluster layout
▶︎
Planning your Streams Messaging deployment
Deployment scenarios
Data Hub cluster definitions
Streams Messaging cluster layout
▶︎
Planning your Streaming Analytics deployment
Streaming Analytics deployment scenarios
Streaming Analytics Data Hub cluster definitions
Streaming Analytics cluster layout
▼
How To: Flow Management
▶︎
Creating your First Flow Management Cluster in CDP Public Cloud
▶︎
Creating your first Flow Management cluster
Meet the prerequisites
Create your cluster
Give users access to your cluster
Next steps
▶︎
Authorizing Flow Management Cluster Access in CDP Public Cloud
Security for Flow Management Clusters and Users in CDP Public Cloud
▶︎
User Authorization
Authorization workflow
Assigning administrator level permissions
▶︎
Assigning selective permissions to a user
Assign the EnvironmentUser role
Add the user to predefined Ranger access policies
Create a custom access policy
Authorization example
Predefined Ranger Access Policies for Apache NiFi
Predefined Ranger Access Policies for Apache NiFi Registry
▶︎
Moving Data using NiFi Site-to-Site
▶︎
Moving data from CDP Private Cloud Base to Public Cloud with NiFi site-to-site
Understand the use case
Prepare your clusters
Set up your network configuration
Configure your truststores
Define your CDP Public Cloud dataflow
Configure Ranger policies for site-to-site communication
Define your CDP Private Cloud Base dataflow
▼
Ingesting Data into CDP Public Cloud
▶︎
Ingesting Data into Apache Kafka in CDP Public Cloud
▶︎
Ingesting data into Apache Kafka
Understand the use case
Meet the prerequisites
Build the data flow
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
Appendix - Schema example
▼
Ingesting Data into Apache Hive in CDP Public Cloud
▼
Ingesting Data into Apache Hive in CDP Public Cloud
Understand the use case
Meet the prerequisites
Configure the service account
Create IDBroker mapping
Create the Hive target table
Add Ranger policies
Obtain Hive connection details
Build the data flow
Configure the controller services
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▶︎
Ingesting Data into Apache HBase in CDP Public Cloud
▶︎
Ingesting Data into Apache HBase in CDP Public Cloud
Understand the use case
Meet the prerequisites
Create the HBase target table
Add Ranger policies
Obtain HBase connection details
Build the data flow
Configure the HBase client service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▶︎
Ingesting Data into Apache Kudu in CDP Public Cloud
▶︎
Ingesting Data into Apache Kudu in CDP Public Cloud
Understand the use case
Meet the prerequisites
Create the Kudu target table
Build the data flow
Configure the Controller Service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify that you can write data to Kudu
Next steps
▶︎
Ingesting Data into Amazon S3 Buckets
▶︎
Ingesting data into Amazon S3
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting Data into Azure Data Lake Storage
▶︎
Ingesting data into Azure Data Lake Storage
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Apache NiFi
▶︎
Using the Apache NiFi Interface
Terminology
NiFi User Interface
Accessing the UI with Multi-Tenant Authorization
Logging In
▶︎
Browser Support
Unsupported Browsers
Viewing the UI in Variably Sized Browsers
▶︎
Building an Apache NiFi DataFlow
▶︎
Building a DataFlow
Adding Components to the Canvas
▶︎
Component Versions
Sorting and Filtering Components
Changing Component Versions
Understanding Version Dependencies
▶︎
Configuring a Processor
Settings Tab
Scheduling Tab
Properties Tab
Comments Tab
Additional Help
▶︎
Parameters
Parameter Contexts
Adding a Parameter to a Parameter Context
Assigning a Parameter Context to a Process Group
Referencing Parameters
Accessing Parameters
▶︎
Using Custom Properties with Expression Language
Variables
Referencing Custom Properties via nifi.properties
▶︎
Controller Services
Adding Controller Services for Reporting Tasks
Adding Controller Services for Dataflows
Enabling/Disabling Controller Services
Reporting Tasks
▶︎
Connecting Components
Details Tab
Settings
Changing Configuration and Context Menu Options
Bending Connections
Processor Validation
▶︎
Site-to-Site
Configure Site-to-Site client NiFi instance
Configure Site-to-Site Server NiFi Instance
Example Dataflow
▶︎
Managing an Apache NiFi DataFlow
▶︎
Command and Control of the DataFlow
Starting a Component
Stopping a Component
Enabling/Disabling a Component
▶︎
Remote Process Group Transmission
Individual Port Transmission
▶︎
Encrypted Content Repository
What is it?
▶︎
How does it work?
StaticKeyProvider
FileBasedKeyProvider
Key Rotation
Writing and Reading Content Claims
Potential Issues
▶︎
Encrypted FlowFile Repository
What is it?
▶︎
How does it work?
StaticKeyProvider
FileBasedKeyProvider
Key Rotation
Writing and Reading FlowFiles
Potential Issues
Experimental Warning
Other Management Features
▶︎
Navigating an Apache NiFi DataFlow
▶︎
Navigating within a DataFlow
Component Linking
▶︎
Component Alignment
Align Vertically
Align Horizontally
▶︎
Monitoring an Apache NiFi DataFlow
▶︎
Monitoring of DataFlow
Anatomy of a Processor
Anatomy of a Process Group
Anatomy of a Remote Process Group
Queue Interaction
Summary Page
Historical Statistics of a Component
▶︎
Versioning an Apache NiFi DataFlow
▶︎
Versioning a DataFlow
Connecting to a NiFi Registry
Version States
Import a Versioned Flow
Start Version Control
▶︎
Managing Local Changes
Show Local Changes
Revert Local Changes
Commit Local Changes
Change Version
Stop Version Control
Nested Versioned Flows
Parameters in Versioned Flows
Variables in Versioned Flows
▶︎
Restricted Components in Versioned Flows
Restricted Controller Service Created in Root Process Group
Restricted Controller Service Created in Process Group
▶︎
Using Apache NiFi Templates
▶︎
Using Apache NiFi Provenance Tools
▶︎
Data Provenance
Provenance Events
Searching for Events
Details of an Event
Replaying a FlowFile
▶︎
Viewing FlowFile Lineage
Find Parents
Expanding an Event
▶︎
Write Ahead Provenance Repository
Backwards Compatibility
Older Existing NiFi Version
Bootstrap.conf
System Properties
Encrypted Provenance Considerations
▶︎
Encrypted Provenance Repository
What is it?
How does it work?
Writing and Reading Event Records
Potential Issues
▶︎
Adding Functionality to Apache NiFi
Introduction
NiFi Components
▶︎
Processor API
▶︎
Supporting API
FlowFile
ProcessSession
ProcessContext
PropertyDescriptor
Validator
ValidationContext
PropertyValue
Relationship
StateManager
ProcessorInitializationContext
ComponentLog
▶︎
AbstractProcessor API
Processor Initialization
Exposing Processor's Relationships
Exposing Processor Properties
Validating Processor Properties
Responding to Changes in Configuration
Performing the Work
When Processors are Triggered
▶︎
Component Lifecycle
@OnAdded
@OnEnabled
@OnRemoved
@OnScheduled
@OnUnscheduled
@OnStopped
@OnShutdown
▶︎
Component Notification
@OnPrimaryNodeStateChange
Restricted
▶︎
State Manager
Scope
Storing and Retrieving State
Unit Tests
Reporting Processor Activity
▶︎
Documenting a Component
Documenting Properties
Documenting Relationships
Documenting Capability and Keywords
Documenting FlowFile Attribute Interaction
Documenting Related Components
Advanced Documentation
Provenance Events
▶︎
Common Processor Patterns
Data Ingress
Data Egress
Route Based on Content (One-to-One)
Route Based on Content (One-to-Many)
Route Streams Based on Content (One-to-Many)
Route Based on Attributes
Split Content (One-to-Many)
Update Attributes Based on Content
Enrich/Modify Content
▶︎
Error Handling
Exceptions within the Processor
Exceptions within a callback: IOException, RuntimeException
Penalization vs. Yielding
Session Rollback
▶︎
General Design Considerations
Consider the User
Cohesion and Reusability
Naming Conventions
Processor Behavior Annotations
Data Buffering
▶︎
Controller Services
Developing a ControllerService
Interacting with a ControllerService
▶︎
Reporting Tasks
Developing a Reporting Task
▶︎
UI Extensions
Custom Processor UIs
Content Viewers
Command Line Tools
▶︎
Testing
Instantiate TestRunner
Add ControllerServices
Set Property Values
Enqueue FlowFiles
Run the Processor
Validate Output
Mocking External Resources
Additional Testing Capabilities
NiFi Archives (NARs)
Per-Instance ClassLoading
Deprecating a Component
▶︎
How to contribute to Apache NiFi
Technologies
Where to Start?
Supplying a contribution
Contact Us
▶︎
Using Apache NiFi Registry
Introduction
▶︎
Browser Support
Unsupported Browsers
Viewing the UI in Variably Sized Browsers
Terminology
NiFi Registry User Interface
Logging In
▶︎
Manage Flows
▶︎
View a Flow
Sorting & Filtering Flows
Delete a Flow
▶︎
Manage Buckets
Sorting & Filtering Buckets
Create a Bucket
Delete a Bucket
Delete Multiple Buckets
Edit a Bucket Name
▶︎
Bucket Policies
Create a Bucket Policy
Delete a Bucket Policy
▶︎
Manage Users & Groups
Sorting & Filtering Users/Groups
Add a User
Delete a User
Delete Multiple Users
Edit a User Name
▶︎
Special Privileges
Grant Special Privileges to a User
▶︎
Manage Groups
Add an Empty Group
Add User to a Group
Create a New Group with Selected Users
▶︎
Remove a User from a Group
User Window
Group Window
Other Group Level Actions
▶︎
Manage Bundles
Upload Bundle
▶︎
Download Bundle
Bundle Coordinates
Bundle Id
Additional Actions
▶︎
How To: Streams Messaging
▶︎
Creating your First Streams Messaging Cluster in CDP Public Cloud
▶︎
Creating your first Streams Messaging cluster
Meet the prerequisites
Create your cluster
Give users access to your cluster
Next steps
▶︎
Connecting Kafka Clients to CDP Public Cloud Clusters
Connecting Kafka clients to Data Hub provisioned clusters
▶︎
Apache Kafka
▶︎
Configuring Apache Kafka
Operating system requirements
Performance considerations
Quotas
▶︎
JBOD
JBOD setup
JBOD Disk migration
Setting user limits for Kafka
Connecting Kafka clients to Data Hub provisioned clusters
Configuring Kafka ZooKeeper chroot
▶︎
Securing Apache Kafka
▶︎
TLS
Step 1: Generate keys and certificates for Kafka brokers
Step 2: Create your own certificate authority
Step 3: Sign the certificate
Step 4: Configure Kafka brokers
Step 5: Configure Kafka clients
Configure Zookeeper TLS/SSL support for Kafka
▶︎
Authentication
Kerberos authentication
▶︎
Delegation token based authentication
Enable or disable authentication with delegation tokens
Manage individual delegation tokens
Rotate the master key/secret
▶︎
Client authentication using delegation tokens
Configure clients on a producer or consumer level
Configure clients on an application level
▶︎
Kafka security hardening with Zookeeper ACLs
Restrict access to Kafka metadata in Zookeeper
Unlock Kafka metadata in Zookeeper
▶︎
LDAP authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
PAM Authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
Authorization
▶︎
Ranger
Enable authorization in Kafka with Ranger
Configure the resource-based Ranger service used for authorization
Using Kafka's inter-broker security
▶︎
Tuning Apache Kafka Performance
Handling large messages
▶︎
Cluster sizing
Sizing estimation based on network and disk message throughput
Choosing the number of partitions for a topic
▶︎
Broker Tuning
JVM and garbage collection
Network and I/O threads
ISR management
Log cleaner
▶︎
System Level Broker Tuning
File descriptor limits
Filesystems
Virtual memory handling
Networking parameters
Configure JMX ephemeral ports
Kafka-ZooKeeper performance tuning
▶︎
Managing Apache Kafka
▶︎
Management basics
Broker log management
Record management
Broker garbage log collection and log rotation
Client and broker compatibility across Kafka versions
▶︎
Managing topics across multiple Kafka clusters
Set up MirrorMaker in Cloudera Manager
Settings to avoid data loss
▶︎
Broker migration
Migrate brokers by modifying broker IDs in meta.properties
Use rsync to copy files from one broker to another
▶︎
Disk management
Monitoring
▶︎
Handling disk failures
Disk Replacement
Disk Removal
Reassigning replicas between log directories
Retrieving log directory replica assignment information
▶︎
Metrics
Building Cloudera Manager charts with Kafka metrics
Essential metrics to monitor
▶︎
Command Line Tools
Unsupported command line tools
kafka-topics
kafka-configs
kafka-console-producer
kafka-console-consumer
kafka-consumer-groups
▶︎
kafka-reassign-partitions
Tool usage
Reassignment examples
kafka-log-dirs
zookeeper-security-migration
kafka-delegation-tokens
kafka-*-perf-test
Configuring log levels for command line tools
Understanding the kafka-run-class Bash Script
▶︎
Developing Apache Kafka Applications
Kafka producers
▶︎
Kafka consumers
Subscribing to a topic
Groups and fetching
Protocol between consumer and broker
Rebalancing partitions
Retries
Kafka clients and ZooKeeper
▶︎
Java client
▶︎
Client examples
Simple Java consumer
Simple Java producer
Security examples
▶︎
.NET client
▶︎
Client examples
Simple .NET consumer
Simple .NET producer
Performant .NET producer
Security examples
Kafka Streams
Kafka public APIs
Recommendations for client development
▶︎
Streams Messaging Manager
▶︎
Monitoring Kafka Clusters
Monitoring Clusters
Monitoring Producers
Monitoring Topics
Monitoring Brokers
Monitoring Consumers
▶︎
Managing Alert Policies
Alert Policies Overview
Component Types and Metrics for Alert Policies
Notifiers
▶︎
Managing Alert Policies and Notifiers
Creating a Notifier
Updating a Notifier
Deleting a Notifier
Creating an Alert Policy
Updating an Alert Policy
Enabling an Alert Policy
Disabling an Alert Policy
Deleting an Alert Policy
▶︎
Managing Topics
Creating a Kafka Topic
Modify a Kafka Topic
Deleting a Kafka Topic
▶︎
Monitoring End to End Latency
End to End Latency Overview
Granularity of Metrics
Enabling Interceptors
Monitoring End-to-end Latency
End to End Latency Use Cases
▶︎
Schema Registry
▶︎
Integrating with Schema Registry
▶︎
Integrating with NiFi
Understanding NiFi Record Based Processing
Setting up the HortonworksSchemaRegistry Controller Service
Adding and Configuring Record Reader and Writer Controller Services
Using Record-Enabled Processors
▶︎
Integrating with Kafka
Integrating Kafka and Schema Registry Using NiFi Processors
Integrating Kafka and Schema Registry
Stateless Mode and High Availability
▶︎
Using Schema Registry
Adding a new schema
Querying a schema
Evolving a schema
Deleting a schema
▶︎
Securing Schema Registry
Schema Registry Authorization through Ranger Access Policies
Pre-defined Access Policies for Schema Registry
Add the user or group to a pre-defined access policy
Create a Custom Access Policy
▶︎
How To: Streaming Analytics
▶︎
Creating your First Streaming Analytics Cluster in CDP Public Cloud
▶︎
Creating your first Streaming Analytics cluster
Meet the prerequisites
Create your cluster
Give users access to your cluster
Next steps
▶︎
Analyzing data with Apache Kafka in CDP Public Cloud
Understand the use case
▶︎
Prepare your environment
Assign resource roles
Create IDBroker mapping
Set workload password
Create your streaming clusters
Set Ranger policies
Retrieve keytab file
Create Atlas entity type definitions
▶︎
Analyzing your data with Kafka
Job monitoring with Flink Dashboard
Metadata governance with Atlas
Data querying with SQL Client
▶︎
Analyzing data with Apache HBase in CDP Public Cloud
▶︎
Analyzing your data with HBase
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Analyzing data with Apache Kudu in CDP Public Cloud
▶︎
Analyzing your data with Kudu
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Reference
SMM REST API Reference
▶︎
Apache NiFi REST API Reference
Apache NiFi REST API Reference
▶︎
Apache NiFi Registry REST API Reference
Apache NiFi Registry REST API Reference
▶︎
Learning & Training
▶︎
Getting Started with Apache NiFi
Who is This Guide For?
Terminology Used in This Guide
Downloading and Installing NiFi
▶︎
Starting NiFi
For Windows Users
For Linux/Mac OS X users
Installing as a Service
▶︎
I Started NiFi. Now What?
Adding a Processor
Configuring a Processor
Connecting Processors
Starting and Stopping Processors
Getting More Info for a Processor
Other Components
▶︎
What Processors are Available
Data Transformation
Routing and Mediation
Database Access
Attribute Extraction
System Interaction
Data Ingestion
Data Egress / Sending Data
Splitting and Aggregation
HTTP
Amazon Web Services
▶︎
Working With Attributes
Common Attributes
Extracting Attributes
Adding User-Defined Attributes
Routing on Attributes
Expression Language / Using Attributes in Property Values
Working With Templates
▶︎
Monitoring NiFi
Status Bar
Component Statistics
Bulletins
▶︎
Data Provenance
Event Details
Lineage Graph
▶︎
Getting Started with Apache NiFi Registry
Terminology Used in This Guide
Downloading and Installing NiFi Registry
▶︎
Starting NiFi Registry
For Linux/Unix/Mac OS X users
Installing as a Service
▶︎
I Started NiFi Registry. Now What?
Create a Bucket
Connect NiFi to the Registry
Start Version Control on a Process Group
Save Changes to a Versioned Flow
Import a Versioned Flow
Notifiers
Obtain HBase connection details
Obtain Hive connection details
Older Existing NiFi Version
Operating system requirements
Other Components
Other Group Level Actions
Other Management Features
PAM Authentication
Parameter Contexts
Parameters
Parameters in Versioned Flows
Parameters in Versioned Flows
Partitions
Penalization vs. Yielding
Per-Instance ClassLoading
Performance considerations
Performance Expectations and Characteristics of NiFi
Performant .NET producer
Performing the Work
Planning your Flow Management deployment
Planning your Streaming Analytics deployment
Planning your Streams Messaging deployment
Potential Issues
Potential Issues
Potential Issues
Pre-defined Access Policies for Schema Registry
Predefined Ranger Access Policies for Apache NiFi
Predefined Ranger Access Policies for Apache NiFi Registry
Prepare your clusters
Prepare your environment
ProcessContext
Processor API
Processor Behavior Annotations
Processor Initialization
Processor Validation
ProcessorInitializationContext
ProcessSession
Properties Tab
PropertyDescriptor
PropertyValue
Protocol between consumer and broker
Provenance Events
Provenance Events
Querying a schema
Queue Interaction
Quotas
Ranger
Reassigning replicas between log directories
Reassignment examples
Rebalancing partitions
Recommendations for client development
Record management
Record order and assignment
Records
Referencing Custom Properties via nifi.properties
Referencing Parameters
Relationship
Release Notes
Remote Process Group Transmission
Remove a User from a Group
Replaying a FlowFile
Reporting Processor Activity
Reporting Tasks
Reporting Tasks
Responding to Changes in Configuration
Restrict access to Kafka metadata in Zookeeper
Restricted
Restricted Components in Versioned Flows
Restricted Components in Versioned Flows
Restricted Controller Service Created in Process Group
Restricted Controller Service Created in Process Group
Restricted Controller Service Created in Root Process Group
Restricted Controller Service Created in Root Process Group
Retries
Retrieve keytab file
Retrieving log directory replica assignment information
Revert Local Changes
Revert Local Changes
Rotate the master key/secret
Route Based on Attributes
Route Based on Content (One-to-Many)
Route Based on Content (One-to-One)
Route Streams Based on Content (One-to-Many)
Routing and Mediation
Routing on Attributes
Run the Processor
Save Changes to a Versioned Flow
Scheduling Tab
Schema Entities
Schema Registry
Schema Registry Authorization through Ranger Access Policies
Schema Registry Component Architecture
Schema Registry Concepts
Schema Registry Overview
Schema Registry Overview
Schema Registry Use Cases
Scope
Searching for Events
Securing Apache Kafka
Securing Schema Registry
Security examples
Security examples
Security for Flow Management Clusters and Users in CDP Public Cloud
Session Rollback
Set Property Values
Set Ranger policies
Set up MirrorMaker in Cloudera Manager
Set up your network configuration
Set workload password
Setting up the HortonworksSchemaRegistry Controller Service
Setting user limits for Kafka
Settings
Settings Tab
Settings to avoid data loss
Show Local Changes
Show Local Changes
Simple .NET consumer
Simple .NET producer
Simple Java consumer
Simple Java producer
Site-to-Site
Sizing estimation based on network and disk message throughput
Sophisticated windowing in Flink
Sorting & Filtering Buckets
Sorting & Filtering Flows
Sorting & Filtering Users/Groups
Sorting and Filtering Components
Special Privileges
Split Content (One-to-Many)
Splitting and Aggregation
Start the data flow
Start the data flow
Start the data flow
Start Version Control
Start Version Control
Start Version Control on a Process Group
Start your data flow
Start your data flow
Start your data flow
Starting a Component
Starting and Stopping Processors
Starting NiFi
Starting NiFi Registry
State Manager
Stateless Mode and High Availability
StateManager
StaticKeyProvider
StaticKeyProvider
Status Bar
Step 1: Generate keys and certificates for Kafka brokers
Step 2: Create your own certificate authority
Step 3: Sign the certificate
Step 4: Configure Kafka brokers
Step 5: Configure Kafka clients
Stop Version Control
Stop Version Control
Stopping a Component
Storing and Retrieving State
Streaming Analytics cluster layout
Streaming Analytics Data Hub cluster definitons
Streaming Analytics deployment scenarios
Streaming use cases with Flink
Streams Messaging
Streams Messaging cluster layout
Streams Messaging Manager
Streams Messaging Manager Overview
Streams Messaging Manager Overview
Subscribing to a topic
Summary Page
Supplying a contribution
Supporting API
System Interaction
System Level Broker Tuning
System Properties
Technologies
Terminology
Terminology
Terminology Used in This Guide
Terminology Used in This Guide
Testing
The core concepts of NiFi
TLS
Tool usage
Topics
Tuning Apache Kafka Performance
UI Extensions
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understanding NiFi Record Based Processing
Understanding the kafka-run-class Bash Script
Understanding Version Dependencies
Unit Tests
Unlock Kafka metadata in Zookeeper
Unsupported Apache NiFi extensions
Unsupported Browsers
Unsupported Browsers
Unsupported command line tools
Unsupported features
Unsupported Flow Management features
Unsupported Streams Messaging features
Update Attributes Based on Content
Updating a Notifier
Updating an Alert Policy
Upload Bundle
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Use cases
Use rsync to copy files from one broker to another
User Authorization
User Window
Using Apache NiFi Provenance Tools
Using Apache NiFi Registry
Using Apache NiFi Templates
Using Custom Properties with Expression Language
Using Kafka's inter-broker security
Using Record-Enabled Processors
Using Schema Registry
Using the Apache NiFi Interface
Using watermark in Flink
Validate Output
Validating Processor Properties
ValidationContext
Validator
Variables
Variables in Versioned Flows
Variables in Versioned Flows
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify that you can write data to Kudu
Verify your data flow
Verify your data flow
Version States
Version States
Versioning a DataFlow
Versioning a DataFlow
Versioning an Apache NiFi DataFlow
View a Flow
Viewing FlowFile Lineage
Viewing the UI in Variably Sized Browsers
Viewing the UI in Variably Sized Browsers
Virtual memory handling
What is Apache Flink?
What is Apache NiFi?
What is it?
What is it?
What is it?
What Processors are Available
What's New in Cloudera Data Flow for Data Hub
When Processors are Triggered
Where to Start?
Who is This Guide For?
Working With Attributes
Working With Templates
Write Ahead Provenance Repository
Writing and Reading Content Claims
Writing and Reading Event Records
Writing and Reading FlowFiles
zookeeper-security-migration
«
Filter topics
Ingesting Data into Apache Hive in CDP Public Cloud
Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.
Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.
Configure the service account
Configure the service account that you will use to ingest data into Hive.
Create IDBroker mapping
To enable your CDP user to use the central authentication features CDP provides and to exchange credentials for AWS or Azure access tokens, you must map this CDP user to the correct AWS IAM role or Azure Managed Service Identity (MSI). You can add or modify these mappings from the Management Console in your CDP environment.
Create the Hive target table
Before you can ingest data into Apache Hive in CDP Public Cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table; adapt them to the needs of your own ingest target table.
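As a minimal sketch, a streaming ingest target in Hive 3 is typically a managed, transactional table stored as ORC. The table and column names below are placeholders; adapt them to your data:

```sql
-- Hypothetical target table for streaming ingest; names are placeholders.
-- Hive 3 streaming targets are generally transactional ORC tables.
CREATE TABLE customer (
  id INT,
  name STRING,
  created TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```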
Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.
Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors parse these files for the configuration values they need to communicate with Hive.
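To see which values the processors will pick up, you can inspect the downloaded client configuration files yourself. The sketch below reads a single property from a Hadoop-style `*-site.xml` file; the sample XML, hostname, and property key are illustrative stand-ins for what your own `hive-site.xml` contains:

```python
# Sketch: extract one configuration value from a Hadoop-style *-site.xml,
# such as the hive-site.xml you download with the Hive connection details.
import xml.etree.ElementTree as ET

def read_hadoop_property(xml_text: str, wanted: str):
    """Return the value of a named property from a *-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == wanted:
            return prop.findtext("value")
    return None

# Minimal stand-in for a downloaded hive-site.xml (hostname is a placeholder).
sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master0.example.cloudera.site:9083</value>
  </property>
</configuration>"""

print(read_hadoop_property(sample, "hive.metastore.uris"))
```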
Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to your NiFi canvas, and connecting the processors.
Configure the controller services
You can add controller services to provide shared services for the processors in your data flow. You will reference these controller services later when you configure your processors.
Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow and about other data consumption processor options.
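As a non-authoritative sketch, the core ConsumeKafkaRecord_2_0 settings look like the following. The broker hostnames, topic name, and controller service names are placeholders; adjust the security properties to match your cluster:

```text
# Hypothetical ConsumeKafkaRecord_2_0 configuration (values are placeholders)
Kafka Brokers       : broker1.example.cloudera.site:9093,broker2.example.cloudera.site:9093
Topic Name(s)       : customer-events
Record Reader       : AvroReader            # controller service configured earlier
Record Writer       : AvroRecordSetWriter   # controller service configured earlier
Group ID            : nifi-hive-ingest
Security Protocol   : SASL_SSL              # match your cluster's security setup
```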
Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow and about other data ingest processor options.
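As a similar sketch, the core PutHive3Streaming settings look like the following. The metastore URI, file paths, and database and table names are placeholders drawn from the connection details and target table you set up earlier:

```text
# Hypothetical PutHive3Streaming configuration (values are placeholders)
Record Reader                 : AvroReader   # controller service configured earlier
Hive Metastore URI            : thrift://master0.example.cloudera.site:9083
Hive Configuration Resources  : /path/to/hive-site.xml,/path/to/core-site.xml
Database Name                 : default
Table Name                    : customer
```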
Start your data flow
Start your data flow to verify that you have created a working data flow and to begin your data ingest process.
Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow.
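One simple check is to query the target table once the flow has been running. The table name below is a placeholder for the target table you created earlier:

```sql
-- Confirm that rows are arriving in the target table ("customer" is a placeholder).
SELECT COUNT(*) FROM customer;
SELECT * FROM customer LIMIT 10;
```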
Next steps
Learn what to do once you have moved data into Hive in CDP Public Cloud.