Cloudera DataFlow for Data Hub 7.2.8 (Public Cloud)

.NET client
1. Overview
1.1. Version information
1.2. Contact information
1.3. License information
1.4. URI scheme
1.5. Tags
2. Security
2.1. Authorization
2.2. BasicAuth
3. Resources
3.1. Access
3.1.1. Get access status
3.1.2. Performs a logout for other providers that have been issued a JWT.
3.1.3. Create token trying all providers
3.1.4. Create token using identity provider
3.1.5. Test identity provider
3.1.6. Get identity provider usage
3.1.7. Create token using kerberos
3.1.8. Create token using basic auth
3.10. Items
3.10.1. Get all items
3.10.2. Get item fields
3.10.3. Get bucket items
3.11. Policies
3.11.1. Create access policy
3.11.2. Get all access policies
3.11.3. Get available resources
3.11.4. Get access policy for resource
3.11.5. Get access policy
3.11.6. Update access policy
3.11.7. Delete access policy
3.12. Tenants
3.12.1. Create user group
3.12.10. Delete user
3.12.2. Get user groups
3.12.3. Get user group
3.12.4. Update user group
3.12.5. Delete user group
3.12.6. Create user
3.12.7. Get all users
3.12.8. Get user
3.12.9. Update user
3.2. Bucket Bundles
3.2.1. Get extension bundles by bucket
3.2.2. Create extension bundle version
3.3. Bucket Flows
3.3.1. Create flow
3.3.10. Get latest bucket flow version metadata
3.3.11. Get bucket flow version
3.3.2. Get bucket flows
3.3.3. Get bucket flow
3.3.4. Update bucket flow
3.3.5. Delete bucket flow
3.3.6. Get bucket flow diff
3.3.7. Create flow version
3.3.8. Get bucket flow versions
3.3.9. Get latest bucket flow version content
3.4. Buckets
3.4.1. Create bucket
3.4.2. Get all buckets
3.4.3. Get bucket fields
3.4.4. Get bucket
3.4.5. Update bucket
3.4.6. Delete bucket
3.5. Bundles
3.5.1. Get all bundles
3.5.10. Get bundle version extension
3.5.11. Get bundle version extension docs
3.5.12. Get bundle version extension docs details
3.5.2. Get all bundle versions
3.5.3. Get bundle
3.5.4. Delete bundle
3.5.5. Get bundle versions
3.5.6. Get bundle version
3.5.7. Delete bundle version
3.5.8. Get bundle version content
3.5.9. Get bundle version extensions
3.6. Config
3.6.1. Get configuration
3.7. Extension Repository
3.7.1. Get extension repo buckets
3.7.10. Get extension repo extension details
3.7.11. Get extension repo version checksum
3.7.12. Get global extension repo version checksum
3.7.2. Get extension repo groups
3.7.3. Get extension repo artifacts
3.7.4. Get extension repo versions
3.7.5. Get extension repo version
3.7.6. Get extension repo version content
3.7.7. Get extension repo extensions
3.7.8. Get extension repo extension
3.7.9. Get extension repo extension docs
3.8. Extensions
3.8.1. Get all extensions
3.8.2. Get extensions providing service API
3.8.3. Get extension tags
3.9. Flows
3.9.1. Get flow fields
3.9.2. Get flow
3.9.3. Get flow versions
3.9.4. Get latest flow version
3.9.5. Get latest flow version metadata
3.9.6. Get flow version
4. Definitions
4.1. AccessPolicy
4.10. BundleInfo
4.11. BundleVersion
4.12. BundleVersionDependency
4.13. BundleVersionMetadata
4.14. ComponentDifference
4.15. ComponentDifferenceGroup
4.16. ConnectableComponent
4.17. ControllerServiceAPI
4.18. ControllerServiceDefinition
4.19. CurrentUser
4.2. AccessPolicySummary
4.20. DeprecationNotice
4.21. DynamicProperty
4.22. DynamicRelationship
4.23. Extension
4.24. ExtensionBundle
4.25. ExtensionFilterParams
4.26. ExtensionMetadata
4.27. ExtensionMetadataContainer
4.28. ExtensionRepoArtifact
4.29. ExtensionRepoBucket
4.3. AllowableValue
4.30. ExtensionRepoGroup
4.31. ExtensionRepoVersion
4.32. ExtensionRepoVersionSummary
4.33. ExternalControllerServiceReference
4.34. Fields
4.35. JaxbLink
4.36. Permissions
4.37. Position
4.38. Property
4.39. ProvidedServiceAPI
4.4. Attribute
4.40. RegistryConfiguration
4.41. Relationship
4.42. Resource
4.43. ResourcePermissions
4.44. Restricted
4.45. Restriction
4.46. Stateful
4.47. SystemResourceConsideration
4.48. TagCount
4.49. Tenant
4.5. BatchSize
4.50. User
4.51. UserGroup
4.52. VersionedConnection
4.53. VersionedControllerService
4.54. VersionedFlow
4.55. VersionedFlowCoordinates
4.56. VersionedFlowDifference
4.57. VersionedFlowSnapshot
4.58. VersionedFlowSnapshotMetadata
4.59. VersionedFunnel
4.6. Bucket
4.60. VersionedLabel
4.61. VersionedParameter
4.62. VersionedParameterContext
4.63. VersionedPort
4.64. VersionedProcessGroup
4.65. VersionedProcessor
4.66. VersionedPropertyDescriptor
4.67. VersionedRemoteGroupPort
4.68. VersionedRemoteProcessGroup
4.7. BucketItem
4.8. BuildInfo
4.9. Bundle
@OnAdded
@OnEnabled
@OnPrimaryNodeStateChange
@OnRemoved
@OnScheduled
@OnShutdown
@OnStopped
@OnUnscheduled
AbstractProcessor API
Access Policies
Access Policy Configuration Examples
Accessing Parameters
Accessing the UI with Multi-Tenant Authorization
AccessPolicyProvider
Active / Active Architecture
Active / Stand-by Architecture
Add a User
Add an Empty Group
Add ControllerServices
Add Ranger policies
Add the user or group to a pre-defined access policy
Add the user to predefined Ranger access policies
Add User to a Group
Adding a new schema
Adding a Parameter to a Parameter Context
Adding a Processor
Adding and Configuring Record Reader and Writer Controller Services
Adding Commands
Adding Components to the Canvas
Adding Controller Services for Dataflows
Adding Controller Services for Reporting Tasks
Adding Functionality to Apache NiFi
Adding Snowflake CA certificates to NiFi truststore
Adding User-Defined Attributes
Additional Actions
Additional Certificate Commands
Additional Help
Additional Resources
Additional Testing Capabilities
Advanced Documentation
Aggregation for Analytics
Alias Properties
Align Horizontally
Align Vertically
Allow Bundles in a Bucket to be Overwritten
Allow Insecure Cryptographic Modes
Amazon Web Services
Analytics Framework
Analytics Properties
Analyzing data with Apache HBase in CDP Public Cloud
Analyzing data with Apache Kafka in CDP Public Cloud
Analyzing data with Apache Kudu in CDP Public Cloud
Analyzing your data with HBase
Analyzing your data with Kafka
Analyzing your data with Kudu
Anatomy of a Process Group
Anatomy of a Processor
Anatomy of a Remote Process Group
Apache Flink Overview
Apache Kafka
Apache Kafka Overview
Apache Knox
Apache NiFi
Apache NiFi Admin Guide
Apache NiFi Expression Language Guide
Apache NiFi Overview
Apache NiFi Record Path Reference
Apache NiFi Registry Admin Guide
Apache NiFi Registry REST API Reference
Apache NiFi REST API Reference
Apache Patch Information in Cloudera DataFlow for Data Hub 7.2.8
Appendix - Schema example
Arrays
Assign resource roles
Assign the EnvironmentUser role
Assigning a Parameter Context to a Process Group
Assigning administrator level permissions
Assigning selective permissions to user
Attribute Extraction
Authentication
Authorization
Authorization example
Authorization workflow
Authorizer Configuration
Authorizers.xml Setup
Authorizing Flow Management Cluster Access in CDP Public Cloud
Back-Referencing
Backup
Backup & Recovery
Backwards Compatibility
base64Decode
base64Encode
Basic Cluster Setup
Basics
Bcrypt, Scrypt, PBKDF2
Before you begin
Bending Connections
Bi-directional Replication Flows
Bidirectional replication example of two active clusters
Boolean Logic
Bootstrap Properties
Bootstrap.conf
Broker garbage log collection and log rotation
Broker log management
Broker migration
Broker Tuning
Brokers
Browser Support
Bucket Policies
Build the data flow
Building a DataFlow
Building Cloudera Manager charts with Kafka metrics
Building your dataflow
Bulletins
Bundle Coordinates
Bundle Id
Bundle Persistence
Bundle Persistence Providers
CDF for Data Hub
Change Version
Changing Component Versions
Changing Configuration and Context Menu Options
Channel encryption
Checking producer activity
Checking schema registration
Child Operator
Choosing the number of partitions for a topic
Clear Activity and Shutdown Existing NiFi
Client and broker compatibility across Kafka versions
Client authentication using delegation tokens
Client examples
Client/Server
Cluster Common Properties
Cluster Migration Architectures
Cluster Node Identities
Cluster Node Properties
Cluster sizing
Clustering Configuration
Cohesion and Reusability
Command and Control of the DataFlow
Command Line Tools
Comments Tab
Commit Local Changes
Common Attributes
Common Processor Patterns
Communication within the Cluster
Compatibility Policies
Component Alignment
Component Lifecycle
Component Linking
Component Notification
Component Statistics
Component Status Repository
Component Support in Cloudera DataFlow for Data Hub 7.2.8
Component types and metrics for alert policies
Component Versions
ComponentLog
Composite Implementations
concat
Configuration Best Practices
Configuration examples
Configuration Files
Configuration Properties Reference for Properties not Available in Cloudera Manager
Configure clients on a producer or consumer level
Configure clients on an application level
Configure JMX ephemeral ports
Configure Kafka brokers
Configure Kafka clients
Configure Kafka MirrorMaker
Configure Ranger policies for site-to-site communication
Configure Site-to-Site client NiFi instance
Configure Site-to-Site Server NiFi Instance
Configure SRM for Failover and Failback
Configure srm-control for secured environments using Cloudera Manager
Configure srm-control for secured environments using environment variables
Configure srm-control for unsecured environments using Cloudera Manager
Configure srm-control for unsecured environments using environment variables
Configure the Controller Service
Configure the controller services
Configure the HBase client service
Configure the processor for merging records
Configure the processor for your data source
Configure the processor for your data target
Configure the resource-based Ranger service used for authorization
Configure the service account
Configure your source processor
Configure your truststores
Configure ZooKeeper TLS/SSL support for Kafka
Configuring a Processor
Configuring Apache Kafka
Configuring automatic group offset synchronization
Configuring clusters and replications
Configuring Kafka ZooKeeper chroot
Configuring LDAP authentication
Configuring log levels for command line tools
Configuring multiple listeners
Configuring properties not exposed in Cloudera Manager
Configuring replication specific REST servers
Configuring SMM for monitoring Kafka cluster replications
Configuring srm-control
Configuring State Providers
Configuring Streams Replication Manager
Configuring the Atlas hook in Kafka
Configuring the driver role target clusters
Configuring the service role target cluster
Configuring Users & Access Policies
Configuring your Controller Services
Configuring your source processor
Configuring your target processor
Confirming your dataflow success
Connect
Connect NiFi to the Registry
Connect workers
Connecting Components
Connecting Kafka Clients to CDP Public Cloud Clusters
Connecting Kafka clients to Data Hub provisioned clusters
Connecting Processors
Connecting to a NiFi Registry
Connecting to Kafka host
Connectors
Consider the User
Consuming data from Kafka topic
Consuming data from Kafka topics using stored schemas
Contact Us
contains
Content Repository
Content Viewers
Controller Services
Core Properties
Create a Bucket
Create a Bucket Policy
Create a custom access policy
Create a Custom Access Policy
Create a New Group with Selected Users
Create Atlas entity type definitions
Create consumer group policy
Create controller services for your data flow
Create IDBroker mapping
Create Ranger policies for Machine User account
Create the HBase target table
Create the Hive target table
Create the Kudu target table
Create topic policy
Create your cluster
Create your streaming clusters
Creating a Kafka topic
Creating a notifier
Creating a Template
Creating an alert policy
Creating checkpoints and savepoints in Flink
Creating Controller Services for your dataflow
Creating Kafka topic
Creating Machine User
Creating TLS truststore
Creating Users and Groups
Creating your first Flow Management cluster
Creating your First Flow Management Cluster in CDP Public Cloud
Creating your first Streaming Analytics cluster
Creating your First Streaming Analytics Cluster in CDP Public Cloud
Creating your first Streams Messaging cluster
Creating your First Streams Messaging Cluster in CDP Public Cloud
Cross Data Center Replication
Cross data center replication example of multiple clusters
Custom Processor UIs
Custom Properties
Data Buffering
Data Egress
Data Egress / Sending Data
Data Hub cluster definitions
Data Ingestion
Data Ingress
Data model version of serialized Flow snapshots
Data Provenance
Data querying with SQL Client
Data Transformation
Data Types
Database Access
Database Properties
DatabaseFlowPersistenceProvider
Date Manipulation
Decommission Nodes
Define your CDP Private Cloud Base dataflow
Define your CDP Public Cloud dataflow
Defining Schema Registry access policies
Delegation token based authentication
Delete a Bucket
Delete a Bucket Policy
Delete a Flow
Delete a User
Delete Multiple Buckets
Delete Multiple Users
Delete Nodes
Deleting a Kafka topic
Deleting a notifier
Deleting a schema
Deleting an alert policy
Deployment scenarios
Deprecating a Component
Descendant Operator
Details of an Event
Details Tab
Developing a ControllerService
Developing a Reporting Task
Developing Apache Kafka Applications
Disabling an alert policy
Disconnect
Disconnect Nodes
Disk management
Disk Removal
Disk Replacement
Documenting a Component
Documenting Capability and Keywords
Documenting FlowFile Attribute Interaction
Documenting Properties
Documenting Related Components
Documenting Relationships
Download Bundle
Downloading and Installing NiFi
Downloading and Installing NiFi Registry
Downloading the Snowflake JDBC driver jar file
Driver inter-node coordination
Edit a Bucket Name
Edit a User Name
Email Notification Service
Embedded ZooKeeper
Embedded ZooKeeper Server
Enable authorization in Kafka with Ranger
Enable high availability
Enable or disable authentication with delegation tokens
Enabling an alert policy
Enabling end-to-end latency monitoring
Enabling interceptors
Enabling/Disabling a Component
Enabling/Disabling Controller Services
Encode/Decode Functions
Encrypt-Config Tool
Encrypted Content Repository
Encrypted File System Content Repository Properties
Encrypted FlowFile Repository
Encrypted Passwords in Configuration Files
Encrypted Provenance Considerations
Encrypted Provenance Repository
Encrypted Write Ahead FlowFile Repository Properties
Encrypted Write Ahead Provenance Repository Properties
Encryption Configuration
End to end latency overview
End to end latency use case
endsWith
Enqueue FlowFiles
Enrich/Modify Content
Error Handling
Escaping Expression Language
Essential metrics to monitor
Evaluating Multiple Attributes
Event Details
Event Hooks
Event-driven applications with Flink
Evolve your schema
Evolving a schema
Example - Secure NiFi Registry with Proxied-Entity
Example - Secure NiFi Registry without Proxied-Entity
Example Dataflow
Examples of Interacting with Schema Registry
Exceptions within a callback: IOException, RuntimeException
Exceptions within the Processor
Expanding an Event
Expected Behavior
Experimental Warning
Exporting a Template
Exposing Processor Properties
Exposing Processor's Relationships
Expression Language / Using Attributes in Property Values
Expression Language Editor
Expression Language in the Application
Expression Language Overview
Extension Directories
Extracting Attributes
Fan-in and Fan-out Replication Flows
fieldName
File descriptor limits
File Manager
File System Content Repository Properties
FileAccessPolicyProvider
FileAuthorizer
FileBasedKeyProvider
FileSystemBundlePersistenceProvider
FileSystemFlowPersistenceProvider
Filesystems
FileUserGroupProvider
Filter Functions
Filters
Find Parents
Finding list of brokers
Finding Schema Registry endpoint
Fixed Issues in Cloudera DataFlow for Data Hub 7.2.8
Fixed Issues in Flow Management
Fixed Issues in Streaming Analytics
Fixed Issues in Streams Messaging
Flink Streaming Applications
Flow Analyzer
Flow Election
Flow Management cluster definitions
Flow Management cluster layout
Flow Persistence
Flow Persistence Providers
FlowFile
FlowFile Repository
For Linux/Mac OS X users
For Linux/Unix/Mac OS X users
For Windows Users
format
Function Usage
Functions
Gather configuration information
General Design Considerations
Getting More Info for a Processor
Getting Started with Apache NiFi
Getting Started with Apache NiFi Registry
Getting Started with Streams Messaging Clusters on CDP Public Cloud
GitFlowPersistenceProvider
Give users access to your cluster
Governance
Grant Special Privileges to a User
Granting Machine User access to environment
Granularity of metrics for end-to-end latency
Group Window
Groups and fetching
H2
H2 Settings
Handling disk failures
Handling large messages
Handling state in Flink
High Level Overview of Key NiFi Features
Highly Available Kafka Architectures
Historical Statistics of a Component
How does it work?
How to contribute to Apache NiFi
How to install and start NiFi
How to install and start NiFi Registry
How to Set up Failover and Failback
HTTP
HTTP Notification Service
I Started NiFi Registry. Now What?
I Started NiFi. Now What?
Identity Mapping Properties
Import a Versioned Flow
Importing a Template
Improve Performance in Schema Registry
Individual Port Transmission
Ingesting data into Amazon S3
Ingesting Data into Amazon S3 Buckets
Ingesting Data into Apache HBase in CDP Cloud
Ingesting Data into Apache HBase in CDP Public Cloud
Ingesting Data into Apache Hive in CDP Public Cloud
Ingesting data into Apache Kafka
Ingesting Data into Apache Kafka in CDP Public Cloud
Ingesting Data into Apache Kudu in CDP Public Cloud
Ingesting Data into Azure Data Lake Storage
Ingesting data into Azure Data Lake Storage
Ingesting Data into CDP Public Cloud
Ingesting Data into Google Cloud Storage
Ingesting data into Google Cloud Storage
Initial Admin Identity (New NiFi Instance)
Initial Admin Identity (New NiFi Registry Instance)
Install
Install the new NiFi Version
Installing as a Service
Instantiate TestRunner
Instantiating a Template
Integrate Kafka and Schema Registry
Integrate Kafka and Schema Registry using NiFi Processors
Integrating with Kafka
Integrating with NiFi
Integrating with Schema Registry
Inter-broker security
Interacting with a ControllerService
Interactive Usage
Introducing streams messaging cluster on CDP Public Cloud
Introduction
Introduction to alert policies in Streams Messaging Manager
Introduction to monitoring Kafka cluster replications in SMM
Introduction to Streams Messaging Manager
isBlank
isEmpty
ISR management
Java client
Java Cryptography Extension (JCE) Limited Strength Jurisdiction Policies
JBOD
JBOD Disk migration
JBOD setup
Job monitoring with Flink Dashboard
JVM and garbage collection
Kafka Architecture
Kafka brokers and ZooKeeper
Kafka clients and ZooKeeper
Kafka consumers
Kafka FAQ
Kafka Introduction
Kafka producers
Kafka public APIs
Kafka security hardening with ZooKeeper ACLs
Kafka Streams
kafka-*-perf-test
kafka-configs
kafka-console-consumer
kafka-console-producer
kafka-consumer-groups
kafka-delegation-tokens
kafka-log-dirs
kafka-reassign-partitions
kafka-topics
Kafka-ZooKeeper performance tuning
Kerberizing Embedded ZooKeeper Server
Kerberizing NiFi's ZooKeeper Client
Kerberos
Kerberos authentication
Kerberos authentication using a keytab
Kerberos authentication using the ticket cache
Kerberos Properties
Kerberos Service
Key Derivation Functions
Key Features
Key Rotation
Known Issues In Cloudera DataFlow for Data Hub 7.2.8
Known Issues in Flow Management
Known Issues in Streaming Analytics
Known Issues in Streams Messaging
LDAP authentication
LdapUserGroupProvider
Leader positions and in-sync replicas
Legacy Authorized Users (NiFi Instance Upgrade)
Lightweight Directory Access Protocol (LDAP)
Lineage Graph
Log cleaner
Logging In
LoggingEventHookProvider
Logs and log segments
Main Use Cases
Make a Bucket Publicly Visible
Manage Buckets
Manage Bundles
Manage Flows
Manage Groups
Manage individual delegation tokens
Manage Users & Groups
Management basics
Managing Alert Policies
Managing alert policies and notifiers in SMM
Managing Apache Kafka
Managing Kafka Topics
Managing Local Changes
Managing Nodes
Managing Templates
Managing topics across multiple Kafka clusters
Maps
matchesRegex
Mathematical Operations and Numeric Manipulation
Meet the prerequisites
Meet the prerequisites to create streams messaging cluster
Metadata Database
Metadata governance with Atlas
Metrics
Migrate brokers by modifying broker IDs in meta.properties
Migrating a Flow with Sensitive Properties
Migrating Between Source and Destination ZooKeepers
Migrating Consumer Groups Between Clusters
Mocking External Resources
Modifying a Kafka topic
Monitor end-to-end latency
Monitoring
Monitoring checkpoint latency for cluster replication
Monitoring End to End Latency
Monitoring end to end latency for Kafka topic
Monitoring Kafka activity in Streams Messaging Manager
Monitoring Kafka brokers
Monitoring Kafka cluster replications by quick ranges
Monitoring Kafka Cluster Replications using Streams Messaging Manager
Monitoring Kafka Clusters
Monitoring Kafka clusters
Monitoring Kafka consumers
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring NiFi
Monitoring of DataFlow
Monitoring replication latency for cluster replication
Monitoring replication throughput and latency by values
Monitoring Replication with Streams Messaging Manager
Monitoring status of the clusters to be replicated
Monitoring throughput for cluster replication
Monitoring topics to be replicated
Monitoring your data flow
Moving data from CDP Private Cloud Base to Public Cloud with NiFi site-to-site
Moving Data in and Out of Snowflake
Moving data out of Snowflake
Moving Data using NiFi Site-to-Site
Multi-Tenant Authorization
MySQL
Naming Conventions
Navigating within a DataFlow
Nested Versioned Flows
Network and I/O threads
Networking parameters
New topic and consumer group discovery
Next steps
NiFi
NiFi Architecture
NiFi Archives (NARs)
NiFi CLI
NiFi CLI Node Commands
NiFi Components
NiFi Legacy
NiFi patches
NiFi Registry patches
NiFi Registry User Interface
NiFi Toolkit Administrative Tools
NiFi Toolkit Overview
NiFi User Interface
nifi-cert.pem
nifi-key.key
Node Manager
not
Notes
Notification Services
Notifiers
Notify
Obtain HBase connection details
Obtain Hive connection details
Offload Nodes
Offsets Subcommand
Older Existing NiFi Version
On-premise to Cloud and Kafka Version Upgrade
OpenId Connect
OpenSSL PKCS#5 v1.5 EVP_BytesToKey
Operating system requirements
Operation Modes
Other Components
Other Group Level Actions
Other Management Features
Output
Overview
PadLeft
PadRight
PAM authentication
Parameter Contexts
Parameters
Parameters in Versioned Flows
Partitions
Penalization vs. Yielding
Per-Instance ClassLoading
Performance considerations
Performance Expectations and Characteristics of NiFi
Performant .NET producer
Performing the Work
Persistence Providers
Persistent Provenance Repository Properties
Planning for Streams Replication Manager
Planning your Flow Management deployment
Planning your Streaming Analytics deployment
Planning your Streams Messaging deployment
Port Configuration
Postgres
Potential Issues
Potential issues with wildcard certificates
Pre-defined Access Policies for Schema Registry
Predefined Ranger Access Policies for Apache NiFi
Predefined Ranger Access Policies for Apache NiFi Registry
Predicates
Prepare your clusters
Prepare your environment
Prerequisites for Running in a Secure Environment
Preserve Custom Processors
Preserve Modified NARs
Principal name mapping
ProcessContext
Processor API
Processor Behavior Annotations
Processor Initialization
Processor Validation
ProcessorInitializationContext
ProcessSession
Produce data to Kafka topic
Producing data in Avro format
Producing data to Kafka topic
Properties Tab
Property/Argument Handling
PropertyDescriptor
PropertyValue
Protocol between consumer and broker
Provenance Events
Provenance Repository
Providers Properties
Proxy Configuration
Pushing data into Snowflake
Pushing data to and moving data from Snowflake using Apache NiFi
Querying a schema
Queue Interaction
Quotas
Rack awareness
Ranger
Reassigning replicas between log directories
Reassignment examples
Rebalancing partitions
Recommendations for client development
Recommended Antivirus Exclusions
Recommended deployment architecture
Reconfiguring the Kafka consumer
Reconfiguring the Kafka producer
Record management
Record order and assignment
RecordPath Overview
Records
Referencing Parameters
Relationship
Release Notes
Remote Process Group Transmission
Remote Topics
Remove
Remove a User from a Group
Removing a Template
replace
replaceRegex
Replaying a FlowFile
Replicate data between Data Hub clusters with cloud SRM
Replicating Data
Replicating data from PvC Base to Data Hub with cloud SRM
Replicating data from PvC Base to Data Hub with on-prem SRM
Replication Flows Overview
Reporting Processor Activity
Reporting Tasks
Responding to Changes in Configuration
Restore
Restrict access to Kafka metadata in ZooKeeper
Restricted
Restricted Components in Versioned Flows
Restricted Controller Service Created in Process Group
Restricted Controller Service Created in Root Process Group
Retries
Retrieve keytab file
Retrieving log directory replica assignment information
Reverse Proxy Configurations
Revert Local Changes
RocksDB FlowFile Repository
Rotate the master key/secret
Route Based on Attributes
Route Based on Content (One-to-Many)
Route Based on Content (One-to-One)
Route Streams Based on Content (One-to-Many)
Routing and Mediation
Routing on Attributes
Run the Processor
S2S
S3BundlePersistenceProvider
Salt and IV Encoding
Save Changes to a Versioned Flow
Scheduling Tab
Schema Differences & Limitations
Schema Entities
Schema Registry
Schema Registry Authorization through Ranger Access Policies
Schema Registry Component Architecture
Schema Registry Concepts
Schema Registry Overview
Schema Registry Use Cases
Scope
ScriptEventHookProvider
Searching
Searching by topic name
Searching for Events
Searching Kafka cluster replications by source
Securing Apache Kafka
Securing Schema Registry
Securing Streams Messaging Manager
Securing Streams Replication Manager
Securing ZooKeeper
Security Configuration
Security examples
Security for Flow Management Clusters and Users in CDP Public Cloud
Security overview
Security Properties
Sensitive Property Key Migration
Session Rollback
Set Property Values
Set Ranger policies
Set up AWS for your ingest data flow
Set up MirrorMaker in Cloudera Manager
Set up the HortonworksSchemaRegistry Controller Service
Set up your network configuration
Set workload password
Setting up authorization policies
Setting user limits for Kafka
Setting workload password
Settings
Settings Tab
Settings to avoid data loss
Shared Event Hook Properties
ShellUserGroupProvider
Show Local Changes
Signing with Externally-signed CA Certificates
Simple .NET consumer
Simple .NET producer
Simple Java consumer
Simple Java producer
Site to Site and Reverse Proxy Examples
Site to Site Properties
Site to Site protocol sequence
Site to Site Routing Properties for Reverse Proxies
Site-to-Site
Sizing estimation based on network and disk message throughput
Sophisticated windowing in Flink
Sorting & Filtering Buckets
Sorting & Filtering Flows
Sorting & Filtering Users/Groups
Sorting and Filtering Components
Special Privilege Policies
Special Privileges
Split Content (One-to-Many)
Splitting and Aggregation
SRM Command Line Tools
SRM security example for a cluster environment managed by a single Cloudera Manager instance
SRM security example for a cluster environment managed by multiple Cloudera Manager instances
srm-control
srm-control Options Reference
Standalone
Standalone Functions
StandardManagedAuthorizer
Start New NiFi
Start the data flow
Start Version Control
Start Version Control on a Process Group
Start your data flow
Starting a Component
Starting and Stopping Processors
Starting NiFi
Starting NiFi Registry
startsWith
State Management
State Management
State Manager
StateManager
StaticKeyProvider
StaticKeyProvider
Status
Status Bar
Stop Version Control
Stopping a Component
Storing and Retrieving State
Streaming Analytics cluster layout
Streaming Analytics Data Hub cluster definitons
Streaming Analytics deployment scenarios
Streaming use cases with Flink
Streams Messaging
Streams Messaging cluster layout
Streams Messaging Manager
Streams Messaging Manager Overview
Streams Replication Manager
Streams Replication Manager Architecture
Streams Replication Manager Driver
Streams Replication Manager Overview
Streams Replication Manager Reference
Streams Replication Manager requirements
Streams Replication Manager Service
String Manipulation
Structure of a NiFi Expression
Structure of a RecordPath
Subjectless Functions
Subscribing to a topic
substring
substringAfter
substringAfterLast
substringBefore
substringBeforeLast
Summary Page
Supplying a contribution
Supported NiFi Controller Services
Supported NiFi Extensions
Supported NiFi Processors
Supported NiFi Reporting Tasks
Supporting API
Swap Management
Switching from other Flow Persistence Provider
System Interaction
System Level Broker Tuning
System Properties
System Properties
System Properties
System Requirements
System Requirements
Task architecture and load-balancing
Technologies
Templates
Terminology
Terminology
Terminology
Terminology Used in This Guide
Terminology Used in This Guide
Testing
The core concepts of NiFi
TLS Generation Toolkit
TLS Toolkit
TLS/SSL client authentication
toBytes
toDate
toLowerCase
Tool usage
Topics
Topics and Groups Subcommand
toString
toUpperCase
trim
Troubleshooting
Troubleshooting Kerberos Configuration
Tuning Apache Kafka Performance
Type Coercion
UI Extensions
Understand the NiFi Record Based Processors and Controller Services
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understanding Replication Flows
Understanding the kafka-run-class Bash Script
Understanding Version Dependencies
Unit Tests
Unlock Kafka metadata in Zookeeper
Unsupported Browsers
Unsupported Browsers
Unsupported command line tools
Unsupported Features in Cloudera DataFlow for Data Hub 7.2.8
Unsupported Flow Management features
Unsupported Streaming Analytics features
Unsupported Streams Messaging features
Update Attributes Based on Content
Update the Configuration Files for Your New NiFi Installation
Updating a notifier
Updating an alert policy
Upgrade Recommendations
Upgrading NiFi
Upload Bundle
URL Aliasing
Usage
Usage
Usage
Usage
Usage
Usage
Usage
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Use Case Architectures
Use cases
Use cases for Streams Replication Manager in CDP Public Cloud
Use Kerberos authentication
Use rsync to copy files from one broker to another
Use Schema Registry
User Authentication
User Authentication
User Authorization
User Window
UserGroupProvider
Using An Existing Intermediate Certificate Authority (CA)
Using Apache NiFi
Using Apache NiFi Registry
Using Custom Properties with Expression Language
Using Record-Enabled Processors
Using Schema Registry
Using SRM in CDP Public Cloud overview
Using Streams Replication Manager
Using the Apache NiFi Toolkit
Using watermark in Flink
Validate Output
Validating Processor Properties
ValidationContext
Validator
Variables
Variables in Versioned Flows
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify that you can write data to Kudu
Verify your data flow
Verify your data flow
Verifying the setup
Version States
Versioning a DataFlow
View a Flow
Viewing data lineage in Apache Atlas
Viewing FlowFile Lineage
Viewing Kafka cluster replication details
Viewing the UI in Variably Sized Browsers
Viewing the UI in Variably Sized Browsers
Virtual memory handling
Volatile Content Repository Properties
Volatile FlowFile Repository
Volatile Provenance Repository Properties
Web Properties
Web Properties
What is Apache Flink?
What is Apache NiFi?
What is it?
What is it?
What is it?
What Processors are Available
What to do next
What's New in Cloudera DataFlow for Data Hub 7.2.8
What's New in Flow Management
What's New in Streaming Analytics
What's New in Streams Messaging
When Processors are Triggered
Where To Go For More Information
Where to Start?
Who is This Guide For?
Who is This Guide For?
Why Cluster?
Wildcard Certificates
Working With Attributes
Working With Templates
Write Ahead FlowFile Repository
Write Ahead Provenance Repository
Write Ahead Provenance Repository Properties
Writing and Reading Content Claims
Writing and Reading Event Records
Writing and Reading FlowFiles
Zero-Master Clustering
ZooKeeper Access Control
ZooKeeper Migration Steps
ZooKeeper Migrator
ZooKeeper Migrator
ZooKeeper Properties
zookeeper-security-migration
Ingesting Data into Apache Hive in CDP Public Cloud

Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.
Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.
Configure the service account
Configure the Service Account you will use to ingest data into Hive.
Create IDBroker mapping
To enable your CDP user to use the central authentication features that CDP provides and to exchange credentials for AWS or Azure access tokens, you must map this CDP user to the correct IAM role or Azure Managed Service Identity (MSI). You can add or modify these mappings from the Management Console in your CDP environment.
Create the Hive target table
Before you can ingest data into Apache Hive in CDP Public Cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table. Modify these instructions based on your data ingest target table needs.
Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.
Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors require these files to parse the configuration values and use those values to communicate with Hive.
Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to your NiFi canvas, and connecting the processors.
Configure the controller services
You can add Controller Services to provide shared services to be used by the processors in your data flow. You will use these Controller Services later when you configure your processors.
Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data consumption processor options.
Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data ingest processor options.
Start your data flow
Start your data flow to verify that you have created a working data flow and to begin your data ingest process.
Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow.
Next steps
Learn what to do after you have moved data into Hive in CDP Public Cloud.

© 2019–2021 by Cloudera, Inc. All rights reserved.
