Cloudera DataFlow for Data Hub
7.3.1 (on cloud • latest)
Cloudera DataFlow for Data Hub
▶︎
Release Notes
▶︎
What's New in Cloudera DataFlow for Data Hub 7.3.1
What's new in Flow Management with NiFi 1
What's new in Flow Management with NiFi 2 [Technical Preview]
What's new in Edge Management [Technical Preview]
What's new in Streams Messaging
What's new in Cloudera Streaming Analytics
Component support in Cloudera DataFlow for Data Hub 7.3.1
▶︎
Supported NiFi extensions
Supported NiFi processors
Supported NiFi controller services
Supported NiFi reporting tasks
Supported NiFi parameter providers
Supported NiFi flow analysis rules [Technical Preview]
Supported NiFi Python components [Technical Preview]
Cloudera exclusive components [Technical Preview]
Components supported by partners
▶︎
Unsupported Features in Cloudera DataFlow for Data Hub 7.3.1
Unsupported Flow Management features
Unsupported Edge Management features [Technical Preview]
Unsupported Streams Messaging features
Unsupported Cloudera Streaming Analytics features
▶︎
Known Issues in Cloudera DataFlow for Data Hub 7.3.1
Known issues in Flow Management
Known issues in Edge Management [Technical Preview]
Known issues in Streams Messaging
Known issues in Cloudera Streaming Analytics
▶︎
Deprecation notices in Cloudera DataFlow for Data Hub 7.3.1
Deprecation notices for Cloudera Streaming Analytics
▶︎
Fixed Issues in Cloudera DataFlow for Data Hub 7.3.1
Fixed issues in Flow Management
Fixed issues in Edge Management [Technical Preview]
Fixed issues in Streams Messaging
Fixed issues in Cloudera Streaming Analytics
▶︎
Fixed CVEs in Cloudera DataFlow for Data Hub 7.3.1
Log4j vulnerabilities
Fixed CVEs in Flow Management
▶︎
Behavioral Changes in Cloudera DataFlow for Data Hub 7.3.1
Behavioral Changes in Flow Management
Behavioral Changes in Streams Messaging
Behavioral Changes in Cloudera Streaming Analytics
▼
Flow Management
▶︎
Flow Management overview
What is NiFi?
What is NiFi Registry?
Cloudera Manager integration
▶︎
Planning your Flow Management deployment
Flow Management cluster definitions
Flow Management cluster layout
▶︎
Setting up your Flow Management cluster
Checking prerequisites
Creating your cluster
Giving access to your cluster
▶︎
Working with your Flow Management cluster
▶︎
Authorizing Flow Management cluster access
Security for Flow Management clusters and users in Cloudera on cloud
▶︎
User authorization
Authorization workflow
Assigning administrator level permissions
▶︎
Assigning selective permissions to a user
Assign the EnvironmentUser role
Add the user to predefined Ranger access policies
Create a custom access policy
Authorization example
Predefined Ranger access policies for NiFi
Predefined Ranger access policies for NiFi Registry
▶︎
Scaling your Flow Management cluster
▶︎
Scaling up or down a NiFi cluster
Scaling up a NiFi cluster
Scaling down a NiFi cluster
▶︎
Changing Java version in Flow Management cluster
Changing the Java version of Flow Management Data Hub clusters
▶︎
Fetching new components and fixes
Automatic access to new components and fixes without upgrading
▶︎
Hot loading custom NARs
Configuring Flow Management clusters to hot load custom NARs
▶︎
Using Parameter Context inheritance
What is Parameter Context inheritance?
▶︎
Example for configuring Parameter Context inheritance
Creating the basic Parameter Contexts
Setting up Parameter Context inheritance
Parameter overriding
▶︎
Using Parameter Providers
What are Parameter Providers?
▶︎
Example for using Parameter Providers
Creating and configuring a Parameter Provider
Fetching Parameters
Creating a Parameter Context from a Parameter Group
Updating Parameter sensitivity
Updating Parameter Context when the external source has changed
Using Parameter Context inheritance to combine Parameters
▶︎
Using DataFlow Catalog Registry Client
Creating a Machine User
Adding a new Registry Client
Checking out a ReadyFlow
Checking out your flows
Versioning a flow in the Catalog
▶︎
Exporting/importing a data flow using NiFi Toolkit CLI
Overview
Connecting to NiFi Registry with NiFi Toolkit CLI
Exporting a flow from NiFi Registry
Importing a new flow into NiFi Registry
▶︎
Switching flow persistence providers using NiFi Toolkit CLI
▶︎
Exporting or importing data flows with NiFi Toolkit CLI
Flow persistence providers
Switching flow persistence providers
▼
Moving data with NiFi
▶︎
Ingesting data into HBase in Cloudera on cloud
▶︎
Ingesting data into HBase
Understand the use case
Meet the prerequisites
Create the HBase target table
Add Ranger policies
Obtain HBase connection details
Build the data flow
Configure the HBase client service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▼
Ingesting data into Hive in Cloudera on cloud
▼
Ingesting data into Hive
Understand the use case
Meet the prerequisites
Configure the service account
Create IDBroker mapping
Create the Hive target table
Add Ranger policies
Obtain Hive connection details
Build the data flow
Configure the controller services
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▶︎
Ingesting data into Kafka in Cloudera on cloud
▶︎
Ingesting data into Kafka
Understand the use case
Meet the prerequisites
Build the data flow
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring end-to-end latency for a Kafka topic
Monitoring your data flow
Next steps
Appendix - Schema example
▶︎
Ingesting data into Kudu in Cloudera on cloud
▶︎
Ingesting data into Kudu
Understand the use case
Meet the prerequisites
Create the Kudu target table
Build the data flow
Configure the Controller Service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify that you can write data to Kudu
Next steps
▶︎
Ingesting data into Solr in Cloudera on cloud
▶︎
Ingesting data into Solr
Understand the use case
Meet the prerequisites
Create Solr target collection
Build the data flow
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into Cloudera Data Warehouse using Iceberg table format
▶︎
Ingesting data into Cloudera Data Warehouse using Iceberg table format
Understand the use case
Meet the prerequisites
Create Iceberg target table
Build the data flow
Create and configure controller services
Configure processor for data source
Configure processor for data target
Start the data flow
▶︎
Ingesting data into Amazon S3 Buckets
▶︎
Ingesting data into Amazon S3
Understand the use case
Meet the prerequisites
Build the data flow
Set up AWS for your ingest data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into Azure Data Lake Storage
▶︎
Ingesting data into Azure Data Lake Storage
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into Google Cloud Storage
▶︎
Ingesting data into Google Cloud Storage
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Viewing data lineage in Apache Atlas
Next steps
▶︎
Ingesting data into cloud object stores with RAZ authorizations
▶︎
Ingesting data into Cloudera Object Stores with RAZ authorization
Understand the use case
Meet the prerequisites
Build the data flow
Configure each object store processor
Set permissions in Ranger
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Moving data in and out of Snowflake
Pushing data to and moving data from Snowflake using NiFi
▶︎
Moving data out of Snowflake
Before you begin
Downloading the Snowflake JDBC driver JAR file
Adding Snowflake CA certificates to NiFi truststore
Building your data flow
Creating Controller Services for your data flow
Configuring your source processor
Configuring your target processor
Confirming your data flow success
▶︎
Pushing data into Snowflake
Before you begin
Adding Snowflake CA certificates to NiFi truststore
Building your dataflow
Configuring your Controller Services
Configure your source processor
Configuring your target processor
Confirming your data flow success
Next steps
▶︎
Moving data using NiFi site-to-site
▶︎
Moving data from Cloudera on premises to Cloudera on cloud with NiFi site-to-site
Understanding the use case
Preparing your clusters
Setting up your network configuration
Configuring your truststores
Defining your Cloudera on cloud data flow
Configuring Ranger policies for site-to-site communication
Defining your Cloudera Base on premises data flow
▶︎
Processing mainframe/EBCDIC data in NiFi [Technical Preview]
Use case overview
Before you begin
Configuring EBCDICRecordReader
Building your dataflow
▶︎
Apache NiFi
Getting started with Apache NiFi
Using Apache NiFi
Apache NiFi Expression Language Guide
Apache NiFi RecordPath Guide
Apache NiFi System Administrator's Guide
Using Apache NiFi Toolkit
Apache NiFi Developer's Guide
▶︎
Apache NiFi Registry
Getting started with Apache NiFi Registry
Using Apache NiFi Registry
Apache NiFi Registry System Administrator's Guide
▶︎
NiFi Components Reference
NiFi 1.28 Components in Cloudera Flow Management 2.2.9
NiFi 2.0 Components in Cloudera Flow Management 4.2.1
▶︎
Edge Management [Technical Preview]
▶︎
Planning your Edge Management deployment
Edge Management cluster definitions
Edge Management cluster layout
▶︎
Setting up your Edge Management cluster
Checking prerequisites
Creating your cluster
After creating your cluster
▶︎
Streams Messaging
▶︎
Planning your Streams Messaging deployment
Cloudera Data Hub cluster definitions
Streams Messaging cluster layout
▶︎
Setting up your Streams Messaging cluster
Checking prerequisites
Creating your cluster
Deleting ZooKeeper from Streams Messaging clusters
Configuring data directories for clusters with custom disk configurations
Giving access to your cluster
▶︎
Connecting Kafka clients to Cloudera on cloud clusters
Connecting Kafka clients to Cloudera Data Hub provisioned clusters
▶︎
Scaling Streams Messaging clusters
▶︎
Scaling Kafka brokers
Scaling up Kafka brokers
Scaling down Kafka brokers
▶︎
Troubleshooting
The downscale operation fails with decommission failed
▶︎
Scaling Kafka Connect
Scaling up Kafka Connect
Scaling down Kafka Connect
Scaling KRaft
▶︎
Apache Kafka
▶︎
Apache Kafka overview
Kafka Introduction
▶︎
Kafka Architecture
Brokers
Topics
Records
Partitions
Record order and assignment
Logs and log segments
Kafka brokers and ZooKeeper
Leader positions and in-sync replicas
Kafka stretch clusters
Kafka disaster recovery
Kafka rack awareness
Kafka KRaft [Technical Preview]
▶︎
Kafka FAQ
Basics
Use cases
▶︎
Configuring Apache Kafka
Operating system requirements
Performance considerations
Quotas
▶︎
JBOD
JBOD setup
JBOD disk migration
Setting user limits for Kafka
Connecting Kafka clients to Cloudera Data Hub provisioned clusters
▶︎
Rolling restart checks
Configuring rolling restart checks
Configuring the client configuration used for rolling restart checks
▶︎
Cluster discovery with multiple Apache Kafka clusters
▶︎
Cluster discovery using DNS records
A records and round robin DNS
client.dns.lookup property options for clients
CNAME records configuration
Connection to the cluster with configured DNS aliases
▶︎
Cluster discovery using load balancers
Setup for SASL with Kerberos
Setup for TLS/SSL encryption
Connecting to the Kafka cluster using a load balancer
Configuring Kafka ZooKeeper chroot
Rack awareness
▶︎
Securing Apache Kafka
▶︎
Channel encryption
Configure Kafka brokers
Configure Kafka clients
Configure Kafka MirrorMaker
Configure ZooKeeper TLS/SSL support for Kafka
▶︎
Authentication
▶︎
TLS/SSL client authentication
Configure Kafka brokers
Configure Kafka clients
Principal name mapping
Enable Kerberos authentication
▶︎
Delegation token based authentication
Enable or disable authentication with delegation tokens
Manage individual delegation tokens
Rotate the master key/secret
▶︎
Client authentication using delegation tokens
Configure clients on a producer or consumer level
Configure clients on an application level
▶︎
LDAP authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
PAM authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
OAuth2 authentication
Configuring Kafka brokers
Configuring Kafka clients
▶︎
Authorization
▶︎
Ranger
Enable authorization in Kafka with Ranger
Configure the resource-based Ranger service used for authorization
Kafka ACL APIs support in Ranger
▶︎
Governance
Importing Kafka entities into Atlas
Configuring the Atlas hook in Kafka
Inter-broker security
Configuring multiple listeners
▶︎
Kafka security hardening with ZooKeeper ACLs
Restricting access to Kafka metadata in ZooKeeper
Unlocking access to Kafka metadata in ZooKeeper
▶︎
Tuning Apache Kafka performance
Handling large messages
▶︎
Cluster sizing
Sizing estimation based on network and disk message throughput
Choosing the number of partitions for a topic
▶︎
Broker Tuning
JVM and garbage collection
Network and I/O threads
ISR management
Log cleaner
▶︎
System Level Broker Tuning
File descriptor limits
Filesystems
Virtual memory handling
Networking parameters
Configure JMX ephemeral ports
Kafka-ZooKeeper performance tuning
▶︎
Managing Apache Kafka
▶︎
Management basics
Broker log management
Record management
Broker garbage log collection and log rotation
Client and broker compatibility across Kafka versions
▶︎
Managing topics across multiple Kafka clusters
Set up MirrorMaker in Cloudera Manager
Settings to avoid data loss
▶︎
Broker migration
Migrate brokers by modifying broker IDs in meta.properties
Use rsync to copy files from one broker to another
▶︎
Disk management
Monitoring
▶︎
Handling disk failures
Disk Replacement
Disk Removal
Reassigning replicas between log directories
Retrieving log directory replica assignment information
▶︎
Metrics
Building Cloudera Manager charts with Kafka metrics
Essential metrics to monitor
▶︎
Command Line Tools
Unsupported command line tools
kafka-topics
kafka-cluster
kafka-configs
kafka-console-producer
kafka-console-consumer
kafka-consumer-groups
kafka-features
kafka-reassign-partitions
kafka-log-dirs
zookeeper-security-migration
kafka-delegation-tokens
kafka-*-perf-test
Configuring log levels for command line tools
Understanding the kafka-run-class Bash Script
▶︎
Developing Apache Kafka applications
Kafka producers
▶︎
Kafka consumers
Subscribing to a topic
Groups and fetching
Protocol between consumer and broker
Rebalancing partitions
Retries
Kafka clients and ZooKeeper
▶︎
Java client
▶︎
Client examples
Simple Java consumer
Simple Java producer
Security examples
▶︎
.NET client
▶︎
Client examples
Simple .NET consumer
Simple .NET producer
Performant .NET producer
Simple .NET consumer using Schema Registry
Simple .NET producer using Schema Registry
Security examples
Kafka Streams
Kafka public APIs
Recommendations for client development
▶︎
Kafka Connect
Kafka Connect Overview
Setting up Kafka Connect
▶︎
Using Kafka Connect
Configuring the Kafka Connect Role
Managing, Deploying and Monitoring Connectors
▶︎
Writing Kafka data to Ozone with Kafka Connect
Writing data in an unsecured cluster
Writing data in a Kerberos and TLS/SSL enabled cluster
Using the AvroConverter
Configuring EOS for source connectors
▶︎
Securing Kafka Connect
▶︎
Kafka Connect to Kafka broker security
Configuring TLS/SSL encryption
Configuring Kerberos authentication
▶︎
Kafka Connect REST API security
▶︎
Authentication
Configuring TLS/SSL client authentication
Configuring SPNEGO authentication and trusted proxies
▶︎
Authorization
Authorization model
Ranger integration
▶︎
Kafka Connect connector configuration security
▶︎
Kafka Connect Secrets Storage
Terms and concepts
Managing secrets using the REST API
Re-encrypting secrets
Configuring connector JAAS configuration and Kerberos principal overrides
Configuring a Nexus repository allow list
▶︎
Single Message Transforms
Configuring an SMT chain
ConvertFromBytes
ConvertToBytes
▶︎
Connectors
Installing connectors
Debezium Db2 Source
Debezium MySQL Source
Debezium Oracle Source
Debezium PostgreSQL Source
Debezium SQL Server Source
HTTP Source
JDBC Source
JMS Source
MQTT Source
SFTP Source
▶︎
Stateless NiFi Source and Sink
Dataflow development best practices
Kafka Connect worker assignment
Kafka Connect log files
Kafka Connect tasks
Developing a dataflow
Deploying a dataflow
Downloading and viewing predefined dataflows
Configuring flow.snapshot
Tutorial: developing and deploying a JDBC Source dataflow
Syslog TCP Source
Syslog UDP Source
ADLS Sink
Amazon S3 Sink
HDFS Sink
HDFS Stateless Sink
HTTP Sink
InfluxDB Sink
JDBC Sink
Kudu Sink
S3 Sink
▶︎
Schema Registry
▶︎
Schema Registry overview
▶︎
Schema Registry overview
Examples of interacting with Schema Registry
▶︎
Schema Registry use cases
Registering and querying a schema for a Kafka topic
Deserializing and serializing data from and to a Kafka topic
Dataflow management with schema-based routing
Schema Registry component architecture
▶︎
Schema Registry concepts
Schema entities
Compatibility policies
Importance of logical types in Avro
▶︎
Integrating with Schema Registry
▶︎
Integrating Schema Registry with NiFi
NiFi record-based Processors and Controller Services
Configuring Schema Registry instance in NiFi
Setting schema access strategy in NiFi
Adding and configuring record-enabled Processors
Integrating Schema Registry with Kafka
Integrating Schema Registry with Flink and Cloudera SQL Stream Builder
Integrating Schema Registry with Atlas
Improving performance in Schema Registry
▶︎
Using Schema Registry
Adding a new schema
Querying a schema
Evolving a schema
Deleting a schema
Importing Confluent Schema Registry schemas into Schema Registry
▶︎
Exporting and importing schemas
Exporting schemas using Schema Registry API
Importing schemas using Schema Registry API
▶︎
ID ranges in Schema Registry
Setting a Schema Registry ID range
▶︎
Load balancer in front of Schema Registry instances
Configurations required to use load balancer with Kerberos enabled
Configurations required to use load balancer with SSL enabled
▶︎
Securing Schema Registry
▶︎
Schema Registry authorization through Ranger access policies
Predefined access policies for Schema Registry
Adding the user or group to a predefined access policy
Creating a custom access policy
▶︎
Schema Registry authentication through OAuth2 JWT tokens
JWT algorithms
Public key and secret storage
Authentication using OAuth2 with Kerberos
Schema Registry server configuration
Configuring the Schema Registry client
Schema Registry REST API reference
▶︎
Streams Messaging Manager
▶︎
Streams Messaging Manager overview
Introduction to Streams Messaging Manager
▶︎
Getting started with Streams Messaging clusters in Cloudera on cloud
Introducing streams messaging clusters in Cloudera on cloud
Meet the prerequisites to create a streams messaging cluster
Creating a Machine User
Granting the Machine User access to the environment
Creating a Kafka topic
▶︎
Create Ranger policies for the Machine User account
Create topic policy
Create consumer group policy
▶︎
Produce data to a Kafka topic
Setting workload password
Connecting to Kafka host
Configuring LDAP authentication
Producing data to a Kafka topic
Consuming data from a Kafka topic
▶︎
Use Kerberos authentication
Kerberos authentication using the ticket cache
Kerberos authentication using a keytab
Monitoring Kafka activity in Streams Messaging Manager
▶︎
Use Schema Registry
▶︎
Gather configuration information
Finding list of brokers
Finding Schema Registry endpoint
Creating TLS truststore
Defining Schema Registry access policies
Producing data in Avro format
Checking schema registration
Checking producer activity
Consuming data from Kafka topics using stored schemas
▶︎
Monitor end-to-end latency
Setting up authorization policies
Enabling end-to-end latency monitoring
▶︎
Evolve your schema
Reconfiguring the Kafka consumer
Reconfiguring the Kafka producer
What to do next
▶︎
Configuring Streams Messaging Manager
Installing Streams Messaging Manager in Cloudera on cloud
▶︎
Setting up Prometheus for Streams Messaging Manager
▶︎
Prometheus configuration for Streams Messaging Manager
Prerequisites for Prometheus configuration
Prometheus properties configuration
Streams Messaging Manager property configuration in Cloudera Manager for Prometheus
Kafka property configuration in Cloudera Manager for Prometheus
Kafka Connect property configuration in Cloudera Manager for Prometheus
Start Prometheus
▶︎
Secure Prometheus for Streams Messaging Manager
▶︎
Nginx proxy configuration over Prometheus
Nginx installation
Nginx configuration for Prometheus
▶︎
Setting up TLS for Prometheus
Configuring Streams Messaging Manager to recognize Prometheus's TLS certificate
▶︎
Setting up basic authentication with TLS for Prometheus
Configuring Nginx for basic authentication
Configuring Streams Messaging Manager for basic authentication
Setting up mTLS for Prometheus
Prometheus for Streams Messaging Manager limitations
Troubleshooting Prometheus for Streams Messaging Manager
Performance comparison between Cloudera Manager and Prometheus
▶︎
Using Streams Messaging Manager
▶︎
Monitoring Kafka
Monitoring Kafka clusters
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring Kafka brokers
Monitoring Kafka consumers
Monitoring log size information
Monitoring lineage information
▶︎
Managing Kafka topics
Creating a Kafka topic
Modifying a Kafka topic
Deleting a Kafka topic
▶︎
Managing Alert Policies and Notifiers
Creating a notifier
Updating a notifier
Deleting a notifier
Creating an alert policy
Updating an alert policy
Enabling an alert policy
Disabling an alert policy
Deleting an alert policy
Component types and metrics for alert policies
▶︎
Monitoring end-to-end latency
Enabling interceptors
Monitoring end-to-end latency for a Kafka topic
End-to-end latency use case
▶︎
Monitoring Kafka cluster replications (Streams Replication Manager)
▶︎
Viewing Kafka cluster replication details
Searching Kafka cluster replications by source
Monitoring Kafka cluster replications by quick ranges
Monitoring status of the clusters to be replicated
▶︎
Monitoring topics to be replicated
Searching by topic name
Monitoring throughput for cluster replication
Monitoring replication latency for cluster replication
Monitoring checkpoint latency for cluster replication
Monitoring replication throughput and latency by values
▶︎
Managing and monitoring Kafka Connect
The Kafka Connect UI
Deploying and managing connectors
▶︎
Managing and monitoring Cruise Control rebalance
Authorizing users to access Cruise Control in Streams Messaging Manager
Cruise Control dashboard in Streams Messaging Manager UI
Using the Rebalance Wizard in Cruise Control
▶︎
Securing Streams Messaging Manager
Securing Streams Messaging Manager
Verifying the setup
Streams Messaging Manager REST API reference
▶︎
Streams Replication Manager
▶︎
Streams Replication Manager overview
Overview
Key Features
Main Use Cases
Use case architectures
▶︎
Streams Replication Manager Architecture
▶︎
Streams Replication Manager Driver
Connect workers
Connectors
Task architecture and load-balancing
Driver inter-node coordination
▶︎
Streams Replication Manager Service
Remote Querying
Monitoring and metrics
REST API
Replication flows and replication policies
Remote topic discovery
Automatic group offset synchronization
Understanding co-located and external clusters
Understanding Streams Replication Manager properties, their configuration and hierarchy
▶︎
Planning for Streams Replication Manager
Streams Replication Manager requirements
Recommended deployment architecture
▶︎
Configuring Streams Replication Manager
Enable high availability
Enabling prefixless replication
▶︎
Defining and adding clusters for replication
Defining external Kafka clusters
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Adding clusters to Streams Replication Manager's configuration
Configuring replications
Configuring the driver role target clusters
Configuring the service role target cluster
Configuring properties not exposed in Cloudera Manager
Configuring replication specific REST servers
▶︎
Configuring Remote Querying
Enabling Remote Querying
Configuring the advertised information of the Streams Replication Manager Service role
Configuring Streams Replication Manager Driver retry behavior
Configuring Streams Replication Manager Driver heartbeat emission
Configuring automatic group offset synchronization
Configuring Streams Replication Manager Driver for performance tuning
New topic and consumer group discovery
▶︎
Configuration examples
Bidirectional replication example of two active clusters
Cross data center replication example of multiple clusters
▶︎
Using Streams Replication Manager
▶︎
Streams Replication Manager Command Line Tools
▶︎
srm-control
▶︎
Configuring srm-control
Configuring the Streams Replication Manager client's secure storage
Configuring TLS/SSL properties
Configuring Kerberos properties
Configuring properties for non-Kerberos authentication mechanisms
Setting the secure storage password as an environment variable
Topics and Groups Subcommand
Offsets Subcommand
Monitoring Replication with Streams Messaging Manager
Replicating Data
▶︎
How to Set up Failover and Failback
Configure Streams Replication Manager for Failover and Failback
Migrating Consumer Groups Between Clusters
▶︎
Securing Streams Replication Manager
Security overview
Enabling TLS/SSL for the Streams Replication Manager service
Enabling Kerberos for the Streams Replication Manager service
▶︎
Configuring Basic Authentication for the Streams Replication Manager service
Enabling Basic Authentication for the Streams Replication Manager service
Configuring Basic Authentication for Remote Querying
Streams Replication Manager security example
▶︎
Use cases for Streams Replication Manager in Cloudera on cloud
Using Streams Replication Manager in Cloudera on cloud overview
Replicating data from on premises to cloud with Streams Replication Manager on premises
Replicating data from on premises to cloud with Streams Replication Manager in the cloud
Replicating data between cloud clusters with Streams Replication Manager in the cloud
▶︎
Streams Replication Manager reference
srm-control Options Reference
Configuration Properties Reference for Properties not Available in Cloudera Manager
Kafka credentials property reference
Streams Replication Manager Service data traffic reference
Streams Replication Manager REST API reference
▶︎
Cruise Control
▶︎
Cruise Control overview
Kafka cluster load balancing using Cruise Control
▶︎
Configuring Cruise Control
Setting capacity estimations and goals
Configuring Metrics Reporter in Cruise Control
Adding self-healing goals to Cruise Control in Cloudera Manager
▶︎
Securing Cruise Control
Enable security for Cruise Control
▶︎
Managing Cruise Control
Rebalancing with Cruise Control
Cruise Control REST API endpoints
Cruise Control REST API reference
▶︎
Cloudera Streaming Analytics
▶︎
Cloudera Streaming Analytics overview
Streaming Analytics in Cloudera
What is Apache Flink?
What is Cloudera SQL Stream Builder?
▶︎
Planning your Cloudera Streaming Analytics deployment
Cloudera Streaming Analytics Cloudera Data Hub cluster definitions
Cloudera Streaming Analytics cluster layout
▶︎
Setting up your Cloudera Streaming Analytics cluster
Before creating your cluster
Creating your cluster
After creating your cluster
▶︎
Using Cloudera SQL Stream Builder
Getting Started
▶︎
Projects
Creating a project
Navigating in a project
Managing members of a project
▶︎
Source control of a project
Masking information before using source control
Setting the environment for a project
Importing a project
▶︎
Data sources
Adding Kafka Data Source
Adding Catalogs
▶︎
Using auto discovery of services
Setting up the service discovery
Using the service discovery on Streaming SQL Console
▶︎
Connectors
Using connectors with templates
Adding new connectors
Kafka connectors
CDC connectors
JDBC connector
Filesystem connector
Datagen connector
Faker connector
Blackhole connector
▶︎
Data formats
Adding data formats
▶︎
Tables
▶︎
Kafka tables
▶︎
Configuring Kafka tables
Schema Definition tab
Event Time tab
Data Transformations tab
Properties tab
Deserialization tab
Assigning Kafka keys in streaming queries
Performance & Scalability
Creating Webhook tables
Flink SQL tables
Iceberg tables
▶︎
SQL jobs
Creating and naming SQL jobs
Running SQL Stream jobs
▶︎
Configuring SQL job settings
Adjusting logging configuration in Advanced Settings
Configuring YARN queue for SQL jobs
Configuring state backend for Cloudera SQL Stream Builder
▶︎
Managing session for SQL jobs
Executing SQL jobs in production mode
▶︎
Functions
Creating Python User-defined Functions
▶︎
Creating JavaScript User-defined Functions
Developing JavaScript functions
Creating Java User-defined Functions
Using System Functions
▶︎
Materialized views
Creating Materialized Views
Configuring Retention Time for Materialized Views
Materialized View Pagination
Using Dynamic Materialized View Endpoints
Configuring Materialized View database information
Using Cloudera SQL Stream Builder with Cloudera Data Visualization
▶︎
Widgets
Creating widgets
Choosing data sources
Managing data source jobs
Customizing visualization types
Managing widgets on the Dashboard
Notifications
REST API
▶︎
Monitoring
Collecting diagnostic data
Governance
▶︎
Flink SQL
▶︎
Flink DDL
Managing time in Cloudera SQL Stream Builder
Flink DML
Flink Queries
Other supported statements
Data Types
Dynamic SQL Hints
SQL Examples
▶︎
Data Enrichment
Joining streaming and bounded tables
Example: joining Kafka and Kudu tables
Updating SQL queries with PROCTIME function
▶︎
Using Apache Flink
Running a simple Flink application
▶︎
Application development
▶︎
Flink application structure
Source, operator and sink in DataStream API
Flink application example
Testing and validating Flink applications
Flink Project Template
▶︎
Configuring Flink applications
Setting parallelism and max parallelism
Configuring Flink application resources
Configuring state backend
Enabling checkpoints for Flink applications
Configuring PyFlink applications
▶︎
DataStream connectors
▶︎
HBase sink with Flink
Creating and configuring the HBaseSinkFunction
▶︎
Kafka with Flink
Schema Registry with Flink
Kafka Metrics Reporter
Kudu with Flink
Iceberg with Flink
File systems
▶︎
Job lifecycle
Setting up Python for PyFlink
Running a Flink job
Using Flink CLI
Enabling savepoints for Flink applications
▶︎
Monitoring
Enabling Flink DEBUG logging
Flink Dashboard
Streams Messaging Manager integration
▶︎
SQL and Table API
SQL and Table API supported features
▶︎
DataStream API interoperability
Converting DataStreams to Tables
Converting Tables to DataStreams
Supported data types
▶︎
SQL catalogs for Flink
Hive catalog
Kudu catalog
Schema Registry catalog
▶︎
SQL connectors for Flink
Kafka connector
▶︎
Data types for Kafka connector
JSON format
CSV format
▶︎
Avro format
Supported basic data types
Schema Registry formats
▶︎
SQL Statements in Flink
CREATE Statements
DROP Statements
ALTER Statements
INSERT Statements
SQL Queries in Flink
▶︎
Governance
Atlas entities in Flink metadata collection
Creating Atlas entity type definitions for Flink
Verifying metadata collection
▶︎
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Updating Flink job dependencies
▶︎
Reference
Flink Terminology
Cloudera Flink Tutorials
▶︎
Tutorials
▶︎
Analyzing your data with Kafka
Understand the use case
▶︎
Prepare your environment
Assign resource roles
Create IDBroker mapping
Set workload password
Create your streaming clusters
▶︎
Set Ranger policies
Grant permission for the ATLAS_HOOK topic
Retrieve and upload keytab file
Create Atlas entity type definitions
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Analyzing your data with Kudu
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Analyzing your data with HBase
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
Cloudera SQL Stream Builder REST API reference
.NET client
A records and round robin DNS
Add Ranger policies
Add Ranger policies
Add the user to predefined Ranger access policies
Adding a new Registry Client
Adding a new schema
Adding and configuring record-enabled Processors
Adding Catalogs
Adding clusters to Streams Replication Manager's configuration
Adding data formats
Adding Kafka Data Source
Adding new connectors
Adding self-healing goals to Cruise Control in Cloudera Manager
Adding Snowflake CA certificates to NiFi truststore
Adding Snowflake CA certificates to NiFi truststore
Adding the user or group to a predefined access policy
Adjusting logging configuration in Advanced Settings
ADLS Sink
After creating your cluster
After creating your cluster
ALTER Statements
Amazon S3 Sink
Analyzing your data with HBase
Analyzing your data with Kafka
Analyzing your data with Kudu
Apache Kafka
Apache Kafka overview
Apache NiFi
Apache NiFi Developer's Guide
Apache NiFi Expression Language Guide
Apache NiFi RecordPath Guide
Apache NiFi Registry
Apache NiFi Registry System Administrator's Guide
Apache NiFi System Administrator's Guide
Appendix - Schema example
Application development
Assign resource roles
Assign the EnvironmentUser role
Assigning administrator level permissions
Assigning Kafka keys in streaming queries
Assigning selective permissions to user
Atlas entities in Flink metadata collection
Authentication
Authentication
Authentication using OAuth2 with Kerberos
Authorization
Authorization
Authorization example
Authorization model
Authorization workflow
Authorizing Flow Management cluster access
Authorizing users to access Cruise Control in Streams Messaging Manager
Automatic access to new components and fixes without upgrading
Automatic group offset synchronization
Avro format
Basics
Before creating your cluster
Before you begin
Before you begin
Before you begin
Behavioral Changes in Cloudera DataFlow for Data Hub 7.3.1
Behavioral Changes in Cloudera Streaming Analytics
Behavioral Changes in Flow Management
Behavioral Changes in Streams Messaging
Bidirectional replication example of two active clusters
Blackhole connector
Broker garbage log collection and log rotation
Broker log management
Broker migration
Broker Tuning
Brokers
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Building Cloudera Manager charts with Kafka metrics
Building your data flow
Building your dataflow
Building your dataflow
CDC connectors
Changing Java version in Flow Management cluster
Changing the Java version of Flow Management Data Hub clusters
Channel encryption
Checking out a ReadyFlow
Checking out your flows
Checking prerequisites
Checking prerequisites
Checking prerequisites
Checking producer activity
Checking schema registration
Choosing data sources
Choosing the number of partitions for a topic
Client and broker compatibility across Kafka versions
Client authentication using delegation tokens
Client examples
Client examples
client.dns.lookup property options for client
Cloudera Data Hub cluster definitions
Cloudera DataFlow for Data Hub
Cloudera exclusive components [Technical Preview]
Cloudera Flink Tutorials
Cloudera Manager integration
Cloudera Streaming Analytics
Cloudera Streaming Analytics Cloudera Data Hub cluster definitions
Cloudera Streaming Analytics cluster layout
Cloudera Streaming Analytics overview
Cluster discovery using DNS records
Cluster discovery using load balancers
Cluster discovery with multiple Apache Kafka clusters
Cluster sizing
CNAME records configuration
Collecting diagnostic data
Command Line Tools
Compatibility policies
Component support in Cloudera DataFlow for Data Hub 7.3.1
Component types and metrics for alert policies
Components supported by partners
Configuration examples
Configuration Properties Reference for Properties not Available in Cloudera Manager
Configurations required to use load balancer with Kerberos enabled
Configurations required to use load balancer with SSL enabled
Configure clients on a producer or consumer level
Configure clients on an application level
Configure each object store processor
Configure JMX ephemeral ports
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka MirrorMaker
Configure processor for data source
Configure processor for data target
Configure Streams Replication Manager for Failover and Failback
Configure the Controller Service
Configure the controller services
Configure the HBase client service
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the resource-based Ranger service used for authorization
Configure the service account
Configure your source processor
Configure ZooKeeper TLS/SSL support for Kafka
Configuring a Nexus repository allow list
Configuring an SMT chain
Configuring Apache Kafka
Configuring automatic group offset synchronization
Configuring Basic Authentication for Remote Querying
Configuring Basic Authentication for the Streams Replication Manager service
Configuring connector JAAS configuration and Kerberos principal overrides
Configuring Cruise Control
Configuring data directories for clusters with custom disk configurations
Configuring EBCDICRecordReader
Configuring EOS for source connectors
Configuring Flink application resources
Configuring Flink applications
Configuring Flow Management clusters to hot load custom NARs
Configuring flow.snapshot
Configuring Kafka brokers
Configuring Kafka clients
Configuring Kafka tables
Configuring Kafka ZooKeeper chroot
Configuring Kerberos authentication
Configuring Kerberos properties
Configuring LDAP authentication
Configuring log levels for command line tools
Configuring Materialized View database information
Configuring Metrics Reporter in Cruise Control
Configuring multiple listeners
Configuring Nginx for basic authentication
Configuring properties for non-Kerberos authentication mechanisms
Configuring properties not exposed in Cloudera Manager
Configuring PyFlink applications
Configuring Ranger policies for site-to-site communication
Configuring Remote Querying
Configuring replication specific REST servers
Configuring replications
Configuring Retention Time for Materialized Views
Configuring rolling restart checks
Configuring Schema Registry instance in NiFi
Configuring SPNEGO authentication and trusted proxies
Configuring SQL job settings
Configuring srm-control
Configuring state backend
Configuring state backend for Cloudera SQL Stream Builder
Configuring Streams Messaging Manager
Configuring Streams Messaging Manager for basic authentication
Configuring Streams Messaging Manager to recognize Prometheus's TLS certificate
Configuring Streams Replication Manager
Configuring Streams Replication Manager Driver for performance tuning
Configuring Streams Replication Manager Driver heartbeat emission
Configuring Streams Replication Manager Driver retry behaviour
Configuring the advertised information of the Streams Replication Manager Service role
Configuring the Atlas hook in Kafka
Configuring the client configuration used for rolling restart checks
Configuring the driver role target clusters
Configuring the Kafka Connect Role
Configuring the Schema Registry client
Configuring the service role target cluster
Configuring the Streams Replication Manager client's secure storage
Configuring TLS/SSL client authentication
Configuring TLS/SSL encryption
Configuring TLS/SSL properties
Configuring YARN queue for SQL jobs
Configuring your Controller Services
Configuring your source processor
Configuring your target processor
Configuring your target processor
Configuring your truststores
Confirming your data flow success
Confirming your data flow success
Connect workers
Connecting Kafka clients to Cloudera Data Hub provisioned clusters
Connecting Kafka clients to Cloudera Data Hub provisioned clusters
Connecting Kafka clients to Cloudera on cloud clusters
Connecting to Kafka host
Connecting to NiFi Registry with NiFi Toolkit CLI
Connecting to the Kafka cluster using load balancer
Connection to the cluster with configured DNS aliases
Connectors
Connectors
Connectors
Consuming data from Kafka topic
Consuming data from Kafka topics using stored schemas
ConvertFromBytes
Converting DataStreams to Tables
Converting Tables to DataStreams
ConvertToBytes
Create a custom access policy
Create and configure controller services
Create Atlas entity type definitions
Create consumer group policy
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create Iceberg target table
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create Ranger policies for Machine User account
Create Solr target collection
CREATE Statements
Create the HBase target table
Create the Hive target table
Create the Kudu target table
Create topic policy
Create your streaming clusters
Creating a custom access policy
Creating a Kafka topic
Creating a Machine User
Creating a notifier
Creating a Parameter Context from a Parameter Group
Creating a project
Creating an alert policy
Creating and configuring a Parameter Provider
Creating and configuring the HBaseSinkFunction
Creating and naming SQL jobs
Creating Atlas entity type definitions for Flink
Creating Controller Services for your data flow
Creating Java User-defined functions
Creating Javascript User-defined Functions
Creating Kafka topic
Creating Machine User
Creating Materialized Views
Creating Python User-defined Functions
Creating the basic Parameter Contexts
Creating TLS truststore
Creating Webhook tables
Creating widgets
Creating your cluster
Creating your cluster
Creating your cluster
Creating your cluster
Cross data center replication example of multiple clusters
Cruise Control
Cruise Control dashboard in Streams Messaging Manager UI
Cruise Control overview
Cruise Control REST API endpoints
CSV format
Customizing visualization types
Data Enrichment
Data formats
Data sources
Data Transformations tab
Data Types
Data types for Kafka connector
Dataflow development best practices
Dataflow management with schema-based routing
Datagen connector
DataStream API interoperability
DataStream connectors
Debezium Db2 Source
Debezium MySQL Source
Debezium Oracle Source
Debezium PostgreSQL Source
Debezium SQL Server Source
Defining and adding clusters for replication
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Defining external Kafka clusters
Defining Schema Registry access policies
Defining your Cloudera Base on premises data flow
Defining your Cloudera on cloud data flow
Delegation token based authentication
Deleting a Kafka topic
Deleting a notifier
Deleting a schema
Deleting an alert policy
Deleting ZooKeeper from Streams Messaging clusters
Deploying a dataflow
Deploying and managing connectors
Deprecation Notices for Cloudera Streaming Analytics
Deprecation notices in Cloudera DataFlow for Data Hub 7.3.1
Deserialization tab
Deserializing and serializing data from and to a Kafka topic
Developing a dataflow
Developing Apache Kafka applications
Developing JavaScript functions
Disabling an alert policy
Disk management
Disk Removal
Disk Replacement
Downloading and viewing predefined dataflows
Downloading the Snowflake JDBC driver JAR file
Driver inter-node coordination
DROP Statements
Dynamic SQL Hints
Edge Management cluster definitions
Edge Management cluster layout
Edge Management [Technical Preview]
Enable authorization in Kafka with Ranger
Enable high availability
Enable Kerberos authentication
Enable or disable authentication with delegation tokens
Enable security for Cruise Control
Enabling an alert policy
Enabling Basic Authentication for the Streams Replication Manager service
Enabling checkpoints for Flink applications
Enabling end-to-end latency monitoring
Enabling Flink DEBUG logging
Enabling interceptors
Enabling Kerberos for the Streams Replication Manager service
Enabling prefixless replication
Enabling Remote Querying
Enabling savepoints for Flink applications
Enabling TLS/SSL for the Streams Replication Manager service
End to end latency use case
Essential metrics to monitor
Event Time tab
Evolve your schema
Evolving a schema
Example for configuring Parameter Context inheritance
Example for using Parameter Providers
Example: joining Kafka and Kudu tables
Examples of interacting with Schema Registry
Executing SQL jobs in production mode
Exporting a flow from NiFi Registry
Exporting and importing schemas
Exporting or importing data flows with NiFi Toolkit CLI
Exporting schemas using Schema Registry API
Exporting/importing a data flow using NiFi Toolkit CLI
Faker connector
Fetching new components and fixes
Fetching Parameters
File descriptor limits
File systems
Filesystem connector
Filesystems
Finding list of brokers
Finding Schema Registry endpoint
Fixed CVEs in Cloudera DataFlow for Data Hub 7.3.1
Fixed CVEs in Flow Management
Fixed Issues in Cloudera DataFlow for Data Hub 7.3.1
Fixed Issues in Cloudera Streaming Analytics
Fixed issues in Edge Management [Technical Preview]
Fixed issues in Flow Management
Fixed Issues in Streams Messaging
Flink application example
Flink application structure
Flink Dashboard
Flink DDL
Flink DML
Flink Project Template
Flink Queries
Flink SQL
Flink SQL tables
Flink Terminology
Flow Management
Flow Management cluster definitions
Flow Management cluster layout
Flow Management overview
Flow persistence providers
Flow persistence providers
Functions
Gather configuration information
Getting Started
Getting started with Apache NiFi
Getting started with Apache NiFi Registry
Getting started with Streams Messaging clusters in Cloudera on cloud
Giving access to your cluster
Giving access to your cluster
Governance
Governance
Governance
Grant permission for the ATLAS_HOOK topic
Granting Machine User access to environment
Groups and fetching
Handling disk failures
Handling large messages
HBase sink with Flink
HDFS Sink
HDFS Stateless Sink
Hive catalog
Hot loading custom NARs
How to Set up Failover and Failback
HTTP Sink
HTTP Source
Iceberg tables
Iceberg with Flink
ID ranges in Schema Registry
Importance of logical types in Avro
Importing a new flow into NiFi Registry
Importing a project
Importing Confluent Schema Registry schemas into Schema Registry
Importing Kafka entities into Atlas
Importing schemas using Schema Registry API
Improving performance in Schema Registry
InfluxDB Sink
Ingesting data into Amazon S3
Ingesting data into Amazon S3 Buckets
Ingesting data into Azure Data Lake Storage
Ingesting data into Azure Data Lake Storage
Ingesting data into cloud object stores with RAZ authorizations
Ingesting data into Cloudera Data Warehouse using Iceberg table format
Ingesting data into Cloudera Data Warehouse using Iceberg table format
Ingesting data into Cloudera Object Stores with RAZ authorization
Ingesting data into Google Cloud Storage
Ingesting data into Google Cloud Storage
Ingesting Data into HBase
Ingesting Data into HBase in Cloudera on Cloud
Ingesting data into Hive
Ingesting Data into Hive in Cloudera on Cloud
Ingesting data into Kafka
Ingesting Data into Kafka in Cloudera on Cloud
Ingesting data into Kudu
Ingesting data into Kudu in Cloudera on Cloud
Ingesting data into Solr
Ingesting data into Solr in Cloudera on Cloud
INSERT Statements
Installing connectors
Installing Streams Messaging Manager in Cloudera on cloud
Integrating Schema Registry with Atlas
Integrating Schema Registry with Flink and Cloudera SQL Stream Builder
Integrating Schema Registry with Kafka
Integrating Schema Registry with NiFi
Integrating with Schema Registry
Inter-broker security
Introducing streams messaging cluster on Cloudera on cloud
Introduction to Streams Messaging Manager
ISR management
Java client
JBOD
JBOD Disk migration
JBOD setup
JDBC connector
JDBC Sink
JDBC Source
JMS Source
Job lifecycle
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Joining streaming and bounded tables
JSON format
JVM and garbage collection
JWT algorithms
Kafka ACL APIs support in Ranger
Kafka Architecture
Kafka brokers and ZooKeeper
Kafka clients and ZooKeeper
Kafka cluster load balancing using Cruise Control
Kafka Connect
Kafka Connect connector configuration security
Kafka Connect log files
Kafka Connect Overview
Kafka Connect property configuration in Cloudera Manager for Prometheus
Kafka Connect REST API security
Kafka Connect Secrets Storage
Kafka Connect tasks
Kafka Connect to Kafka broker security
Kafka Connect worker assignment
Kafka connector
Kafka connectors
Kafka consumers
Kafka credentials property reference
Kafka disaster recovery
Kafka FAQ
Kafka Introduction
Kafka KRaft [Technical Preview]
Kafka Metrics Reporter
Kafka producers
Kafka property configuration in Cloudera Manager for Prometheus
Kafka public APIs
Kafka rack awareness
Kafka security hardening with ZooKeeper ACLs
Kafka Streams
Kafka stretch clusters
Kafka tables
Kafka with Flink
kafka-*-perf-test
kafka-cluster
kafka-configs
kafka-console-consumer
kafka-console-producer
kafka-consumer-groups
kafka-delegation-tokens
kafka-features
kafka-log-dirs
kafka-reassign-partitions
kafka-topics
Kafka-ZooKeeper performance tuning
Kerberos authentication using a keytab
Kerberos authentication using the ticket cache
Key Features
Known Issues in Cloudera DataFlow for Data Hub 7.3.1
Known Issues in Cloudera Streaming Analytics
Known issues in Edge Management [Technical Preview]
Known issues in Flow Management
Known Issues in Streams Messaging
Kudu catalog
Kudu Sink
Kudu with Flink
LDAP authentication
Leader positions and in-sync replicas
Load balancer in front of Schema Registry instances
Log cleaner
Log4j vulnerabilities
Logs and log segments
Main Use Cases
Manage individual delegation tokens
Management basics
Managing Alert Policies and Notifiers
Managing and monitoring Cruise Control rebalance
Managing and monitoring Kafka Connect
Managing Apache Kafka
Managing Cruise Control
Managing data source jobs
Managing Kafka topics
Managing member of a project
Managing secrets using the REST API
Managing session for SQL jobs
Managing time in Cloudera SQL Stream Builder
Managing topics across multiple Kafka clusters
Managing widgets on the Dashboard
Managing, Deploying and Monitoring Connectors
Masking information before using source control
Materialized View Pagination
Materialized views
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites to create streams messaging cluster
Metadata governance with Atlas
Metadata governance with Atlas
Metadata governance with Atlas
Metrics
Migrate brokers by modifying broker IDs in meta.properties
Migrating Consumer Groups Between Clusters
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Modifying a Kafka topic
Monitor end-to-end latency
Monitoring
Monitoring
Monitoring
Monitoring and metrics
Monitoring checkpoint latency for cluster replication
Monitoring end to end latency for Kafka topic
Monitoring end to end latency for Kafka topic
Monitoring end-to-end latency
Monitoring Kafka
Monitoring Kafka activity in Streams Messaging Manager
Monitoring Kafka brokers
Monitoring Kafka cluster replications (Streams Replication Manager)
Monitoring Kafka cluster replications by quick ranges
Monitoring Kafka clusters
Monitoring Kafka consumers
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring lineage information
Monitoring log size information
Monitoring replication latency for cluster replication
Monitoring replication throughput and latency by values
Monitoring Replication with Streams Messaging Manager
Monitoring status of the clusters to be replicated
Monitoring throughput for cluster replication
Monitoring topics to be replicated
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Moving data from Cloudera on premises to Cloudera on cloud with NiFi site-to-site
Moving data in and out of Snowflake
Moving data out of Snowflake
Moving data using NiFi site-to-site
Moving data with NiFi
MQTT Source
Navigating in a project
Network and I/O threads
Networking parameters
New topic and consumer group discovery
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Nginx configuration for Prometheus
Nginx installation
Nginx proxy configuration over Prometheus
NiFi record-based Processors and Controller Services
Notifications
OAuth2 authentication
Obtain HBase connection details
Obtain Hive connection details
Offsets Subcommand
Operating system requirements
Other supported statements
Overview
Overview
PAM authentication
Parameter overriding
Partitions
Performance & Scalability
Performance comparison between Cloudera Manager and Prometheus
Performance considerations
Performant .NET producer
Planning for Streams Replication Manager
Planning your Cloudera Streaming Analytics deployment
Planning your Edge Management deployment
Planning your Flow Management deployment
Planning your Streams Messaging deployment
Predefined access policies for Schema Registry
Predefined Ranger access policies for NiFi
Predefined Ranger access policies for NiFi Registry
Prepare your environment
Preparing your clusters
Prerequisites for Prometheus configuration
Principal name mapping
Processing mainframe / EBCDIC data in NiFi [Technical Preview]
Produce data to Kafka topic
Producing data in Avro format
Producing data to Kafka topic
Projects
Prometheus configuration for Streams Messaging Manager
Prometheus for Streams Messaging Manager limitations
Prometheus properties configuration
Properties tab
Protocol between consumer and broker
Public key and secret storage
Pushing data into Snowflake
Pushing data to and moving data from Snowflake using NiFi
Querying a schema
Quotas
Rack awareness
Ranger
Ranger integration
Re-encrypting secrets
Reassigning replicas between log directories
Rebalancing partitions
Rebalancing with Cruise Control
Recommendations for client development
Recommended deployment architecture
Reconfiguring the Kafka consumer
Reconfiguring the Kafka producer
Record management
Record order and assignment
Records
Reference
Registering and querying a schema for a Kafka topic
Release Notes
Remote Querying
Remote topic discovery
Replicating Data
Replicating data between cloud clusters with Streams Replication Manager in the cloud
Replicating data from on premises to cloud with Streams Replication Manager in the cloud
Replicating data from on premises to cloud with Streams Replication Manager on premises
Replication flows and replication policies
REST API
REST API
Restricting access to Kafka metadata in ZooKeeper
Retries
Retrieve and upload keytab file
Retrieving log directory replica assignment information
Rolling restart checks
Rotate the master key/secret
Running a Flink job
Running a simple Flink application
Running SQL Stream jobs
Running your Flink application
Running your Flink application
Running your Flink application
S3 Sink
Scaling down a NiFi cluster
Scaling down Kafka brokers
Scaling down Kafka Connect
Scaling Kafka brokers
Scaling Kafka Connect
Scaling KRaft
Scaling Streams Messaging clusters
Scaling up a NiFi cluster
Scaling up Kafka brokers
Scaling up Kafka Connect
Scaling up or down a NiFi cluster
Scaling your Flow Management cluster
Schema Definition tab
Schema entities
Schema Registry
Schema Registry authentication through OAuth2 JWT tokens
Schema Registry authorization through Ranger access policies
Schema Registry catalog
Schema Registry component architecture
Schema Registry concepts
Schema Registry formats
Schema Registry overview
Schema Registry overview
Schema Registry server configuration
Schema Registry use cases
Schema Registry with Flink
Searching by topic name
Searching Kafka cluster replications by source
Secure Prometheus for Streams Messaging Manager
Securing Apache Kafka
Securing Cruise Control
Securing Kafka Connect
Securing Schema Registry
Securing Streams Messaging Manager
Securing Streams Messaging Manager
Securing Streams Replication Manager
Security examples
Security examples
Security for Flow Management clusters and users in Cloudera on cloud
Security overview
Set permissions in Ranger
Set Ranger policies
Set up AWS for your ingest data flow
Set up MirrorMaker in Cloudera Manager
Set workload password
Setting a Schema Registry ID range
Setting capacity estimations and goals
Setting parallelism and max parallelism
Setting schema access strategy in NiFi
Setting the environment for a project
Setting the secure storage password as an environment variable
Setting up authorization policies
Setting up basic authentication with TLS for Prometheus
Setting up Kafka Connect
Setting up mTLS for Prometheus
Setting up Parameter Context inheritance
Setting up Prometheus for Streams Messaging Manager
Setting up Python for PyFlink
Setting up the service discovery
Setting up TLS for Prometheus
Setting up your Cloudera Streaming Analytics cluster
Setting up your Edge Management cluster
Setting up your Flow Management cluster
Setting up your network configuration
Setting up your Streams Messaging cluster
Setting user limits for Kafka
Setting workload password
Settings to avoid data loss
Setup for SASL with Kerberos
Setup for TLS/SSL encryption
SFTP Source
Simple .NET consumer
Simple .NET consumer using Schema Registry
Simple .NET producer
Simple .NET producer using Schema Registry
Simple Java consumer
Simple Java producer
Single Message Transforms
Sizing estimation based on network and disk message throughput
Source control of a project
Source, operator and sink in DataStream API
SQL and Table API
SQL and Table API supported features
SQL catalogs for Flink
SQL connectors for Flink
SQL Examples
SQL jobs
SQL Queries in Flink
SQL Statements in Flink
srm-control
srm-control Options Reference
Start Prometheus
Start the data flow
Start the data flow
Start the data flow
Start the data flow
Start the data flow
Start the data flow
Start the data flow
Start your data flow
Start your data flow
Start your data flow
Stateless NiFi Source and Sink
Streaming Analytics in Cloudera
Streams Messaging
Streams Messaging cluster layout
Streams Messaging Manager
Streams Messaging Manager integration
Streams Messaging Manager overview
Streams Messaging Manager property configuration in Cloudera Manager for Prometheus
Streams Replication Manager
Streams Replication Manager Architecture
Streams Replication Manager Command Line Tools
Streams Replication Manager Driver
Streams Replication Manager overview
Streams Replication Manager reference
Streams Replication Manager requirements
Streams Replication Manager security example
Streams Replication Manager Service
Streams Replication Manager Service data traffic reference
Subscribing to a topic
Supported basic data types
Supported data types
Supported NiFi controller services
Supported NiFi extensions
Supported NiFi flow analysis rules [Technical Preview]
Supported NiFi parameter providers
Supported NiFi processors
Supported NiFi Python components [Technical Preview]
Supported NiFi reporting tasks
Switching flow persistence providers
Switching flow persistence providers using NiFi Toolkit CLI
Syslog TCP Source
Syslog UDP Source
System Level Broker Tuning
Tables
Task architecture and load-balancing
Terms and concepts
Testing and validating Flink applications
The downscale operation fails with decommission failed
The Kafka Connect UI
TLS/SSL client authentication
Topics
Topics and Groups Subcommand
Troubleshooting
Troubleshooting Prometheus for Streams Messaging Manager
Tuning Apache Kafka performance
Tutorial: developing and deploying a JDBC Source dataflow
Tutorials
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understand the use case
Understanding co-located and external clusters
Understanding Streams Replication Manager properties, their configuration and hierarchy
Understanding the kafka-run-class Bash Script
Understanding the use case
Unlocking access to Kafka metadata in Zookeeper
Unsupported Cloudera Streaming Analytics features
Unsupported command line tools
Unsupported Edge Management features [Technical Preview]
Unsupported Features in Cloudera DataFlow for Data Hub 7.3.1
Unsupported Flow Management features
Unsupported Streams Messaging features
Updating a notifier
Updating an alert policy
Updating Flink job dependencies
Updating Parameter Context when the external source has changed
Updating Parameter sensitivity
Updating SQL queries with PROCTIME function
Use case architectures
Use case overview
Use cases
Use cases for Streams Replication Manager in Cloudera on cloud
Use Kerberos authentication
Use rsync to copy files from one broker to another
Use Schema Registry
User authorization
Using Apache Flink
Using Apache NiFi
Using Apache NiFi Registry
Using Apache NiFi Toolkit
Using auto discovery of services
Using Cloudera SQL Stream Builder
Using Cloudera SQL Stream Builder with Cloudera Data Visualization
Using connectors with templates
Using DataFlow Catalog Registry Client
Using Dynamic Materialized View Endpoints
Using Flink CLI
Using Kafka Connect
Using Parameter Context inheritance
Using Parameter Context inheritance to combine Parameters
Using Parameter Providers
Using Schema Registry
Using Streams Messaging Manager
Using Streams Replication Manager
Using Streams Replication Manager in Cloudera on cloud overview
Using System Functions
Using the AvroConverter
Using the Rebalance Wizard in Cruise Control
Using the service discovery on Streaming SQL Console
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify data flow operation
Verify that you can write data to Kudu
Verify your data flow
Verify your data flow
Verifying metadata collection
Verifying the setup
Versioning a flow in the Catalog
Viewing data lineage in Apache Atlas
Viewing Kafka cluster replication details
Virtual memory handling
What are Parameter Providers?
What is Apache Flink?
What is Cloudera SQL Stream Builder?
What is NiFi Registry?
What is NiFi?
What is Parameter Context inheritance?
What to do next
What's New in Cloudera DataFlow for Data Hub 7.3.1
What's New in Cloudera Streaming Analytics
What's new in Edge Management [Technical Preview]
What's new in Flow Management with NiFi 1
What's new in Flow Management with NiFi 2 [Technical Preview]
What's New in Streams Messaging
Widgets
Working with your Flow Management cluster
Writing data in a Kerberos and TLS/SSL enabled cluster
Writing data in an unsecured cluster
Writing Kafka data to Ozone with Kafka Connect
zookeeper-security-migration
Ingesting Data into Hive in Cloudera on Cloud
Ingesting data into Hive
Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in Cloudera on cloud.
Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.
Configure the service account
Configure the Service Account you will use to ingest data into Hive.
Create IDBroker mapping
To enable your Cloudera user to use the central authentication features Cloudera provides, and to exchange credentials for AWS or Azure access tokens, you must map this Cloudera user to the correct IAM role or Azure Managed Service Identity (MSI). You can add or modify these mappings from the Cloudera Management Console in your Cloudera environment.
Create the Hive target table
Before you can ingest data into Apache Hive in Cloudera on cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table. Modify these instructions based on your data ingest target table needs.
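As an illustration of what such a target table can look like, the following HiveQL sketch creates a simple transactional table. The database, table, and column names here are hypothetical example values, not values from this documentation; note that Hive streaming ingest (used later by PutHive3Streaming) requires a transactional (ACID) table stored as ORC.

```sql
-- Hypothetical example: a simple transactional ORC table suitable for
-- Hive streaming ingest. Database, table, and column names are placeholders.
CREATE TABLE retail.customer_events (
  event_id    STRING,
  customer_id STRING,
  event_time  TIMESTAMP,
  amount      DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```

Adjust the columns and storage properties to match the records your data flow will deliver.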
Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.
Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors require these files to parse the configuration values and use those values to communicate with Hive.
Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in Cloudera on cloud, adding processors to your NiFi canvas, and connecting the processors.
Configure the controller services
You can add Controller Services to provide shared services to be used by the processors in your data flow. You will use these Controller Services later when you configure your processors.
Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow and about other data consumption processor options.
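As a sketch, the core ConsumeKafkaRecord_2_0 properties for a Kafka source might be set as follows. The broker address, topic name, and consumer group ID are example values, and the reader and writer entries refer to the controller services you configured earlier.

```
ConsumeKafkaRecord_2_0 (example configuration)
  Kafka Brokers       : broker1.example.com:9092   # example broker address
  Topic Name(s)       : customer-events            # example topic
  Record Reader       : AvroReader                 # controller service from earlier step
  Record Writer       : AvroRecordSetWriter        # controller service from earlier step
  Group ID            : hive-ingest-group          # example consumer group
  Security Protocol   : SASL_SSL                   # typical for a secured cluster
```

Match the reader and writer to the actual format of your Kafka records.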
Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow and about other data ingest processor options.
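A minimal sketch of the PutHive3Streaming properties for this use case is shown below. The metastore URI, file paths, and database and table names are hypothetical placeholders; take the real values from the client configuration files you downloaded earlier.

```
PutHive3Streaming (example configuration)
  Hive Metastore URI           : thrift://metastore.example.com:9083   # example URI
  Hive Configuration Resources : /path/to/hive-site.xml,/path/to/core-site.xml
  Database Name                : retail             # example database
  Table Name                   : customer_events    # example target table
  Record Reader                : AvroReader         # controller service from earlier step
```

The target table must be a transactional (ACID) Hive table for streaming ingest to succeed.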
Start your data flow
Start your data flow to verify that you have created a working data flow and to begin your data ingest process.
Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow.
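One way to verify the flow, assuming you have Hive client access, is to query the target table from Beeline and confirm that the row count grows while the flow is running. The JDBC URL, database, and table name below are example placeholders.

```
# Hypothetical example: count rows in the ingest target table.
beeline -u "jdbc:hive2://hiveserver.example.com:10001/default;transportMode=http;httpPath=cliservice" \
  -e "SELECT COUNT(*) FROM retail.customer_events;"
```

Run the query twice a short interval apart; an increasing count indicates the flow is writing data.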