Cloudera DataFlow for Data Hub 7.2.17 (Public Cloud)
CDF for Data Hub
▶︎
Release Notes
▶︎
What's new
What's new in Flow Management
What's new in Edge Management [Technical Preview]
What's new in Streams Messaging
What's new in Streaming Analytics
Component support
▶︎
Supported NiFi extensions
Supported NiFi processors
Supported NiFi controller services
Supported NiFi reporting tasks
Components supported by partners
▶︎
Unsupported features
Unsupported Flow Management features
Unsupported Edge Management features [Technical Preview]
Unsupported Streams Messaging features
Unsupported Streaming Analytics features
▶︎
Known issues
Known issues in Flow Management
Known issues in Edge Management [Technical Preview]
Known issues in Streams Messaging
Known issues in Streaming Analytics
▶︎
Fixed issues
Fixed issues in Flow Management
Fixed issues in Streams Messaging
Fixed issues in Streaming Analytics
▶︎
Fixed CVEs
Log4j vulnerabilities
Fixed CVEs in Flow Management
▶︎
Behavioral changes
Behavioral changes in Streams Messaging
Behavioral changes in Streaming Analytics
Behavioral changes in Flow Management
▶︎
Flow Management
▶︎
Flow Management overview
What is NiFi?
What is NiFi Registry?
▶︎
Planning your Flow Management deployment
Flow Management cluster definitions
Flow Management cluster layout
▶︎
Setting up your Flow Management cluster
Checking prerequisites
Creating your cluster
Giving access to your cluster
▶︎
Working with your Flow Management cluster
▶︎
Authorizing Flow Management cluster access
Flow Management security overview
▶︎
User authorization
Assigning administrator level permissions
▶︎
Assigning selective permissions to a user
Assign the EnvironmentUser role
Add user to predefined Ranger access policies
Create custom access policy
Authorization example
Predefined Ranger access policies for Apache NiFi
Predefined Ranger access policies for Apache NiFi Registry
▶︎
Scaling your Flow Management cluster
▶︎
Scaling up or down a NiFi cluster
Scaling up a NiFi cluster
Scaling down a NiFi cluster
▶︎
Changing Java version in Flow Management cluster
Changing the Java version of Flow Management Data Hub clusters
▶︎
Fetching new components and fixes
Automatic access to new components and fixes without upgrading
▶︎
Hot loading custom NARs
Configuring Flow Management clusters to hot load custom NARs
▶︎
Using Parameter Context inheritance
What is parameter context inheritance?
▶︎
Example for configuring parameter context inheritance
Creating the basic parameter contexts
Setting up parameter context inheritance
Parameter overriding
▶︎
Using Parameter Providers
What is a parameter provider?
▶︎
Example for using parameter providers
Creating and configuring a parameter provider
Fetching parameters
Creating a parameter context from a parameter group
Updating parameter sensitivity
Updating parameter context when the external source has changed
Using parameter context inheritance to combine parameters
▶︎
Using DataFlow Catalog Registry Client
Creating a machine user
Adding a new Registry Client
Checking out a ReadyFlow
Checking out your flows
Versioning a flow in the Catalog
▶︎
Exporting/importing a data flow using NiFi Toolkit CLI
Overview
Connecting to NiFi Registry with NiFi Toolkit CLI
Exporting a flow from NiFi Registry
Importing a new flow into NiFi Registry
▶︎
Switching flow persistence providers using NiFi Toolkit CLI
Use case overview
Prerequisites
Switching flow persistence providers
▶︎
Moving data with NiFi
▶︎
Ingesting Data into HBase in CDP Public Cloud
▶︎
Ingesting Data into HBase
Understand the use case
Meet the prerequisites
Create the HBase target table
Add Ranger policies
Obtain HBase connection details
Build the data flow
Configure the HBase client service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▶︎
Ingesting Data into Hive in CDP Public Cloud
▶︎
Ingesting data into Hive
Understand the use case
Meet the prerequisites
Configure the service account
Create IDBroker mapping
Create the Hive target table
Add Ranger policies
Obtain Hive connection details
Build the data flow
Configure the controller services
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify your data flow
Next steps
▶︎
Ingesting Data into Kafka in CDP Public Cloud
▶︎
Ingesting data into Kafka
Understand the use case
Meet the prerequisites
Build the data flow
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring end-to-end latency for Kafka topic
Monitoring your data flow
Next steps
Appendix - Schema example
▶︎
Ingesting data into Kudu in CDP Public Cloud
▶︎
Ingesting data into Kudu
Understand the use case
Meet the prerequisites
Create the Kudu target table
Build the data flow
Configure the Controller Service
Configure the processor for your data source
Configure the processor for your data target
Start your data flow
Verify that you can write data to Kudu
Next steps
▶︎
Ingesting data into Solr in CDP Public Cloud
▶︎
Ingesting data into Solr
Understand the use case
Meet the prerequisites
Create Solr target collection
Build the data flow
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into CDW using Iceberg table format
▶︎
Ingesting data into CDW using Iceberg table format
Understand the use case
Meet the prerequisites
Create Iceberg target table
Build the data flow
Create and configure controller services
Configure processor for data source
Configure processor for data target
Start the data flow
▶︎
Ingesting data into Amazon S3 Buckets
▶︎
Ingesting data into Amazon S3
Understand the use case
Meet the prerequisites
Build the data flow
Set up AWS for your ingest data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into Azure Data Lake Storage
▶︎
Ingesting data into Azure Data Lake Storage
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Ingesting data into Google Cloud Storage
▶︎
Ingesting data into Google Cloud Storage
Understand the use case
Meet the prerequisites
Build the data flow
Create IDBroker mapping
Create controller services for your data flow
Configure the processor for your data source
Configure the processor for merging records
Configure the processor for your data target
Start the data flow
Verify data flow operation
Monitoring your data flow
Viewing data lineage in Apache Atlas
Next steps
▶︎
Ingesting data into cloud object stores with RAZ authorizations
▶︎
Ingesting data into CDP Object Stores with RAZ authorization
Understand the use case
Meet the prerequisites
Build the data flow
Configure each object store processor
Set permissions in Ranger
Start the data flow
Verify data flow operation
Monitoring your data flow
Next steps
▶︎
Moving data in and out of Snowflake
Pushing data to and moving data from Snowflake using NiFi
▶︎
Moving data out of Snowflake
Before you begin
Downloading the Snowflake JDBC driver JAR file
Adding Snowflake CA certificates to NiFi truststore
Building your data flow
Creating Controller Services for your data flow
Configuring your source processor
Configuring your target processor
Confirming your data flow success
▶︎
Pushing data into Snowflake
Before you begin
Adding Snowflake CA certificates to NiFi truststore
Building your dataflow
Configuring your Controller Services
Configure your source processor
Configuring your target processor
Confirming your data flow success
Next steps
▶︎
Moving data using NiFi site-to-site
▶︎
Moving data from Private Cloud to Public Cloud with NiFi site-to-site
Understanding the use case
Preparing your clusters
Setting up your network configuration
Configuring your truststores
Defining your CDP Public Cloud data flow
Configuring Ranger policies for site-to-site communication
Defining your CDP Private Cloud Base data flow
▶︎
Apache NiFi
Getting started with Apache NiFi
Using Apache NiFi
Apache NiFi Expression Language Guide
Apache NiFi RecordPath Guide
Apache NiFi System Administrator Guide
Using Apache NiFi Toolkit
Apache NiFi Developer Guide
Apache NiFi REST API Reference
▶︎
Apache NiFi Registry
Getting started with Apache NiFi Registry
Using Apache NiFi Registry
Apache NiFi Registry System Administrator Guide
Apache NiFi Registry REST API
▶︎
Edge Management [Technical Preview]
▶︎
Planning your Edge Management deployment
Edge Management cluster definitions
Edge Management cluster layout
▶︎
Setting up your Edge Management cluster
Checking prerequisites
Creating your cluster
After creating your cluster
▶︎
Streams Messaging
▶︎
Planning your Streams Messaging deployment
Data Hub cluster definitions
Streams Messaging cluster layout
▶︎
Setting up your Streams Messaging cluster
Checking prerequisites
Creating your cluster
Deleting ZooKeeper from Streams Messaging clusters
Configuring data directories for clusters with custom disk configurations
Giving access to your cluster
▶︎
Connecting Kafka clients to CDP Public Cloud clusters
Connecting Kafka clients to Data Hub provisioned clusters
▶︎
Scaling Streams Messaging clusters
▶︎
Scaling Kafka brokers
Scaling up Kafka brokers
Scaling down Kafka brokers
▶︎
Troubleshooting
The downscale operation fails with decommission failed
▶︎
Scaling Kafka Connect
Scaling up Kafka Connect
Scaling down Kafka Connect
Scaling KRaft
▶︎
Apache Kafka
▶︎
Apache Kafka overview
Kafka Introduction
▶︎
Kafka Architecture
Brokers
Topics
Records
Partitions
Record order and assignment
Logs and log segments
Kafka brokers and ZooKeeper
Leader positions and in-sync replicas
Kafka stretch clusters
Kafka disaster recovery
Kafka rack awareness
Kafka KRaft [Technical Preview]
▶︎
Kafka FAQ
Basics
Use cases
▶︎
Configuring Apache Kafka
Operating system requirements
Performance considerations
Quotas
▶︎
JBOD
JBOD setup
JBOD Disk migration
Setting user limits for Kafka
Connecting Kafka clients to Data Hub provisioned clusters
▶︎
Rolling restart checks
Configuring rolling restart checks
Configuring the client configuration used for rolling restart checks
▶︎
Cluster discovery with multiple Apache Kafka clusters
▶︎
Cluster discovery using DNS records
A records and round robin DNS
client.dns.lookup property options for client
CNAME records configuration
Connection to the cluster with configured DNS aliases
▶︎
Cluster discovery using load balancers
Setup for SASL with Kerberos
Setup for TLS/SSL encryption
Connecting to the Kafka cluster using load balancer
Configuring Kafka ZooKeeper chroot
Rack awareness
▶︎
Securing Apache Kafka
▶︎
Channel encryption
Configure Kafka brokers
Configure Kafka clients
Configure Kafka MirrorMaker
Configure ZooKeeper TLS/SSL support for Kafka
▶︎
Authentication
▶︎
TLS/SSL client authentication
Configure Kafka brokers
Configure Kafka clients
Principal name mapping
Enable Kerberos authentication
▶︎
Delegation token based authentication
Enable or disable authentication with delegation tokens
Manage individual delegation tokens
Rotate the master key/secret
▶︎
Client authentication using delegation tokens
Configure clients on a producer or consumer level
Configure clients on an application level
▶︎
LDAP authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
PAM authentication
Configure Kafka brokers
Configure Kafka clients
▶︎
OAuth2 authentication
Configuring Kafka brokers
Configuring Kafka clients
▶︎
Authorization
▶︎
Ranger
Enable authorization in Kafka with Ranger
Configure the resource-based Ranger service used for authorization
▶︎
Governance
Importing Kafka entities into Atlas
Configuring the Atlas hook in Kafka
Inter-broker security
Configuring multiple listeners
▶︎
Kafka security hardening with ZooKeeper ACLs
Restricting access to Kafka metadata in ZooKeeper
Unlocking access to Kafka metadata in ZooKeeper
▶︎
Tuning Apache Kafka performance
Handling large messages
▶︎
Cluster sizing
Sizing estimation based on network and disk message throughput
Choosing the number of partitions for a topic
▶︎
Broker Tuning
JVM and garbage collection
Network and I/O threads
ISR management
Log cleaner
▶︎
System Level Broker Tuning
File descriptor limits
Filesystems
Virtual memory handling
Networking parameters
Configure JMX ephemeral ports
Kafka-ZooKeeper performance tuning
▶︎
Managing Apache Kafka
▶︎
Management basics
Broker log management
Record management
Broker garbage log collection and log rotation
Client and broker compatibility across Kafka versions
▶︎
Managing topics across multiple Kafka clusters
Set up MirrorMaker in Cloudera Manager
Settings to avoid data loss
▶︎
Broker migration
Migrate brokers by modifying broker IDs in meta.properties
Use rsync to copy files from one broker to another
▶︎
Disk management
Monitoring
▶︎
Handling disk failures
Disk Replacement
Disk Removal
Reassigning replicas between log directories
Retrieving log directory replica assignment information
▶︎
Metrics
Building Cloudera Manager charts with Kafka metrics
Essential metrics to monitor
▶︎
Command Line Tools
Unsupported command line tools
kafka-topics
kafka-cluster
kafka-configs
kafka-console-producer
kafka-console-consumer
kafka-consumer-groups
kafka-features
▶︎
kafka-reassign-partitions
Tool usage
Reassignment examples
kafka-log-dirs
zookeeper-security-migration
kafka-delegation-tokens
kafka-*-perf-test
Configuring log levels for command line tools
Understanding the kafka-run-class Bash Script
▶︎
Developing Apache Kafka applications
Kafka producers
▶︎
Kafka consumers
Subscribing to a topic
Groups and fetching
Protocol between consumer and broker
Rebalancing partitions
Retries
Kafka clients and ZooKeeper
▶︎
Java client
▶︎
Client examples
Simple Java consumer
Simple Java producer
Security examples
▶︎
.NET client
▶︎
Client examples
Simple .NET consumer
Simple .NET producer
Performant .NET producer
Simple .NET consumer using Schema Registry
Simple .NET producer using Schema Registry
Security examples
Kafka Streams
Kafka public APIs
Recommendations for client development
▶︎
Kafka Connect
Kafka Connect Overview
Kafka Connect Setup
▶︎
Using Kafka Connect
Configuring the Kafka Connect Role
Managing, Deploying and Monitoring Connectors
▶︎
Writing Kafka data to Ozone with Kafka Connect
Writing data in an unsecured cluster
Writing data in a Kerberos and TLS/SSL enabled cluster
Using the AvroConverter
Configuring EOS for source connectors
▶︎
Securing Kafka Connect
▶︎
Kafka Connect to Kafka broker security
Configuring TLS/SSL encryption
Configuring Kerberos authentication
▶︎
Kafka Connect REST API security
▶︎
Authentication
Configuring TLS/SSL client authentication
Configuring SPNEGO authentication and trusted proxies
▶︎
Authorization
Authorization model
Ranger integration
▶︎
Kafka Connect connector configuration security
▶︎
Kafka Connect Secrets Storage
Terms and concepts
Managing secrets using the REST API
Re-encrypting secrets
Configuring connector JAAS configuration and Kerberos principal overrides
Configuring a Nexus repository allow list
▶︎
Single Message Transforms
Configuring an SMT chain
ConvertFromBytes
ConvertToBytes
▶︎
Connectors
Installing connectors
Debezium Db2 Source [Technical Preview]
Debezium MySQL Source
Debezium Oracle Source
Debezium PostgreSQL Source
Debezium SQL Server Source
HTTP Source
JDBC Source
JMS Source
MQTT Source
SFTP Source
▶︎
Stateless NiFi Source and Sink
Dataflow development best practices
Kafka Connect worker assignment
Kafka Connect log files
Kafka Connect tasks
Developing a dataflow
Deploying a dataflow
Downloading and viewing predefined dataflows
Configuring flow.snapshot
Tutorial: developing and deploying a JDBC Source dataflow
Syslog TCP Source
Syslog UDP Source
ADLS Sink
▶︎
Amazon S3 Sink
Configuration example
▶︎
HDFS Sink
Configuration example for writing data to HDFS
Configuration example for writing data to Ozone FS
HDFS Stateless Sink
HTTP Sink
InfluxDB Sink
JDBC Sink
Kudu Sink
S3 Sink
▶︎
Kafka Connect Connector Reference
HTTP Source properties reference
JDBC Source properties reference
JMS Source properties reference
MQTT Source properties reference
SFTP Source properties reference
Stateless NiFi Source properties reference
Syslog TCP Source properties reference
Syslog UDP Source properties reference
ADLS Sink properties reference
Amazon S3 Sink properties reference
HDFS Sink properties reference
HDFS Stateless Sink properties reference
HTTP Sink properties reference
InfluxDB Sink properties reference
JDBC Sink properties reference
Kudu Sink properties reference
S3 Sink properties reference
Stateless NiFi Sink properties reference
▶︎
Schema Registry
▶︎
Schema Registry overview
▶︎
Schema Registry overview
Examples of interacting with Schema Registry
▶︎
Schema Registry use cases
Registering and querying a schema for a Kafka topic
Deserializing and serializing data from and to a Kafka topic
Dataflow management with schema-based routing
Schema Registry component architecture
▶︎
Schema Registry concepts
Schema entities
Compatibility policies
Importance of logical types in Avro
▶︎
Integrating with Schema Registry
▶︎
Integrating Schema Registry with NiFi
NiFi record-based Processors and Controller Services
Setting the Schema Registry instance in NiFi
Setting schema access strategy in NiFi
Adding and configuring record-enabled Processors
Integrating Schema Registry with Kafka
Integrating Schema Registry with Flink and SSB
Integrating Schema Registry with Atlas
Improving performance in Schema Registry
▶︎
Using Schema Registry
Adding a new schema
Querying a schema
Evolving a schema
Deleting a schema
Importing Confluent Schema Registry schemas into Cloudera Schema Registry
▶︎
Exporting and importing schemas
Exporting schemas using Schema Registry API
Importing schemas using Schema Registry API
▶︎
ID ranges in Schema Registry
Setting a Schema Registry ID range
▶︎
Load balancer in front of Schema Registry instances
Configurations required to use load balancer with Kerberos enabled
Configurations required to use load balancer with SSL enabled
▶︎
Securing Schema Registry
▶︎
Schema Registry authorization through Ranger access policies
Predefined access policies for Schema Registry
Adding the user or group to a predefined access policy
Creating a custom access policy
▶︎
Schema Registry authentication through OAuth2 JWT tokens
JWT algorithms
Public key and secret storage
Authentication using OAuth2 with Kerberos
Schema Registry server configuration
Configuring the Schema Registry client
Schema Registry REST API reference
▶︎
Streams Messaging Manager
▶︎
Streams Messaging Manager overview
Introduction to Streams Messaging Manager
▶︎
Getting started with Streams Messaging clusters in CDP Public Cloud
Introducing streams messaging clusters on CDP Public Cloud
Meet the prerequisites to create a streams messaging cluster
Creating Machine User
Granting Machine User access to environment
Creating Kafka topic
▶︎
Create Ranger policies for Machine User account
Create topic policy
Create consumer group policy
▶︎
Produce data to Kafka topic
Setting workload password
Connecting to Kafka host
Configuring LDAP authentication
Producing data to Kafka topic
Consuming data from Kafka topic
▶︎
Use Kerberos authentication
Kerberos authentication using the ticket cache
Kerberos authentication using a keytab
Monitoring Kafka activity in Streams Messaging Manager
▶︎
Use Schema Registry
▶︎
Gather configuration information
Finding list of brokers
Finding Schema Registry endpoint
Creating TLS truststore
Defining Schema Registry access policies
Producing data in Avro format
Checking schema registration
Checking producer activity
Consuming data from Kafka topics using stored schemas
▶︎
Monitor end-to-end latency
Setting up authorization policies
Enabling end-to-end latency monitoring
▶︎
Evolve your schema
Reconfiguring the Kafka consumer
Reconfiguring the Kafka producer
What to do next
▶︎
Monitoring Kafka clusters
Monitoring Kafka clusters
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring Kafka brokers
Monitoring Kafka consumers
Monitoring log size information
Monitoring lineage information
▶︎
Managing alert policies
Introduction to alert policies in Streams Messaging Manager
Component types and metrics for alert policies
Notifiers
▶︎
Managing alert policies and notifiers in SMM
Creating a notifier
Updating a notifier
Deleting a notifier
Creating an alert policy
Updating an alert policy
Enabling an alert policy
Disabling an alert policy
Deleting an alert policy
▶︎
Managing Kafka topics
Creating a Kafka topic
Modifying a Kafka topic
Deleting a Kafka topic
▶︎
Monitoring end-to-end latency
End-to-end latency overview
Granularity of metrics for end-to-end latency
Enabling interceptors
Monitoring end-to-end latency for Kafka topic
End-to-end latency use case
▶︎
Monitoring Kafka cluster replications using Streams Messaging Manager
Introduction to monitoring Kafka cluster replications in SMM
Configuring SMM for monitoring Kafka cluster replications
▶︎
Viewing Kafka cluster replication details
Searching Kafka cluster replications by source
Monitoring Kafka cluster replications by quick ranges
Monitoring status of the clusters to be replicated
▶︎
Monitoring topics to be replicated
Searching by topic name
Monitoring throughput for cluster replication
Monitoring replication latency for cluster replication
Monitoring checkpoint latency for cluster replication
Monitoring replication throughput and latency by values
▶︎
Monitoring Kafka Connect using Streams Messaging Manager
Kafka Connect in SMM
Deploying and managing connectors
▶︎
Getting Metrics for Streams Messaging Manager
Cloudera Manager metrics overview
Prometheus metrics overview
▶︎
Prometheus configuration for SMM
Prerequisites for Prometheus configuration
Prometheus properties configuration
SMM property configuration in Cloudera Manager for Prometheus
Kafka property configuration in Cloudera Manager for Prometheus
Kafka Connect property configuration in Cloudera Manager for Prometheus
Start Prometheus
▶︎
Secure Prometheus for SMM
▶︎
Nginx proxy configuration over Prometheus
Nginx installation
Nginx configuration for Prometheus
▶︎
Setting up TLS for Prometheus
Configuring SMM to recognize Prometheus's TLS certificate
▶︎
Setting up basic authentication with TLS for Prometheus
Configuring Nginx for basic authentication
Configuring SMM for basic authentication
Setting up mTLS for Prometheus
Prometheus for SMM limitations
Troubleshooting Prometheus for SMM
Performance comparison between Cloudera Manager and Prometheus
▶︎
Securing Streams Messaging Manager
Securing Streams Messaging Manager
Verifying the setup
Streams Messaging Manager REST API reference
▶︎
Streams Replication Manager
▶︎
Streams Replication Manager overview
Overview
Key Features
Main Use Cases
▶︎
Use Case Architectures
▶︎
Highly Available Kafka Architectures
Active / Stand-by Architecture
Active / Active Architecture
Cross Data Center Replication
▶︎
Cluster Migration Architectures
On-premise to Cloud and Kafka Version Upgrade
Aggregation for Analytics
▶︎
Streams Replication Manager Architecture
▶︎
Streams Replication Manager Driver
Connect workers
Connectors
Task architecture and load-balancing
Driver inter-node coordination
▶︎
Streams Replication Manager Service
Remote Querying
Monitoring and metrics
REST API
▶︎
Understanding Replication Flows
Replication Flows Overview
Remote Topics
Bidirectional Replication Flows
Fan-in and Fan-out Replication Flows
Automatic group offset synchronization
Understanding co-located and external clusters
Understanding SRM properties, their configuration and hierarchy
▶︎
Planning for Streams Replication Manager
Streams Replication Manager requirements
Recommended deployment architecture
▶︎
Configuring Streams Replication Manager
Enable high availability
▶︎
Defining and adding clusters for replication
Defining external Kafka clusters
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Adding clusters to SRM's configuration
Configuring replications
Configuring the driver role target clusters
Configuring the service role target cluster
Configuring properties not exposed in Cloudera Manager
Configuring replication specific REST servers
▶︎
Configuring Remote Querying
Enabling Remote Querying
Configuring the advertised information of the SRM Service role
Configuring SRM Driver retry behaviour
Configuring SRM Driver heartbeat emission
Configuring automatic group offset synchronization
Configuring SRM Driver for performance tuning
New topic and consumer group discovery
▶︎
Configuration examples
Bidirectional replication example of two active clusters
Cross data center replication example of multiple clusters
▶︎
Using Streams Replication Manager
▶︎
SRM Command Line Tools
▶︎
srm-control
▶︎
Configuring srm-control
Configuring the SRM client's secure storage
Configuring TLS/SSL properties
Configuring Kerberos properties
Configuring properties for non-Kerberos authentication mechanisms
Setting the secure storage password as an environment variable
Topics and Groups Subcommand
Offsets Subcommand
Monitoring Replication with Streams Messaging Manager
Replicating Data
▶︎
How to Set up Failover and Failback
Configure SRM for Failover and Failback
Migrating Consumer Groups Between Clusters
▶︎
Securing Streams Replication Manager
Security overview
Enabling TLS/SSL for the SRM service
Enabling Kerberos for the SRM service
▶︎
Configuring Basic Authentication for the SRM Service
Enabling Basic Authentication for the SRM Service
Configuring Basic Authentication for Remote Querying
SRM security example
▶︎
Use cases for Streams Replication Manager in CDP Public Cloud
Using SRM in CDP Public Cloud overview
Replicating data from PvC Base to Data Hub with on-prem SRM
Replicating data from PvC Base to Data Hub with cloud SRM
Replicate data between Data Hub clusters with cloud SRM
▶︎
Streams Replication Manager reference
srm-control Options Reference
Configuration Properties Reference for Properties not Available in Cloudera Manager
Kafka credentials property reference
SRM Service data traffic reference
Streams Replication Manager REST API reference
▶︎
Cruise Control
▶︎
Cruise Control overview
Kafka cluster load balancing using Cruise Control
▶︎
Configuring Cruise Control
Setting capacity estimations and goals
Configuring Metrics Reporter in Cruise Control
Adding self-healing goals to Cruise Control in Cloudera Manager
▶︎
Securing Cruise Control
Enable security for Cruise Control
▶︎
Managing Cruise Control
Rebalancing with Cruise Control
Cruise Control REST API endpoints
Cruise Control REST API reference
▼
Streaming Analytics
▶︎
Streaming Analytics overview
Streaming Analytics in Cloudera
What is Apache Flink?
What is SQL Stream Builder?
▶︎
Planning your Streaming Analytics deployment
Streaming Analytics Data Hub cluster definitions
Streaming Analytics cluster layout
▶︎
Setting up your Streaming Analytics cluster
Before creating your cluster
Creating your cluster
After creating your cluster
▶︎
Using SQL Stream Builder
Getting Started
▶︎
Projects
Creating a project
Navigating in a project
Managing members of a project
▶︎
Source control of a project
Masking information before using source control
Setting the environment for a project
Importing a project
▶︎
Data sources
Adding Kafka Data Source
Adding Catalogs
▶︎
Using auto discovery of services
Setting up the service discovery
Using the service discovery on Streaming SQL Console
▶︎
Connectors
Using connectors with templates
Adding new connectors
Kafka connectors
CDC connectors
JDBC connector
Filesystem connector
Iceberg connector
Datagen connector
Faker connector
Blackhole connector
▶︎
Data formats
Adding data formats
▶︎
Tables
▶︎
Kafka tables
▶︎
Configuring Kafka tables
Schema Definition tab
Event Time tab
Data Transformations tab
Properties tab
Deserialization tab
Assigning Kafka keys in streaming queries
Performance & Scalability
Creating Webhook tables
Flink SQL tables
Iceberg tables
▶︎
SQL jobs
Creating and naming SQL jobs
Running SQL Stream jobs
▶︎
Configuring SQL job settings
Adjusting logging configuration in Advanced Settings
Configuring YARN queue for SQL jobs
▶︎
Managing session for SQL jobs
Executing SQL jobs in production mode
▶︎
Functions
▶︎
Creating JavaScript User-defined Functions
Developing JavaScript functions
Creating Java User-defined Functions
Using System Functions
▶︎
Materialized views
Creating Materialized Views
Configuring Retention Time for Materialized Views
Materialized View Pagination
Using Dynamic Materialized View Endpoints
Configuring Materialized View database information
Using SQL Stream Builder with Cloudera Data Visualization
▶︎
Widgets
Creating widgets
Choosing data sources
Managing data source jobs
Customizing visualization types
Managing widgets on the Dashboard
Notifications
REST API
▶︎
Monitoring
Collecting diagnostic data
Governance
▶︎
Flink SQL
▶︎
Flink DDL
Managing time in SSB
Flink DML
Flink Queries
Other supported statements
Data Types
Dynamic SQL Hints
SQL Examples
▶︎
Data Enrichment
Joining streaming and bounded tables
Example: joining Kafka and Kudu tables
▼
Using Apache Flink
Running a simple Flink application
▶︎
Application development
▶︎
Flink application structure
Source, operator and sink in DataStream API
Flink application example
Testing and validating Flink applications
Flink Project Template
▶︎
Configuring Flink applications
Setting parallelism and max parallelism
Configuring Flink application resources
Configuring RocksDB state backend
Enabling checkpoints for Flink applications
▶︎
DataStream connectors
▶︎
HBase sink with Flink
Creating and configuring the HBaseSinkFunction
▶︎
Kafka with Flink
▶︎
Schema Registry with Flink
ClouderaRegistryKafkaSerializationSchema
ClouderaRegistryKafkaDeserializationSchema
Kafka Metrics Reporter
Kudu with Flink
Iceberg with Flink
▶︎
Job lifecycle
Running a Flink job
Using Flink CLI
Enabling savepoints for Flink applications
▼
Monitoring
Enabling Flink DEBUG logging
Flink Dashboard
Streams Messaging Manager integration
▶︎
SQL and Table API
SQL and Table API supported features
▶︎
DataStream API interoperability
Converting DataStreams to Tables
Converting Tables to DataStreams
Supported data types
▶︎
SQL catalogs for Flink
Hive catalog
Kudu catalog
Schema Registry catalog
▶︎
SQL connectors for Flink
Kafka connector
▶︎
Data types for Kafka connector
JSON format
CSV format
▶︎
Avro format
Supported basic data types
Schema Registry formats
▶︎
SQL Statements in Flink
CREATE Statements
DROP Statements
ALTER Statements
INSERT Statements
SQL Queries in Flink
▶︎
Governance
Atlas entities in Flink metadata collection
Creating Atlas entity type definitions for Flink
Verifying metadata collection
▶︎
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Updating Flink job dependencies
▶︎
Reference
Flink Terminology
Cloudera Flink Tutorials
▶︎
Tutorials
▶︎
Analyzing your data with Kafka
Understand the use case
▶︎
Prepare your environment
Assign resource roles
Create IDBroker mapping
Set workload password
Create your streaming clusters
▶︎
Set Ranger policies
Grant permission for the ATLAS_HOOK topic
Retrieve and upload keytab file
Create Atlas entity type definitions
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Analyzing your data with Kudu
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
▶︎
Analyzing your data with HBase
▶︎
Running your Flink application
Job monitoring with Flink Dashboard
Metadata governance with Atlas
SQL Stream Builder REST API reference
.NET client
A records and round robin DNS
Active / Active Architecture
Active / Stand-by Architecture
Add Ranger policies
Add Ranger policies
Add user to predefined Ranger access policies
Adding a new Registry Client
Adding a new schema
Adding and configuring record-enabled Processors
Adding Catalogs
Adding clusters to SRM's configuration
Adding data formats
Adding Kafka Data Source
Adding new connectors
Adding self-healing goals to Cruise Control in Cloudera Manager
Adding Snowflake CA certificates to NiFi truststore
Adding Snowflake CA certificates to NiFi truststore
Adding the user or group to a predefined access policy
Adjusting logging configuration in Advanced Settings
ADLS Sink
ADLS Sink properties reference
After creating your cluster
After creating your cluster
Aggregation for Analytics
ALTER Statements
Amazon S3 Sink
Amazon S3 Sink properties reference
Analyzing your data with HBase
Analyzing your data with Kafka
Analyzing your data with Kudu
Apache Kafka
Apache Kafka overview
Apache NiFi
Apache NiFi Developer Guide
Apache NiFi Expression Language Guide
Apache NiFi RecordPath Guide
Apache NiFi Registry
Apache NiFi Registry REST API
Apache NiFi Registry System Administrator Guide
Apache NiFi REST API Reference
Apache NiFi System Administrator Guide
Appendix - Schema example
Application development
Assign resource roles
Assign the EnvironmentUser role
Assigning administrator level permissions
Assigning Kafka keys in streaming queries
Assigning selective permissions to user
Atlas entities in Flink metadata collection
Authentication
Authentication
Authentication using OAuth2 with Kerberos
Authorization
Authorization
Authorization example
Authorization model
Authorizing Flow Management cluster access
Automatic access to new components and fixes without upgrading
Automatic group offset synchronization
Avro format
Basics
Before creating your cluster
Before you begin
Before you begin
Behavioral changes
Behavioral changes in Flow Management
Behavioral changes in Streaming Analytics
Behavioral changes in Streams Messaging
Bidirectional replication example of two active clusters
Bidirectional Replication Flows
Blackhole connector
Broker garbage log collection and log rotation
Broker log management
Broker migration
Broker Tuning
Brokers
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Build the data flow
Building Cloudera Manager charts with Kafka metrics
Building your data flow
Building your dataflow
CDC connectors
CDF for Data Hub
Changing Java version in Flow Management cluster
Changing the Java version of Flow Management Data Hub clusters
Channel encryption
Checking out a readyflow
Checking out your flows
Checking prerequisites
Checking prerequisites
Checking prerequisites
Checking producer activity
Checking schema registration
Choosing data sources
Choosing the number of partitions for a topic
Client and broker compatibility across Kafka versions
Client authentication using delegation tokens
Client examples
Client examples
client.dns.lookup property options for client
Cloudera Flink Tutorials
Cloudera Manager metrics overview
ClouderaRegistryKafkaDeserializationSchema
ClouderaRegistryKafkaSerializationSchema
Cluster discovery using DNS records
Cluster discovery using load balancers
Cluster discovery with multiple Apache Kafka clusters
Cluster Migration Architectures
Cluster sizing
CNAME records configuration
Collecting diagnostic data
Command Line Tools
Compatibility policies
Component support
Component types and metrics for alert policies
Components supported by partners
Configuration example
Configuration example for writing data to HDFS
Configuration example for writing data to Ozone FS
Configuration examples
Configuration Properties Reference for Properties not Available in Cloudera Manager
Configurations required to use load balancer with Kerberos enabled
Configurations required to use load balancer with SSL enabled
Configure clients on a producer or consumer level
Configure clients on an application level
Configure each object store processor
Configure JMX ephemeral ports
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka brokers
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka clients
Configure Kafka MirrorMaker
Configure processor for data source
Configure processor for data target
Configure SRM for Failover and Failback
Configure the Controller Service
Configure the controller services
Configure the HBase client service
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for merging records
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data source
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the processor for your data target
Configure the resource-based Ranger service used for authorization
Configure the service account
Configure your source processor
Configure Zookeeper TLS/SSL support for Kafka
Configuring a Nexus repository allow list
Configuring an SMT chain
Configuring Apache Kafka
Configuring automatic group offset synchronization
Configuring Basic Authentication for Remote Querying
Configuring Basic Authentication for the SRM Service
Configuring connector JAAS configuration and Kerberos principal overrides
Configuring Cruise Control
Configuring data directories for clusters with custom disk configurations
Configuring EOS for source connectors
Configuring Flink application resources
Configuring Flink applications
Configuring Flow Management clusters to hot load custom NARs
Configuring flow.snapshot
Configuring Kafka brokers
Configuring Kafka clients
Configuring Kafka tables
Configuring Kafka ZooKeeper chroot
Configuring Kerberos authentication
Configuring Kerberos properties
Configuring LDAP authentication
Configuring log levels for command line tools
Configuring Materialized View database information
Configuring Metrics Reporter in Cruise Control
Configuring multiple listeners
Configuring Nginx for basic authentication
Configuring properties for non-Kerberos authentication mechanisms
Configuring properties not exposed in Cloudera Manager
Configuring Ranger policies for site-to-site communication
Configuring Remote Querying
Configuring replication specific REST servers
Configuring replications
Configuring Retention Time for Materialized Views
Configuring RocksDB state backend
Configuring rolling restart checks
Configuring SMM for basic authentication
Configuring SMM for monitoring Kafka cluster replications
Configuring SMM to recognize Prometheus's TLS certificate
Configuring SPNEGO authentication and trusted proxies
Configuring SQL job settings
Configuring SRM Driver for performance tuning
Configuring SRM Driver heartbeat emission
Configuring SRM Driver retry behaviour
Configuring srm-control
Configuring Streams Replication Manager
Configuring the advertised information of the SRM Service role
Configuring the Atlas hook in Kafka
Configuring the client configuration used for rolling restart checks
Configuring the driver role target clusters
Configuring the Kafka Connect Role
Configuring the Schema Registry client
Configuring the service role target cluster
Configuring the SRM client's secure storage
Configuring TLS/SSL client authentication
Configuring TLS/SSL encryption
Configuring TLS/SSL properties
Configuring YARN queue for SQL jobs
Configuring your Controller Services
Configuring your source processor
Configuring your target processor
Configuring your target processor
Configuring your truststores
Confirming your data flow success
Confirming your data flow success
Connect workers
Connecting Kafka clients to CDP Public Cloud clusters
Connecting Kafka clients to Data Hub provisioned clusters
Connecting Kafka clients to Data Hub provisioned clusters
Connecting to Kafka host
Connecting to NiFi Registry with NiFi Toolkit CLI
Connecting to the Kafka cluster using load balancer
Connection to the cluster with configured DNS aliases
Connectors
Connectors
Connectors
Consuming data from Kafka topic
Consuming data from Kafka topics using stored schemas
ConvertFromBytes
Converting DataStreams to Tables
Converting Tables to DataStreams
ConvertToBytes
Create and configure controller services
Create Atlas entity type definitions
Create consumer group policy
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create controller services for your data flow
Create custom access policy
Create Iceberg target table
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create IDBroker mapping
Create Ranger policies for Machine User account
Create Solr target collection
CREATE Statements
Create the HBase target table
Create the Hive target table
Create the Kudu target table
Create topic policy
Create your streaming clusters
Creating a custom access policy
Creating a Kafka topic
Creating a machine user
Creating a notifier
Creating a parameter context from a parameter group
Creating a project
Creating an alert policy
Creating and configuring a parameter provider
Creating and configuring the HBaseSinkFunction
Creating and naming SQL jobs
Creating Atlas entity type definitions for Flink
Creating Controller Services for your data flow
Creating Java User-defined functions
Creating Javascript User-defined Functions
Creating Kafka topic
Creating Machine User
Creating Materialized Views
Creating the basic parameter contexts
Creating TLS truststore
Creating Webhook tables
Creating widgets
Creating your cluster
Creating your cluster
Creating your cluster
Creating your cluster
Cross Data Center Replication
Cross data center replication example of multiple clusters
Cruise Control
Cruise Control overview
Cruise Control REST API endpoints
CSV format
Customizing visualization types
Data Enrichment
Data formats
Data Hub cluster definitions
Data sources
Data Transformations tab
Data Types
Data types for Kafka connector
Dataflow development best practices
Dataflow management with schema-based routing
Datagen connector
DataStream API interoperability
DataStream connectors
Debezium Db2 Source [Technical preview]
Debezium MySQL Source
Debezium Oracle Source
Debezium PostgreSQL Source
Debezium SQL Server Source
Defining and adding clusters for replication
Defining co-located Kafka clusters using a service dependency
Defining co-located Kafka clusters using Kafka credentials
Defining external Kafka clusters
Defining Schema Registry access policies
Defining your CDP Private Cloud Base data flow
Defining your CDP Public Cloud data flow
Delegation token based authentication
Deleting a Kafka topic
Deleting a notifier
Deleting a schema
Deleting an alert policy
Deleting ZooKeeper from Streams Messaging clusters
Deploying a dataflow
Deploying and managing connectors
Deserialization tab
Deserializing and serializing data from and to a Kafka topic
Developing a dataflow
Developing Apache Kafka applications
Developing JavaScript functions
Disabling an alert policy
Disk management
Disk Removal
Disk Replacement
Downloading and viewing predefined dataflows
Downloading the Snowflake JDBC driver JAR file
Driver inter-node coordination
DROP Statements
Dynamic SQL Hints
Edge Management cluster definitions
Edge Management cluster layout
Edge Management [Technical Preview]
Enable authorization in Kafka with Ranger
Enable high availability
Enable Kerberos authentication
Enable or disable authentication with delegation tokens
Enable security for Cruise Control
Enabling an alert policy
Enabling Basic Authentication for the SRM Service
Enabling checkpoints for Flink applications
Enabling end-to-end latency monitoring
Enabling Flink DEBUG logging
Enabling interceptors
Enabling Kerberos for the SRM service
Enabling Remote Querying
Enabling savepoints for Flink applications
Enabling TLS/SSL for the SRM service
End to end latency overview
End to end latency use case
Essential metrics to monitor
Event Time tab
Evolve your schema
Evolving a schema
Example for configuring parameter context inheritance
Example for using parameter providers
Example: joining Kafka and Kudu tables
Examples of interacting with Schema Registry
Executing SQL jobs in production mode
Exporting a flow from NiFi Registry
Exporting and importing schemas
Exporting schemas using Schema Registry API
Exporting/importing a data flow using NiFi Toolkit CLI
Faker connector
Fan-in and Fan-out Replication Flows
Fetching new components and fixes
Fetching parameters
File descriptor limits
Filesystem connector
Filesystems
Finding list of brokers
Finding Schema Registry endpoint
Fixed CVEs
Fixed CVEs in Flow Management
Fixed issues
Fixed issues in Flow Management
Fixed issues in Streaming Analytics
Fixed issues in Streams Messaging
Flink application example
Flink application structure
Flink Dashboard
Flink DDL
Flink DML
Flink Project Template
Flink Queries
Flink SQL
Flink SQL tables
Flink Terminology
Flow Management
Flow Management cluster definitions
Flow Management cluster layout
Flow Management overview
Flow Management security overview
Functions
Gather configuration information
Getting Metrics for Streams Messaging Manager
Getting Started
Getting started with Apache NiFi
Getting started with Apache NiFi Registry
Getting started with Streams Messaging clusters in CDP Public Cloud
Giving access to your cluster
Giving access to your cluster
Governance
Governance
Governance
Grant permission for the ATLAS_HOOK topic
Granting Machine User access to environment
Granularity of metrics for end-to-end latency
Groups and fetching
Handling disk failures
Handling large messages
HBase sink with Flink
HDFS Sink
HDFS Sink properties reference
HDFS Stateless Sink
HDFS Stateless Sink properties reference
Highly Available Kafka Architectures
Hive catalog
Hot loading custom NARs
How to Set up Failover and Failback
HTTP SInk
HTTP Sink properties reference
HTTP Source
HTTP Source properties reference
Iceberg connector
Iceberg tables
Iceberg with Flink
ID ranges in Schema Registry
Importance of logical types in Avro
Importing a new flow into NiFi Registry
Importing a project
Importing Confluent Schema Registry schemas into Cloudera Schema Registry
Importing Kafka entities into Atlas
Importing schemas using Schema Registry API
Improving performance in Schema Registry
InfluxDB SInk
InfluxDB Sink properties reference
Ingesting data into Amazon S3
Ingesting data into Amazon S3 Buckets
Ingesting data into Azure Data Lake Storage
Ingesting data into Azure Data Lake Storage
Ingesting data into CDP Object Stores with RAZ authorization
Ingesting data into CDW using Iceberg table format
Ingesting data into CDW using Iceberg table format
Ingesting data into cloud object stores with RAZ authorizations
Ingesting data into Google Cloud Storage
Ingesting data into Google Cloud Storage
Ingesting Data into HBase
Ingesting Data into HBase in CDP Public Cloud
Ingesting data into Hive
Ingesting Data into Hive in CDP Public Cloud
Ingesting data into Kafka
Ingesting Data into Kafka in CDP Public Cloud
Ingesting data into Kudu
Ingesting data into Kudu in CDP Public Cloud
Ingesting data into Solr
Ingesting data into Solr in CDP Public Cloud
INSERT Statements
Installing connectors
Integrating Schema Registry with Atlas
Integrating Schema Registry with Flink and SSB
Integrating Schema Registry with Kafka
Integrating Schema Registry with NiFi
Integrating with Schema Registry
Inter-broker security
Introducing streams messaging cluster on CDP Public Cloud
Introduction to alert policies in Streams Messaging Manager
Introduction to monitoring Kafka cluster replications in SMM
Introduction to Streams Messaging Manager
ISR management
Java client
JBOD
JBOD Disk migration
JBOD setup
JDBC connector
JDBC Sink
JDBC Sink properties reference
JDBC Source
JDBC Source properties reference
JMS Source
JMS Source properties reference
Job lifecycle
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Job monitoring with Flink Dashboard
Joining streaming and bounded tables
JSON format
JVM and garbage collection
JWT algorithms
Kafka Architecture
Kafka brokers and Zookeeper
Kafka clients and ZooKeeper
Kafka cluster load balancing using Cruise Control
Kafka Connect
Kafka Connect connector configuration security
Kafka Connect Connector Reference
Kafka Connect in SMM
Kafka Connect log files
Kafka Connect Overview
Kafka Connect property configuration in Cloudera Manager for Prometheus
Kafka Connect REST API security
Kafka Connect Secrets Storage
Kafka Connect Setup
Kafka Connect tasks
Kafka Connect to Kafka broker security
Kafka Connect worker assignment
Kafka connector
Kafka connectors
Kafka consumers
Kafka credentials property reference
Kafka disaster recovery
Kafka FAQ
Kafka Introduction
Kafka KRaft [Technical Preview]
Kafka Metrics Reporter
Kafka producers
Kafka property configuration in Cloudera Manager for Prometheus
Kafka public APIs
Kafka rack awareness
Kafka security hardening with Zookeeper ACLs
Kafka Streams
Kafka stretch clusters
Kafka tables
Kafka with Flink
kafka-*-perf-test
kafka-cluster
kafka-configs
kafka-console-consumer
kafka-console-producer
kafka-consumer-groups
kafka-delegation-tokens
kafka-features
kafka-log-dirs
kafka-reassign-partitions
kafka-topics
Kafka-ZooKeeper performance tuning
Kerberos authentication using a keytab
Kerberos authentication using the ticket cache
Key Features
Known issues
Known issues in Edge Management [Technical Preview]
Known issues in Flow Management
Known issues in Streaming Analytics
Known issues in Streams Messaging
Kudu catalog
Kudu Sink
Kudu Sink properties reference
Kudu with Flink
LDAP authentication
Leader positions and in-sync replicas
Load balancer in front of Schema Registry instances
Log cleaner
Log4j vulnerabilities
Logs and log segments
Main Use Cases
Manage individual delegation tokens
Management basics
Managing alert policies
Managing alert policies and notifiers in SMM
Managing Apache Kafka
Managing Cruise Control
Managing data source jobs
Managing Kafka topics
Managing member of a project
Managing secrets using the REST API
Managing session for SQL jobs
Managing time in SSB
Managing topics across multiple Kafka clusters
Managing widgets on the Dashboard
Managing, Deploying and Monitoring Connectors
Masking information before using source control
Materialized View Pagination
Materialized views
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites
Meet the prerequisites to create streams messaging cluster
Metadata governance with Atlas
Metadata governance with Atlas
Metadata governance with Atlas
Metrics
Migrate brokers by modifying broker IDs in meta.properties
Migrating Consumer Groups Between Clusters
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Modifying a Kafka topic
Monitor end-to-end latency
Monitoring
Monitoring
Monitoring
Monitoring and metrics
Monitoring checkpoint latency for cluster replication
Monitoring end to end latency for Kafka topic
Monitoring end to end latency for Kafka topic
Monitoring end-to-end latency
Monitoring Kafka activity in Streams Messaging Manager
Monitoring Kafka brokers
Monitoring Kafka cluster replications by quick ranges
Monitoring Kafka cluster replications using Streams Messaging Manager
Monitoring Kafka clusters
Monitoring Kafka clusters
Monitoring Kafka Connect using Streams Messaging Manager
Monitoring Kafka consumers
Monitoring Kafka producers
Monitoring Kafka topics
Monitoring lineage information
Monitoring log size information
Monitoring replication latency for cluster replication
Monitoring replication throughput and latency by values
Monitoring Replication with Streams Messaging Manager
Monitoring status of the clusters to be replicated
Monitoring throughput for cluster replication
Monitoring topics to be replicated
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Monitoring your data flow
Moving data from Private Cloud to Public Cloud with NiFi site-to-site
Moving data in and out of Snowflake
Moving data out of Snowflake
Moving data using NiFi site-to-site
Moving data with NiFi
MQTT Source
MQTT Source properties reference
Navigating in a project
Network and I/O threads
Networking parameters
New topic and consumer group discovery
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Next steps
Nginx configuration for Prometheus
Nginx installtion
Nginx proxy configuration over Prometheus
NiFi record-based Processors and Controller Services
Notifications
Notifiers
OAuth2 authentication
Obtain HBase connection details
Obtain Hive connection details
Offsets Subcommand
On-premise to Cloud and Kafka Version Upgrade
Operating system requirements
Other supported statements
Overview
Overview
PAM authentication
Parameter overriding
Partitions
Performance & Scalability
Performance comparison between Cloudera Manager and Prometheus
Performance considerations
Performant .NET producer
Planning for Streams Replication Manager
Planning your Edge Management deployment
Planning your Flow Management deployment
Planning your Streaming Analytics deployment
Planning your Streams Messaging deployment
Predefined access policies for Schema Registry
Predefined Ranger access policies for Apache NiFi
Predefined Ranger access policies for Apache NiFi Registry
Prepare your environment
Preparing your clusters
Prerequisites
Prerequisites for Prometheus configuration
Principal name mapping
Produce data to Kafka topic
Producing data in Avro format
Producing data to Kafka topic
Projects
Prometheus configuration for SMM
Prometheus for SMM limitations
Prometheus metrics overview
Prometheus properties configuration
Properties tab
Protocol between consumer and broker
Public key and secret storage
Pushing data into Snowflake
Pushing data to and moving data from Snowflake using NiFi
Querying a schema
Quotas
Rack awareness
Ranger
Ranger integration
Re-encrypting secrets
Reassigning replicas between log directories
Reassignment examples
Rebalancing partitions
Rebalancing with Cruise Control
Recommendations for client development
Recommended deployment architecture
Reconfiguring the Kafka consumer
Reconfiguring the Kafka producer
Record management
Record order and assignment
Records
Reference
Registering and querying a schema for a Kafka topic
Release Notes
Remote Querying
Remote Topics
Replicate data between Data Hub clusters with cloud SRM
Replicating Data
Replicating data from PvC Base to Data Hub with cloud SRM
Replicating data from PvC Base to Data Hub with on-prem SRM
Replication Flows Overview
REST API
Restricting access to Kafka metadata in Zookeeper
Retries
Retrieve and upload keytab file
Retrieving log directory replica assignment information
Rolling restart checks
Rotate the master key/secret
Running a Flink job
Running a simple Flink application
Running SQL Stream jobs
Running your Flink application
S3 Sink
S3 Sink properties reference
Scaling down a NiFi cluster
Scaling down Kafka brokers
Scaling down Kafka Connect
Scaling Kafka brokers
Scaling Kafka Connect
Scaling KRaft
Scaling Streams Messaging clusters
Scaling up a NiFi cluster
Scaling up Kafka brokers
Scaling up Kafka Connect
Scaling up or down a NiFi cluster
Scaling your Flow Management cluster
Schema Definition tab
Schema entities
Schema Registry
Schema Registry authentication through OAuth2 JWT tokens
Schema Registry authorization through Ranger access policies
Schema Registry catalog
Schema Registry component architecture
Schema Registry concepts
Schema Registry formats
Schema Registry overview
Schema Registry server configuration
Schema Registry use cases
Schema Registry with Flink
Searching by topic name
Searching Kafka cluster replications by source
Secure Prometheus for SMM
Securing Apache Kafka
Securing Cruise Control
Securing Kafka Connect
Securing Schema Registry
Securing Streams Messaging Manager
Securing Streams Replication Manager
Security examples
Security overview
Set permissions in Ranger
Set Ranger policies
Set up AWS for your ingest data flow
Set up MirrorMaker in Cloudera Manager
Set workload password
Setting a Schema Registry ID range
Setting capacity estimations and goals
Setting parallelism and max parallelism
Setting schema access strategy in NiFi
Setting the environment for a project
Setting the Schema Registry instance in NiFi
Setting the secure storage password as an environment variable
Setting up authorization policies
Setting up basic authentication with TLS for Prometheus
Setting up mTLS for Prometheus
Setting up parameter context inheritance
Setting up the service discovery
Setting up TLS for Prometheus
Setting up your Edge Management cluster
Setting up your Flow Management cluster
Setting up your network configuration
Setting up your Streaming Analytics cluster
Setting up your Streams Messaging cluster
Setting user limits for Kafka
Setting workload password
Settings to avoid data loss
Setup for SASL with Kerberos
Setup for TLS/SSL encryption
SFTP Source
SFTP Source properties reference
Simple .NET consumer
Simple .NET consumer using Schema Registry
Simple .NET producer
Simple .NET producer using Schema Registry
Simple Java consumer
Simple Java producer
Single Message Transforms
Sizing estimation based on network and disk message throughput
SMM property configuration in Cloudera Manager for Prometheus
Source control of a project
Source, operator and sink in DataStream API
SQL and Table API
SQL and Table API supported features
SQL catalogs for Flink
SQL connectors for Flink
SQL Examples
SQL jobs
SQL Queries in Flink
SQL Statements in Flink
SRM Command Line Tools
SRM security example
SRM Service data traffic reference
srm-control
srm-control Options Reference
Start Prometheus
Start the data flow
Start your data flow
Stateless NiFi Sink properties reference
Stateless NiFi Source and Sink
Stateless NiFi Source properties reference
Streaming Analytics
Streaming Analytics cluster layout
Streaming Analytics Data Hub cluster definitions
Streaming Analytics in Cloudera
Streaming Analytics overview
Streams Messaging
Streams Messaging cluster layout
Streams Messaging Manager
Streams Messaging Manager integration
Streams Messaging Manager overview
Streams Replication Manager
Streams Replication Manager Architecture
Streams Replication Manager Driver
Streams Replication Manager overview
Streams Replication Manager reference
Streams Replication Manager requirements
Streams Replication Manager Service
Subscribing to a topic
Supported basic data types
Supported data types
Supported NiFi controller services
Supported NiFi extensions
Supported NiFi processors
Supported NiFi reporting tasks
Switching flow persistence providers
Switching flow persistence providers using NiFi Toolkit CLI
Syslog TCP Source
Syslog TCP Source properties reference
Syslog UDP Source
Syslog UDP Source properties reference
System Level Broker Tuning
Tables
Task architecture and load-balancing
Terms and concepts
Testing and validating Flink applications
The downscale operation fails with decommission failed
TLS/SSL client authentication
Tool usage
Topics
Topics and Groups Subcommand
Troubleshooting
Troubleshooting Prometheus for SMM
Tuning Apache Kafka performance
Tutorial: developing and deploying a JDBC Source dataflow
Tutorials
Understand the use case
Understanding co-located and external clusters
Understanding Replication Flows
Understanding SRM properties, their configuration and hierarchy
Understanding the kafka-run-class Bash Script
Understanding the use case
Unlocking access to Kafka metadata in Zookeeper
Unsupported command line tools
Unsupported Edge Management features [Technical Preview]
Unsupported features
Unsupported Flow Management features
Unsupported Streaming Analytics features
Unsupported Streams Messaging features
Updating a notifier
Updating an alert policy
Updating Flink job dependencies
Updating parameter context when the external source has changed
Updating parameter sensitivity
Use Case Architectures
Use case overview
Use cases
Use cases for Streams Replication Manager in CDP Public Cloud
Use Kerberos authentication
Use rsync to copy files from one broker to another
Use Schema Registry
User authorization
Using Apache Flink
Using Apache NiFi
Using Apache NiFi Registry
Using Apache NiFi Toolkit
Using auto discovery of services
Using connectors with templates
Using DataFlow Catalog Registry Client
Using Dynamic Materialized View Endpoints
Using Flink CLI
Using Kafka Connect
Using Parameter Context inheritance
Using parameter context inheritance to combine parameters
Using Parameter Providers
Using Schema Registry
Using SQL Stream Builder
Using SQL Stream Builder with Cloudera Data Visualization
Using SRM in CDP Public Cloud overview
Using Streams Replication Manager
Using System Functions
Using the AvroConverter
Using the service discovery on Streaming SQL Console
Verify data flow operation
Verify that you can write data to Kudu
Verify your data flow
Verifying metadata collection
Verifying the setup
Versioning a flow in the Catalog
Viewing data lineage in Apache Atlas
Viewing Kafka cluster replication details
Virtual memory handling
What is a parameter provider?
What is Apache Flink?
What is NiFi Registry?
What is NiFi?
What is parameter context inheritance?
What is SQL Stream Builder?
What to do next
What's new
What's new in Edge Management [Technical Preview]
What's new in Flow Management
What's new in Streaming Analytics
What's new in Streams Messaging
Widgets
Working with your Flow Management cluster
Writing data in a Kerberos and TLS/SSL enabled cluster
Writing data in an unsecured cluster
Writing Kafka data to Ozone with Kafka Connect
zookeeper-security-migration
Running a simple Flink application
▶︎
Application development
▶︎
Flink application structure
Source, operator and sink in DataStream API
Flink application example
Testing and validating Flink applications
Flink Project Template
▶︎
Configuring Flink applications
Setting parallelism and max parallelism
Configuring Flink application resources
Configuring RocksDB state backend
Enabling checkpoints for Flink applications
▶︎
DataStream connectors
▶︎
HBase sink with Flink
Creating and configuring the HBaseSinkFunction
▶︎
Kafka with Flink
▶︎
Schema Registry with Flink
ClouderaRegistryKafkaSerializationSchema
ClouderaRegistryKafkaDeserializationSchema
Kafka Metrics Reporter
Kudu with Flink
Iceberg with Flink
▶︎
Job lifecycle
Running a Flink job
Using Flink CLI
Enabling savepoints for Flink applications
▼
Monitoring
Enabling Flink DEBUG logging
Flink Dashboard
Streams Messaging Manager integration
▶︎
SQL and Table API
SQL and Table API supported features
▶︎
DataStream API interoperability
Converting DataStreams to Tables
Converting Tables to DataStreams
Supported data types
▶︎
SQL catalogs for Flink
Hive catalog
Kudu catalog
Schema Registry catalog
▶︎
SQL connectors for Flink
Kafka connector
▶︎
Data types for Kafka connector
JSON format
CSV format
▶︎
Avro format
Supported basic data types
Schema Registry formats
▶︎
SQL Statements in Flink
CREATE Statements
DROP Statements
ALTER Statements
INSERT Statements
SQL Queries in Flink
▶︎
Governance
Atlas entities in Flink metadata collection
Creating Atlas entity type definitions for Flink
Verifying metadata collection
▶︎
Migrating Flink jobs
Migrating Flink jobs without state
Migrating stateful Flink jobs
Updating Flink job dependencies
▶︎
Reference
Flink Terminology
Cloudera Flink Tutorials
Using Apache Flink
Monitoring
Enabling Flink DEBUG logging
You can review the log files of your Flink jobs when an error is detected during processing. When you set the Flink log level to DEBUG, you can trace errors in the log files more easily.
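For example, if your Flink service logs through a log4j properties configuration (in Cloudera Manager this is typically adjusted through the Flink logging configuration), raising the Apache Flink loggers to DEBUG might look like the following sketch. The property names follow log4j 2 syntax and may differ in your Runtime version, so treat this as an illustration rather than the definitive configuration:
# Raise Apache Flink loggers to DEBUG
logger.flink.name = org.apache.flink
logger.flink.level = DEBUG
# Optional: keep the Kafka client at INFO so the DEBUG output stays readable
logger.kafka.name = org.apache.kafka
logger.kafka.level = INFO
After the configuration change, you typically need to restart the Flink service or resubmit the affected job for the new log level to take effect.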
Flink Dashboard
The Flink Dashboard is a built-in monitoring interface for Flink applications in Cloudera Streaming Analytics. You can monitor your running, completed, and stopped Flink jobs on the dashboard. You can reach the Flink Dashboard through Cloudera Manager.
Streams Messaging Manager integration
You can use the Streams Messaging Manager (SMM) UI to monitor the end-to-end latency of your Flink application when you use Kafka as a DataStream connector.
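As a minimal sketch, the Java example below builds a Flink job that consumes a Kafka topic with an explicit consumer group id, which is the identifier under which the application appears in SMM. The broker address, topic, and group id are placeholders, and the interceptor property is included only as an assumption about how SMM collects end-to-end latency metrics; verify the class name and prerequisites in the SMM documentation for your Runtime version. Security settings (TLS/SASL) required in a Data Hub cluster are omitted for brevity.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SmmMonitoredKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka-broker-1:9093")      // placeholder broker address
                .setTopics("transactions")                        // placeholder topic name
                .setGroupId("flink-transactions-app")             // consumer group shown in SMM
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Assumption: SMM end-to-end latency relies on its monitoring
                // interceptors being enabled on the Kafka clients. Verify the
                // interceptor class name and prerequisites in the SMM documentation.
                .setProperty("interceptor.classes",
                        "com.hortonworks.smm.kafka.monitoring.interceptors.MonitoringConsumerInterceptor")
                .build();

        DataStream<String> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");

        // Downstream processing would go here; print() keeps the example minimal.
        stream.print();

        env.execute("Flink job monitored in SMM");
    }
}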