Cloudera DataFlow for Data Hub 7.2.15 (Public Cloud)
Apache Flink
Job lifecycle
Running a Flink job
After developing your application, you can submit your Flink job to YARN in either per-job or session mode. To submit the job, run the Flink client from the command line with the run command, passing the security parameters and any other configuration the job needs.
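As a rough illustration, the sketch below only assembles the argument list for a submission; it does not run anything. It assumes the generic options of the upstream Apache Flink CLI (-t yarn-per-job or -t yarn-session, plus -D key=value configuration options) and the standard security.kerberos.login.keytab and security.kerberos.login.principal settings. The Cloudera runtime may document a different but equivalent form (for example -yD options), and the JAR name, keytab, principal and application ID used here are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_flink_run_command(
    jar_path: str,
    mode: str = "per-job",
    yarn_application_id: Optional[str] = None,
    keytab: Optional[str] = None,
    principal: Optional[str] = None,
    parallelism: Optional[int] = None,
    program_args: Sequence[str] = (),
) -> List[str]:
    """Assemble (but do not execute) a 'flink run' command for YARN.

    Assumption: upstream Flink generic CLI options (-t, -D, -p, --detached).
    """
    cmd = ["flink", "run"]
    if mode == "per-job":
        # A dedicated YARN application is started for this job only.
        cmd += ["-t", "yarn-per-job", "--detached"]
    elif mode == "session":
        # The job is sent to an already running YARN session cluster.
        if not yarn_application_id:
            raise ValueError("session mode needs the YARN application ID of the session")
        cmd += ["-t", "yarn-session", f"-Dyarn.application.id={yarn_application_id}"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Kerberos security parameters, passed as generic -D configuration options.
    if keytab and principal:
        cmd += [
            f"-Dsecurity.kerberos.login.keytab={keytab}",
            f"-Dsecurity.kerberos.login.principal={principal}",
        ]
    if parallelism is not None:
        cmd += ["-p", str(parallelism)]
    # The application JAR comes after all CLI options, followed by its own arguments.
    cmd.append(jar_path)
    cmd += list(program_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    per_job = build_flink_run_command(
        "my-flink-app.jar", keytab="alice.keytab", principal="alice@EXAMPLE.COM", parallelism=2
    )
    print(shlex.join(per_job))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher such as subprocess.run, while shlex.join produces a copy-pasteable command line for a terminal.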
Using Flink CLI
You can use the Flink command line interface to operate, configure and maintain your Flink applications.
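For example, listing and cancelling jobs both need the CLI to be pointed at the Flink cluster that runs inside a YARN application. The sketch below builds those command lines, assuming the upstream Flink pattern of -t with -Dyarn.application.id; the helper names, application ID and job ID are hypothetical, and the commands are only assembled, not executed.

```python
import shlex
from typing import List


def yarn_target_args(yarn_application_id: str, deployment: str = "yarn-per-job") -> List[str]:
    """Options that address the Flink cluster running in a given YARN application."""
    if deployment not in ("yarn-per-job", "yarn-session"):
        raise ValueError(f"unsupported deployment target: {deployment}")
    return ["-t", deployment, f"-Dyarn.application.id={yarn_application_id}"]


def list_jobs_command(yarn_application_id: str, deployment: str = "yarn-per-job") -> List[str]:
    # 'flink list' reports the running and scheduled jobs of the targeted cluster.
    return ["flink", "list", *yarn_target_args(yarn_application_id, deployment)]


def cancel_job_command(
    yarn_application_id: str, job_id: str, deployment: str = "yarn-per-job"
) -> List[str]:
    # 'flink cancel' stops the given job without taking a savepoint first.
    return ["flink", "cancel", *yarn_target_args(yarn_application_id, deployment), job_id]


if __name__ == "__main__":
    app_id = "application_1650000000000_0001"  # hypothetical YARN application ID
    print(shlex.join(list_jobs_command(app_id)))
    print(shlex.join(cancel_job_command(app_id, "hypothetical-job-id")))
```

Keeping the targeting options in one helper means list, cancel and other cluster actions stay consistent when you switch between per-job and session deployments.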
Enabling savepoints for Flink applications
Besides checkpointing, you can also create savepoints of your running Flink jobs. Savepoints are not created automatically, so you need to trigger them yourself, for example before an upgrade or maintenance. You can later resume your applications from a savepoint.
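A typical upgrade cycle is: take a savepoint (or stop the job with a savepoint), then start the new version of the application from that savepoint. The sketch below assembles these commands, assuming the upstream Flink CLI actions savepoint, stop with --savepointPath, and run with --fromSavepoint, and assuming the job runs in YARN per-job mode. If no target directory is given, Flink falls back to its configured default savepoint directory. All paths and IDs are hypothetical, and nothing is executed.

```python
from typing import List, Optional


def _target(yarn_application_id: str) -> List[str]:
    # Address the Flink cluster inside this YARN application (per-job mode assumed).
    return ["-t", "yarn-per-job", f"-Dyarn.application.id={yarn_application_id}"]


def trigger_savepoint_command(
    yarn_application_id: str, job_id: str, target_dir: Optional[str] = None
) -> List[str]:
    # Take a savepoint while the job keeps running; the CLI reports the savepoint path.
    cmd = ["flink", "savepoint", *_target(yarn_application_id), job_id]
    if target_dir:
        cmd.append(target_dir)
    return cmd


def stop_with_savepoint_command(yarn_application_id: str, job_id: str, target_dir: str) -> List[str]:
    # Take a savepoint and then stop the job gracefully, e.g. before an upgrade.
    return ["flink", "stop", *_target(yarn_application_id), "--savepointPath", target_dir, job_id]


def resume_from_savepoint_command(savepoint_path: str, jar_path: str) -> List[str]:
    # Start a new job (e.g. an upgraded JAR) that restores its state from the savepoint.
    return ["flink", "run", "-t", "yarn-per-job", "--detached", "--fromSavepoint", savepoint_path, jar_path]


if __name__ == "__main__":
    app_id, job_id = "application_1650000000000_0001", "hypothetical-job-id"
    print(stop_with_savepoint_command(app_id, job_id, "hdfs:///tmp/savepoints"))
    print(resume_from_savepoint_command("hdfs:///tmp/savepoints/savepoint-xxxx", "my-flink-app-v2.jar"))
```

The resume step needs the concrete savepoint path that the trigger or stop command reports, not just the parent directory.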