Ingesting data into Hive

Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.

Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.

Configure the service account
Configure the service account you will use to ingest data into Hive.

Create IDBroker mapping
To enable your CDP user to use the central authentication features CDP provides and to exchange credentials for AWS or Azure access tokens, you must map this CDP user to the correct IAM role or Azure Managed Service Identity (MSI). The option to add or modify these mappings is available from the Management Console in your CDP environment.

Create the Hive target table
Before you can ingest data into Apache Hive in CDP Public Cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table; modify them to match the needs of your ingest target table. A sample CREATE TABLE statement appears at the end of this overview.

Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.

Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors parse these files for the configuration values they need to communicate with Hive.

Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to the canvas, and connecting them.

Configure the Controller Services
You can add Controller Services to provide shared services to be used by the processors in your data flow. You will use these Controller Services later when you configure your processors.

Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow and about other data consumption processor options.

Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow and about other data ingest processor options.

Start your data flow
Start your data flow to verify that you have created a working data flow and to begin your data ingest process.

Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow. A sample verification query appears at the end of this overview.
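
The following is a minimal sketch of the "Create the Hive target table" step. The database name (retail), table name (customer), and columns are assumptions chosen for illustration, not taken from the source; substitute your own schema. Hive streaming ingest through PutHive3Streaming writes to transactional (ACID) tables, which in Hive 3 means managed tables stored as ORC.

    -- Hypothetical database and table; adjust names and columns to your data.
    CREATE DATABASE IF NOT EXISTS retail;

    -- Streaming ingest targets transactional (ACID) tables stored as ORC.
    CREATE TABLE retail.customer (
      customer_id    INT,
      customer_name  STRING,
      customer_email STRING
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

In Hive 3 on CDP, managed tables are created as transactional by default; the TBLPROPERTIES clause simply makes that requirement explicit.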
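
For the "Verify your data flow" step, one simple check is to query the target table from Beeline or Hue once the flow is running. The queries below assume the hypothetical retail.customer table sketched above.

    -- Confirm that rows are arriving in the target table.
    SELECT COUNT(*) FROM retail.customer;

    -- Spot-check a few ingested records.
    SELECT * FROM retail.customer LIMIT 10;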