Migrating Data to CDP One
Overview
Migrating data from CDH to CDP One
Migrating HDFS and Hive data from CDH to CDP One
Migration prerequisites
Ports for Replication Manager on CDP Public Cloud
Setting up an external account
Setting up SSL/TLS certificate exchange
Cloudera license requirements for Replication Manager
Introduction to Replication Manager
Accessing the Replication Manager service
How replication policies work
Replication policy considerations
Working with cloud credentials
Adding cloud credentials
Update cloud credentials
Delete cloud credentials
HDFS data migration from CDH to CDP One
Creating an HDFS replication policy
Verifying HDFS data migration
Hive migration from CDH to CDP One
Creating a Hive replication policy
Verifying Hive data migration
Migrating Oozie workflows from CDH to CDP One
About Migrating Oozie workloads
Migration prerequisites
Setting up an external account
Migrating Hue databases from CDH to CDP One
Performing post-migration tasks
Migrating HDFS native permissions to CDP One
Extracting HDFS native permissions
Converting HDFS native permissions into Ranger HDFS policies
Transforming Ranger HDFS policies into Ranger S3 policies
Importing Ranger AWS S3 policies
Migrating workflows directly created in Oozie to CDP One
Migrating Sentry policies from CDH to CDP One
About Migrating Sentry policies
Migration prerequisites
Setting up an external account
Exporting Sentry permissions
Importing Sentry permissions into Ranger
Migrating data from HDP to CDP One
Migrating HDFS data from HDP to CDP One
Migration prerequisites
About DistCp tool
Using the DistCp tool
Unbanning hdfs user in HDP cluster
Before migrating
HDFS data migration from HDP to CDP One
Migrating HDFS native permissions to CDP One
Extracting HDFS native permissions
Converting HDFS native permissions into Ranger HDFS policies
Transforming Ranger HDFS policies into Ranger S3 policies
Importing Ranger AWS S3 policies
Migrating Ranger policies from HDP to CDP One
About Migrating Ranger policies
Migration prerequisites
Copying Policy Migration utility to the source cluster
Performing Export and Transform operations
About the export operation
Running the export operation
About the transform operation
Running the transform operation
Performing Import operation
Supported Input parameters for Export operation
Supported Input parameters for Transform operation
Migrating Hive data from HDP 2.x or HDP 3.x to CDP One
Migration prerequisites
Setting up Hive JDBC standalone JARs
Saving Hive metastore on HDP by dumping
Taking a mandatory snapshot of HDP tables
Setting up security
Installing and configuring HMS Mirror
Sample YAML configuration file
Testing the YAML and the cluster connection
HMS Mirror command summary
Migrating Hive metadata
HMS Mirror generated files
Verifying metadata migration
Migrating actual Hive data
Adjust AVRO table schema URLs
Verifying actual Hive data migration
Table locations
Fixing statistics
Changes to HDP Hive tables
Migrating Workloads to CDP One
Overview
Migrating Spark workloads to CDP
Spark 1.6 to Spark 2.4 Refactoring
Handling prerequisites
Spark 1.6 to Spark 2.4 changes
New Spark entry point SparkSession
Dataframe API registerTempTable deprecated
union replaces unionAll
Empty schema not supported
Referencing a corrupt JSON/CSV record
Dataset and DataFrame API explode deprecated
CSV header and schema match
Table properties support
CREATE OR REPLACE VIEW and ALTER VIEW not supported
Managed table location
Write to Hive bucketed tables
Rounding in arithmetic operations
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Compiling and running a Java-based job
Compiling and running a Scala-based job
Running a Python-based job
Running a job interactively
Post-migration tasks
Spark 2.3 to Spark 2.4 Refactoring
Handling prerequisites
Spark 2.3 to Spark 2.4 changes
Empty schema not supported
CSV header and schema match
Table properties support
Managed table location
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Post-migration tasks
Migrating Hive and Impala workloads to CDP One
Handling prerequisites
Hive 1 and 2 to Hive 3 changes
Reserved keywords
Spark-client JAR requires prefix
Hive warehouse directory
Replace Hive CLI with Beeline
PARTIALSCAN
Concatenation of an external table
INSERT OVERWRITE
Managed to external table
Property changes affecting ordered or sorted subqueries and views
Runtime configuration changes
Prepare Hive tables for migration
Impala changes from CDH to CDP
Impala configuration differences in CDH and CDP
Additional documentation
Migrating Data to CDP One
Migrating data from CDH to CDP One
How to migrate data from CDH to CDP One.
Migrating HDFS and Hive data from CDH to CDP One
An overview of the migration process from CDH to CDP One prepares you to migrate HDFS and Hive data to the AWS S3 endpoint.
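Purely as an illustration of what migrating to "the AWS S3 endpoint" looks like on the target side, the sketch below lists a few objects under a bucket and prefix to spot-check that replicated files arrived. The bucket name and prefix are placeholders and AWS credentials are assumed to be configured already; the documented procedure is covered by the Verifying HDFS data migration topic.

```python
# Illustrative sketch only: list objects under the target S3 prefix to confirm
# that replicated HDFS files arrived. Bucket and prefix names are placeholders,
# not values taken from the documented migration steps.
import boto3

def list_migrated_objects(bucket="example-cdp-one-bucket",
                          prefix="migrated/hdfs/", limit=10):
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    for obj in response.get("Contents", []):
        # Print each key with its size in bytes for a quick sanity check.
        print(obj["Key"], obj["Size"])

if __name__ == "__main__":
    list_migrated_objects()
```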
Migrating Oozie workflows from CDH to CDP One
Hue stores the workflows within the Hue database, which is created using Hue. The workflow data residing in Hue is migrated to CDP One.
Migrating HDFS native permissions to CDP One
If you have HDFS native permissions in your CDH or HDP clusters, learn how to convert the native permissions into Ranger policy format and import the policies to CDP One. You can skip this migration process if you do not have any HDFS native permissions.
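As a rough illustration of that conversion, the sketch below maps a single HDFS path permission entry onto a dictionary shaped like a Ranger HDFS policy. The field names follow the public Ranger policy model, but the helper function, input format, and service name are assumptions for illustration, not part of the documented migration utilities.

```python
# Illustrative sketch only: convert one HDFS permission entry into a
# Ranger-style HDFS policy dictionary. The input format, helper name, and
# service name ("cm_hdfs") are assumptions; the real migration uses the
# Cloudera-provided scripts described in the topics above.
import json

def hdfs_entry_to_ranger_policy(path, owner, group, mode, service="cm_hdfs"):
    """Map owner/group rwx bits onto Ranger-style allow conditions."""
    def accesses(bits):
        names = ["read", "write", "execute"]
        return [{"type": n, "isAllowed": True}
                for n, flag in zip(names, bits) if flag == "1"]

    # mode is a 9-character rwx string such as "rwxr-x---"
    bits = ["1" if c != "-" else "0" for c in mode]
    return {
        "service": service,
        "name": f"migrated - {path}",
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [
            {"users": [owner], "accesses": accesses(bits[0:3])},
            {"groups": [group], "accesses": accesses(bits[3:6])},
        ],
    }

if __name__ == "__main__":
    policy = hdfs_entry_to_ranger_policy("/warehouse/tablespace",
                                         "hive", "hadoop", "rwxr-x---")
    print(json.dumps(policy, indent=2))
```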
Migrating workflows directly created in Oozie to CDP One
The Oozie workflows present on HDFS must be migrated to CDP One.
Migrating Sentry policies from CDH to CDP One
If your CDH environment uses Sentry policies, the permissions need to be migrated from Sentry to Ranger. This migration process is supported by the Authzmigrator tool.