Cloudera Data Engineering 1.5.3 (Private Cloud)
Orchestrating workflows and pipelines
Creating a pipeline using the CDE CLI
Using the Cloudera Data Engineering (CDE) CLI, you can create multi-step Airflow pipelines that combine the available operators. The child topics below describe the supported approaches.
Related information
Using the Cloudera Data Engineering CLI
Creating a basic Airflow pipeline using CDE CLI
Using the CDE CLI, you can create a basic multi-step pipeline that combines the available Airflow operators.
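As an illustration, a minimal two-step pipeline might look like the following sketch. The job names (load-data, build-report) and all other identifiers are hypothetical placeholders, and the example assumes the CDEJobRunOperator that ships with CDE virtual clusters; see the linked topic for the exact procedure.

    from datetime import datetime

    from airflow import DAG
    from cloudera.cdp.airflow.operators.cde_operator import CDEJobRunOperator

    # Each CDEJobRunOperator task triggers an existing CDE job by name.
    # "load-data" and "build-report" are hypothetical job names.
    with DAG(
        dag_id="basic-pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load = CDEJobRunOperator(task_id="load", job_name="load-data")
        report = CDEJobRunOperator(task_id="report", job_name="build-report")

        # build-report runs only after load-data succeeds
        load >> report

The DAG file is then uploaded to a CDE resource and registered as an Airflow job, roughly as follows (resource and job names are placeholders):

    cde resource create --name basic-pipeline-resource
    cde resource upload --name basic-pipeline-resource --local-path basic-pipeline.py
    cde job create --name basic-pipeline --type airflow \
        --dag-file basic-pipeline.py --mount-1-resource basic-pipeline-resource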
Creating a pipeline with additional Airflow configurations using CDE CLI
Using the Cloudera Data Engineering (CDE) CLI, you can create a multi-step pipeline with additional Airflow configurations. There are two ways to create this type of pipeline: the first method described below is the recommended approach, and the second is an older alternative that is no longer recommended.
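To give a sense of what such configurations look like, the sketch below extends the basic DAG with Airflow-level settings (default_args with retries) and per-task settings; it assumes the variables and overrides parameters on CDEJobRunOperator behave as documented, and every name and value is illustrative rather than prescribed.

    from datetime import datetime, timedelta

    from airflow import DAG
    from cloudera.cdp.airflow.operators.cde_operator import CDEJobRunOperator

    # Airflow-level configuration applied to every task in the DAG.
    default_args = {
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="configured-pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@hourly",
        catchup=False,
        default_args=default_args,
    ) as dag:
        # variables are substituted into the triggered job; overrides
        # adjust its Spark settings for this run (values are illustrative).
        analyze = CDEJobRunOperator(
            task_id="analyze",
            job_name="analyze-job",
            variables={"run_date": "{{ ds }}"},
            overrides={"spark": {"executorMemory": "8g"}},
        )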
Creating an Airflow pipeline with custom files using CDE CLI [technical preview]
Using the CDE CLI, you can create a pipeline that includes custom files, which are made available to its tasks at run time. This feature is in technical preview.
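A minimal sketch of the CLI flow follows, assuming the technical-preview --airflow-file-mount-1-resource flag and placeholder resource, file, and job names; flag names in a technical preview may change, so consult the linked topic for the current syntax.

    # Resource holding the DAG file
    cde resource create --name dag-resource
    cde resource upload --name dag-resource --local-path pipeline.py

    # Separate resource holding the custom files the tasks will read
    cde resource create --name data-files
    cde resource upload --name data-files --local-path input.csv

    # Create the Airflow job, mounting the custom-file resource for tasks
    cde job create --name custom-files-pipeline --type airflow \
        --dag-file pipeline.py --mount-1-resource dag-resource \
        --airflow-file-mount-1-resource data-files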
Parent topic: Managing an Airflow Pipeline using the CDE CLI