Using the Cloudera Data Engineering CLI

If you run into issues with the CDE CLI at any time, you can view the available options by adding the --help flag to any CLI command:

cde spark --help
cde spark submit --help
cde airflow --help
cde resource --help

If you are new to the CDE CLI, a common approach is to start with the following steps:
  1. Experimenting with the CDE Spark Submit CLI
  2. Creating a CDE Resource
  3. Uploading all files to the CDE Resource
  4. Creating CDE jobs with files uploaded to the Resource
  5. Running CDE jobs

Step 1: Experimenting with CDE Spark Submits

This is the fastest way to run a Spark job in CDE. Note, however, that the submission is not saved as a CDE Spark job, so it cannot be rescheduled from the CDE UI.
cde spark submit pysparkjob.py
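
For reference, here is a minimal sketch of what pysparkjob.py could contain. The file name comes from this walkthrough; the contents below are purely illustrative:

from pyspark.sql import SparkSession

# Build (or reuse) a Spark session; CDE supplies the cluster configuration at submit time
spark = SparkSession.builder.appName("cde-cli-example").getOrCreate()

# Create a small DataFrame and print it so the run produces visible driver output
df = spark.createDataFrame([("cde", 1), ("cli", 2)], ["name", "value"])
df.show()

spark.stop()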

Step 2: Creating a CDE Resource

Cloudera recommends that you create one CDE Resource for every Spark pipeline or Airflow DAG. A Resource is a named collection of files, such as application code and dependencies, that CDE jobs can reference at runtime.
cde resource create --name cde_cli_resource
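
To confirm the Resource was created, you can list your Resources (see cde resource --help for the related subcommands):
cde resource list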

Step 3: Uploading all files to the CDE Resource

When uploading to a Resource, the two important inputs are the name of the target CDE Resource and the local path of the file being uploaded.

Cloudera recommends using the --help flag to explore more options, such as uploading files in bulk.
cde resource upload --name cde_cli_resource --local-path "pysparkjob.py" --resource-path "pysparkjob.py"
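
If your job depends on more files, upload each one with the same flags. Here utils.py is a hypothetical helper module, used only for illustration:
cde resource upload --name cde_cli_resource --local-path "utils.py" --resource-path "utils.py"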

Step 4: Creating CDE jobs with files uploaded to the Resource

Once the files and dependencies have been uploaded, you can instantiate a CDE job with the job create command.

For example, you can create a CDE job from the application file uploaded to the Resource and run it on a schedule. The cron expression "0 */1 * * *" below runs the job at the top of every hour between the schedule start and end dates.
cde job create --name "cde_cli_job" --type "spark" \
    --application-file "pysparkjob.py" \
    --cron-expression "0 */1 * * *" \
    --schedule-enabled "true" \
    --schedule-start "2022-04-29" \
    --schedule-end "2022-05-02" \
    --mount-1-resource "cde_cli_resource"
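
To confirm the job was created, you can search for it by name; the filter syntax is covered under "More CDE CLI examples" below:
cde job list --filter 'name[like]%cde_cli_job%'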

Step 5: Running CDE jobs

Once the CDE job has been created, you can trigger it manually at any time, independently of its schedule. Because the application file was set at creation time, only the job name is needed:
cde job run --name "cde_cli_job"
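
After triggering the job, you can confirm that a run was launched with the run commands shown in the examples below:
cde run list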

You have now completed a basic workflow to start experimenting with the CDE CLI. Below are some more useful examples:

More CDE CLI examples

  • Search for CDE jobs based on attributes
    You can filter the job list by attributes; in this case, by name. Replace name_pattern with the pattern you want to match.
    cde job list --filter 'name[like]%name_pattern%'
  • List all CDE job runs
    cde run list
  • Describe CDE job run
    Replace the integer with your job run ID; the command below uses run ID 47 as an example.
    cde run describe --id 47
  • Create a CDE job with Custom Spark Log Level

    A big advantage of using CDE is Spark observability: the logging level is easy to customize, and driver and executor logs remain available to the CDE user after every run.

    Using the --log-level parameter, you can choose any of the following levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF
    cde job create --name "cde_cli_job_custom_log_level" --type "spark" \
        --application-file "pysparkjob.py" \
        --log-level "DEBUG" \
        --schedule-enabled "false" \
        --mount-1-resource "cde_cli_resource"
  • Collect CDE Job Run Logs
    You can download the Spark logs for any run you have access to in CDE. Other log types, such as executor logs, are also available.
    cde run logs --type "driver/stdout" --id 47

    You can modify the log type to match any of the tabs available on the corresponding CDE Job Run page. For example:

    • driver/stderr or driver/stdout
    • executor id/stdout