Creating an ad-hoc job in Cloudera Data Engineering

An ad-hoc run mimics the behavior of a traditional spark-submit or a single execution of an Airflow DAG: the job runs once and does not create a permanent job definition. You can use ad-hoc job runs for log analysis and future reference.

Before you begin

  • Ensure that you have a Virtual Cluster that is ready to use.

For Spark jobs

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The Home page displays.
  2. Click See More under Deploy and select Ad-Hoc Run. The Create an Ad-Hoc Spark Job dialog box is displayed.
  3. Select a Virtual Cluster.
  4. Enter a Job Name.
  5. In the Select Application Files drop-down list, select one of the following (a minimal example application file is shown after these steps):
     • Resource, and then upload a new file or select an existing resource.
     • URL, and then enter the URL that contains the file.
     • Repository, and then select a file from the repository. The file is automatically added to the job.
  6. Enter a Main Class.
  7. Enter Arguments and Configurations.
  8. Select a Python Environment.
  9. Select a Data Connector.
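
For reference, the application file you select in step 5 can be a self-contained PySpark script. The following is a minimal sketch of such a file (the classic Monte Carlo estimate of pi); the file name and sample count are placeholders, not values required by Cloudera Data Engineering.

    # pi_example.py -- hypothetical application file for an ad-hoc Spark run.
    from operator import add
    from random import random

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("adhoc-pi-example").getOrCreate()
        sc = spark.sparkContext

        # Estimate pi with a simple Monte Carlo simulation.
        samples = 100000

        def inside(_):
            x, y = random(), random()
            return 1 if x * x + y * y < 1 else 0

        count = sc.parallelize(range(samples)).map(inside).reduce(add)
        print("Pi is roughly %f" % (4.0 * count / samples))

        spark.stop()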
Steps for advanced options

You can upload additional files and resources, and customize compute options such as the number of executors, the executor cores, and the driver and executor memory.

  1. Upload files and resources.
  2. Configure Compute options (see the sketch after these steps for the corresponding Spark properties).
  3. Set an option for Log Level.
  4. Select the Enable GPU Accelerations checkbox to enable GPU acceleration. Configure selectors and tolerations if you want to run the job on specific GPU nodes.
  5. Click Create and Run.
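
The Compute options in step 2 correspond to standard Spark resource properties. As a point of reference, the sketch below shows the equivalent properties in a PySpark application; the values are placeholders, and the exact mapping between the UI fields and these properties is an assumption based on standard Spark configuration.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Standard Spark resource properties corresponding to the Compute options;
    # the values are placeholders. Resource properties are normally applied at
    # submission time (which is what the Compute options do), so setting them
    # inside an already-running application may have no effect.
    conf = SparkConf().setAll([
        ("spark.executor.instances", "2"),  # number of executors
        ("spark.executor.cores", "2"),      # cores per executor
        ("spark.executor.memory", "4g"),    # memory per executor
        ("spark.driver.cores", "1"),        # driver cores
        ("spark.driver.memory", "2g"),      # driver memory
    ])
    spark = SparkSession.builder.config(conf=conf).getOrCreate()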

For Airflow jobs

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The Home page displays.
  2. In the Jobs section under Airflow, click Ad-hoc Run.
  3. Select a Virtual Cluster.
  4. Enter a Job Name.
  5. Upload a DAG file (a minimal example DAG file is shown after these steps).
  6. Click Create and Run.
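
The DAG file you upload in step 5 is a standard Airflow Python file. The following is a minimal sketch assuming Airflow 2.x; the DAG ID, task ID, and command are hypothetical placeholders.

    # example_dag.py -- hypothetical DAG file for an ad-hoc Airflow run.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="adhoc_example_dag",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,  # run only when triggered
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="say_hello",
            bash_command="echo 'Hello from an ad-hoc Airflow run'",
        )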