Creating and running Spark 3.2.1 Iceberg jobs

Create and run a spark job which uses iceberg tables.

In Cloudera Data Engineering (CDE), jobs are associated with virtual clusters. Before you can create a job, you must create a virtual cluster that can run it. For more information, see Creating virtual clusters.

  1. In the Cloudera Data Platform (CDP) console, click the Data Engineering tile. The CDE Home page displays.
  2. In the CDE Home page, in Jobs, click Create New under Spark or click Jobs in the left navigation menu and then click Create Job.
  3. Provide the Job Details:
    1. Select Spark for the job type. If you are creating the job from the Home page, select the virtual cluster where you want to create the job.
    2. Specify the Name.
    3. Select File or URL for your application file, and provide or specify the file. You can upload a new file or select a file from an existing resource.
    4. If your application code is a JAR file, specify the Main Class.
    5. Click the Add Configuration button to add the following configuration parameters:
      config_key: cde.iceberg.enabled 
      config_value: true
    6. If your application code is a Python file, select the Python Version, and optionally select a Python Environment.
  4. Click Advanced Configurations to display more customizations, such as additional files, initial executors, executor range, driver and executor cores, and memory.
    By default, the executor range is set to match the range of CPU cores configured for the virtual cluster. This improves resource utilization and efficiency by allowing jobs to scale up to the maximum virtual cluster resources available, without manually tuning and optimizing the number of executors per job.
  5. Click Schedule to display scheduling options.
    You can schedule the application to run periodically using the Basic controls or by specifying a Cron Expression.
  6. Click Alerts and provide the email id to receive alerts. Click + to add more email IDs. Optionally, you can select when you want email alerts whether for job failures or missed job service-level agreements or both.
  7. If you provided a schedule, click Schedule to create the job. If you did not specify a schedule, and you do not want the job to run immediately, click the drop-down arrow on Create and Run and select Create. Otherwise, click Create and Run to run the job immediately.