Cloudera Data Engineering known issues archive
Learn about the archived known issues of the Cloudera Data Engineering service, the impact or changes to the functionality, and the workaround.
- DEX-14094: New pipeline editor can create a cde_job step of its own pipeline job which causes recursion and looping
- If you add a Cloudera Data Engineering Job step in the pipeline editor and, while configuring the pipeline job, select the same job from the Select Job drop-down list, running the pipeline job results in a recursive loop. For example, if you create a pipeline job named test-dag and select test-dag from the Select Job drop-down list when adding the Cloudera Data Engineering Job step, running the pipeline job loops recursively.
- DEX-13465: Cloudera Data Engineering Airflow DAG Code Fetch is not working
- The embedded Airflow UI within the Cloudera Data Engineering Job pages does not correctly show the "Code" view.
- DEX-14067: Email alerting does not send email to the recipient when a job fails
- When a Virtual Cluster is created through the UI, the SMTP configuration and authentication are empty. Enabling Email Alerting after Virtual Cluster creation allows users to add the SMTP parameters, but the updated password is not fetched when sending email alerts.
- DEX-14027: Spark 3.5.1 jobs are failing with error 'org.apache.hadoop.fs.s3a.impl.InstantiationIOException'
- On RAZ-enabled Cloudera Data Engineering clusters, Spark 3.5.1 fails to initialize the RAZ S3 plugin library because of recent backward-incompatible changes in the library, and jobs fail with the error org.apache.hadoop.fs.s3a.impl.InstantiationIOException.
- DEX-13975: 'spark.catalog.listTables()' command in job is failing in Python Spark for Spark 3.5.1
- Using spark.catalog.listTables() with Iceberg tables results in the exception org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near end of input.
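A minimal PySpark sketch of the call path described above; the application name and database name are placeholders, not values taken from the original report.

```python
# Minimal sketch of the listTables() call described above; the app name and
# database name ("default") are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("list-iceberg-tables").getOrCreate()

# On Spark 3.5.1 with Iceberg tables, this call is reported to raise
# org.apache.spark.sql.catalyst.parser.ParseException ([PARSE_SYNTAX_ERROR]).
tables = spark.catalog.listTables("default")
for table in tables:
    print(table.name, table.tableType)
```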
- DEX-12630: Cloudera Data Engineering Service failed to run jobs on SSD Based clusters
- When a customer creates a new Cloudera Data Engineering service with an SSD Instance enabled on Cloudera Data Engineering version greater than or equal to 1.19.4, Spark and Airflow jobs do not start at all. The same problem happens if an existing Cloudera Data Engineering service is upgraded to 1.19.3 or greater and has SSD Instance enabled.
- DEX-12451: Service creation fails when "Enable Public Load Balancer" is selected in an environment with only private subnets
- When creating a service with the Enable Public Load Balancer option selected, the service creation fails with the following error:
CDE Service: 1.20.3-b15 ENV: dsp-storage-mow-priv (in mow-priv) and dsp-storage-aws-dev-newvpc (mw-dev) – Environment is configured only with private subnets, there are no public subnets. dex-base installation failed, events: [{"ResourceKind":"Service","ResourceName":"dex-base-nginx-56g288bq-controller","ResourceNamespace":"dex-base-56g288bq","Message":"Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB","Timestamp":"2024-02-08T09:55:28Z"}]
- ENGESC-22921: Livy pod failure (CrashLoopBackoff)
- Insufficient Livy overhead memory causes the Livy service to crash and the pod to restart.
- DEX-11086: Cancelled statement(s) not canceled by Livy
- Currently, Livy statements cannot be cancelled immediately by using /sessions/{name}/statements/{id}/cancel. The status is returned as Cancelled, but the background job continues to run.
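A hypothetical sketch of the cancel call described above, using Python and the requests library; the endpoint URL, session name, statement ID, and token are placeholders, not values from this document.

```python
# Hypothetical sketch of the cancel call; URL, session, statement ID, and
# credentials below are placeholders.
import requests

LIVY_URL = "https://<session-endpoint>/sessions"   # placeholder endpoint
SESSION = "my-session"                              # placeholder session name
STATEMENT = 0                                       # placeholder statement ID
headers = {"Authorization": "Bearer <token>"}       # placeholder credentials

resp = requests.post(
    f"{LIVY_URL}/{SESSION}/statements/{STATEMENT}/cancel",
    headers=headers,
)
# The response reports the statement as Cancelled, but per the issue above the
# backing Spark job may keep running until it finishes on its own.
print(resp.json())
```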
- DEX-9939: Tier 2 Node groups are created with the same size as Tier 1 node groups during service creation. They cannot be edited during service edit
- If a service is created with 1 as the minimum on-demand scale limit, two nodes will run for Tier 1 and Tier 2. Even if a service is edited with the minimum reduced to 0, the Tier 2 node will still run. This will be fixed in the Cloudera Data Engineering 1.20 release.
- DEX-9852: FreeIPA certificate mismatch issues for new Spark 3.3 Virtual Clusters
- In Cloudera Data Engineering 1.19, when creating a new Virtual Cluster based on Spark 3.3, and submitting any job in the pods, the following error occurs: "start TGT gen failed for user : rpc error: code = Unavailable desc = Operation not allowed while app is in recovery state."
- DEX-9932: Name length causes Pod creation error for Cloudera Data Engineering Sessions
- In Cloudera Data Engineering 1.19, Kubernetes pod names are limited to 63 characters, while Cloudera Data Engineering Session names can be up to 56 characters long, which can cause pod creation to fail for long session names.
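A small pre-check sketch, assuming you want to keep session names short enough to stay clear of the limits quoted above; the validation logic itself is illustrative and not part of the product.

```python
# Illustrative pre-check: the 63-character pod-name limit and 56-character
# session-name limit come from the issue text; the helper below is a sketch.
K8S_POD_NAME_LIMIT = 63
CDE_SESSION_NAME_LIMIT = 56

def validate_session_name(name: str, max_len: int = CDE_SESSION_NAME_LIMIT) -> str:
    """Raise if the proposed session name is too long for CDE 1.19."""
    if len(name) > max_len:
        raise ValueError(
            f"Session name '{name}' is {len(name)} characters; "
            f"keep it at or below {max_len} to avoid pod creation errors."
        )
    return name

print(validate_session_name("interactive-pyspark-session"))
```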
- DEX-9895: Cloudera Data Engineering Virtual Cluster API response displays default Spark version as 2.4.7
- In Cloudera Data Engineering 1.19, the Spark version 3.2.3 is the expected default in a Cloudera Data Engineering Spark Virtual Cluster, but Spark 2.4.7 displays instead. This issue will be fixed in Cloudera Data Engineering 1.20.
- DEX-9790: Single tab Session support for Virtual Cluster selection
- In Cloudera Data Engineering 1.19, the Virtual Cluster selection in the Jobs, Resources, Runs, and Sessions page is not preserved if the user attempts to open Cloudera Data Engineering in another browser tab/window.
- DEX-10044: Handle adding tier 2 auto scaling groups during in-place upgrades
- Because auto scaling groups (ASGs) are not added or updated during the upgrade, the Tier 2 ASGs are not created, which results in pods that cannot be scheduled. This error applies to services created in Cloudera Data Engineering 1.18 and then upgraded to 1.19.
- DEX-10107: Spark 3.3 in Cloudera Data Engineering 1.19 limits the length of job names
- Jobs with names longer than 23 characters can fail in Spark 3.3 with the following exception:
23-05-14 10:14:16 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop due to IllegalArgumentException java.lang.IllegalArgumentException: '$JOB_NAME' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
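The exception refers to spark.kubernetes.executor.podNamePrefix, which Spark derives from the application name. Below is a hedged workaround sketch, not taken from this document: it sets a short pod-name prefix explicitly so the generated value stays within the Kubernetes length limit. The job and prefix names are placeholders; in Cloudera Data Engineering, Spark configurations are usually supplied in the job definition rather than in application code, so treat the in-code form as illustrative only.

```python
# Hedged workaround sketch (not from the original document): set a short,
# DNS-safe executor pod-name prefix so the derived value stays within the
# Kubernetes name-length limit. Names below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("a-job-name-longer-than-twenty-three-characters")
    # Explicit short prefix keeps generated executor pod names valid.
    .config("spark.kubernetes.executor.podNamePrefix", "short-job")
    .getOrCreate()
)
print(spark.sparkContext.applicationId)
```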
- DEX-10055: Interacting with a killed session
- When you interact with a long-running killed Spark session, the session might become unresponsive. Refrain from interacting with the long-running killed session. This will be fixed in a future release of Cloudera Data Engineering.
- DEX-8769: The table entity type on Atlas is spark_tables instead of hive_tables on Spark3 Virtual Clusters
- Tables that are created using a Spark3 Virtual Cluster on an AWS setup will have spark_tables type instead of hive_tables on Atlas Entities.
- DEX-8774: Job and run cloning is not fully supported in Cloudera Data Engineering 1.17 through 1.18.1
- Currently, cloning jobs and runs is not supported in Cloudera Data Engineering 1.17 through 1.18.1.
- DEX-8515: The Spark History Server user interface is not visible in Cloudera Data Engineering
- During job execution in Cloudera Data Engineering 1.18, the Spark History Server user interface is not visible. This error will be fixed in Cloudera Data Engineering 1.18.1.
- DEX-6163: Error message with Spark 3.2 and Cloudera Data Engineering
- For Cloudera Data Engineering 1.16 through 1.18, the error message "Service account may have been revoked" with Spark 3.2 is not the core issue, despite what it states. It is a harmless error that displays as part of the shutdown process, and only after a job has already failed for another reason, so look for other exceptions to find the actual cause. This issue will be fixed in Cloudera Data Engineering 1.18.1.
- DEX-7653: Updating Airflow Job/Dag file throws a 404 error
- A 404 error occurs when you update an Airflow Job or DAG file with a modified DAG ID or name and you perform the following steps:
1. Create an Airflow job using a simple DAG file. Use the Create Only option.
2. Edit the Airflow job and delete the existing DAG file.
3. Upload the same DAG file with the DAG ID and name modified in its content.
4. Choose a different resource folder.
5. Use the Update and Run option.
The 404 error occurs.
To avoid this issue, ensure that you do not modify the DAG ID in step 3. If you must change the DAG ID in the DAG file, create a new file (a minimal DAG file sketch follows).
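The sketch below is a minimal, hypothetical DAG file; the names in it are placeholders, not taken from this document. The dag_id value in the file content is the DAG ID that must remain unchanged when you re-upload the file.

```python
# Minimal Airflow DAG sketch with placeholder names. The dag_id below is the
# value that must stay the same when re-uploading the file; changing it
# triggers the 404 described above.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_dag",              # keep this unchanged when re-uploading
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    hello = BashOperator(task_id="hello", bash_command="echo hello")
```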
- CDPD-40396 Iceberg migration fails on partitioned Hive table created by Spark without location
- Iceberg provides a migrate procedure to migrate a Parquet/ORC/Avro Hive table to Iceberg. If the table was created using Spark without specifying a location and is partitioned, the migration fails.
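A sketch of the migrate procedure mentioned above, invoked through Spark SQL; the catalog name (spark_catalog) and table name (db.sample) are placeholders, and an Iceberg-enabled Spark session with the SQL extensions configured is assumed.

```python
# Sketch of the Iceberg migrate procedure run through Spark SQL. Catalog and
# table names are placeholders; Iceberg SQL extensions are assumed to be
# configured on the session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-migrate").getOrCreate()

# Migrates a Parquet/ORC/Avro Hive table in place to Iceberg. Per the issue
# above, this fails for partitioned tables that Spark created without an
# explicit location.
spark.sql("CALL spark_catalog.system.migrate('db.sample')").show()
```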
- DEX-5857 Persist job owner across Cloudera Data Engineering backup restores
- Currently, the user who runs the cde backup restore command has permissions, by default, to run the jobs. This may cause Cloudera Data Engineering jobs to fail if the workload user differs from the user who ran the jobs on the source Cloudera Data Engineering service where the backup was performed, because the workload user may have different privileges than the user who is expected to run the job.
- DEX-7483 User interface bug for in-place upgrade (Tech Preview)
- The user interface incorrectly states that the Data Lake version 7.2.15 and above is required. The correct minimum version is 7.2.14.
- DEX-6873 Kubernetes 1.21 will fail service account token renewal after 90 days
- Cloudera Data Engineering on AWS running version Cloudera Data Engineering 1.14 through 1.16 using Kubernetes 1.21 will observe failed jobs after 90 days of service uptime.
- DEX-7286 In place upgrade (Technical Preview) issue: Certificate expired showing error in browser
- Certificates fail after an in-place upgrade from version 1.14, causing an error in the browser.
- COMPX-5494: Yunikorn recovery intermittently deletes existing placeholders
- On recovery, Yunikorn may intermittently delete placeholder pods. After recovery, there may be remaining placeholder pods. This may cause unexpected behavior during rescheduling.
- DWX-8257: Cloudera Data Warehouse Airflow Operator does not support SSO
- Although a Virtual Warehouse (VW) in Cloudera Data Warehouse supports SSO, SSO is incompatible with the Cloudera Data Engineering Airflow service because, for the time being, the Airflow Cloudera Data Warehouse Operator only supports workload username and password authentication.
- COMPX-7085: Scheduler crashes due to Out Of Memory (OOM) error in case of clusters with more than 200 nodes
- The resource requirement of the YuniKorn scheduler pod depends on the cluster size, that is, the number of nodes and the number of pods. Currently, the scheduler is configured with a memory limit of 2Gi. When running on a cluster that has more than 200 nodes, the 2Gi memory limit may not be enough, which can cause the scheduler to crash because of OOM.
- DEX-3997: Python jobs using virtual environment fail with import error
- Running a Python job that uses a virtual environment resource fails with an import error, such as:
