Known issues and limitations in Cloudera Data Engineering on CDP Private Cloud
This page lists the current known issues and limitations that you might run into while using the Cloudera Data Engineering (CDE) service.
- DEX-8542: Newly created Iceberg tables are owned by "sparkuser"
- The Iceberg tables created in CDE using Spark 3.2.3 are displayed as owned by the "sparkuser" user. The Iceberg tables must be owned by the user who created them. For example:
hive=> SELECT "TBL_NAME", "OWNER" FROM "TBLS" WHERE "TBL_NAME"='iceberg_test';
   TBL_NAME   |   OWNER
--------------+-----------
 iceberg_test | sparkuser
- DEX-14676: Deep Analysis is not working in CDE PvC under analysis tab
- If you are using Spark version 2.x for running your jobs, then the Run Deep Analysis feature present under the Analysis tab is not supported on Cloudera Data Engineering Private Cloud.
- DEX-12150: Recursive search for a file in resources is not working
- If you search for a file using the Search field on the Resources page, the result does not display files with that name that are present inside the resources.
- DEX-8540: Job Analysis tab is not working
- When you access the Analysis tab through the Cloudera Data Engineering UI, the tab fails to load data for Spark 2.
- DEX-11300: Editing the configuration of a job created using a GIT repository shows Resources instead of Repository
-
When you edit a job that uses an application file from a repository, the Select application file field shows Resources as the source. This issue does not affect the functionality of the job, but it can be confusing because the source is displayed as a Resource even when the selected file is from a repository. Although the UI shows Resource in this case, in the backend the file is selected from the chosen repository.
- DEX-11508: The file modification time in the Modified column is not updated on the Repositories page
-
The Modified column on the Repositories page does not show the correct modification time after the files are synced. When you click Sync Repository, the files are synced successfully even though the modification time shown is old.
- DEX-11583: Updating the log retention time to a decimal value using API shows unclear message
-
When you update the log retention time with a decimal value or an otherwise incorrect value using the API, an error message similar to the following is displayed:
* Connection #0 to host console-cde-bt5xp3.apps.shared-os-qe-01.kcloud.example.com left intact {"status":"error","message":"json: cannot unmarshal number 1.5 into Go struct field AppInstanceLogRetentionType.config.logRetention.retentionPeriod of type int"}%
- DEX-11340: Sessions go to unknown state if you start the CDE upgrade process before killing live Sessions
-
If Spark Sessions are running during the CDE upgrade, they are not automatically killed, which can leave them in an unknown state during and after the upgrade.
- DEX-10939: Running the prepare-for-upgrade command puts the workload side database into read-only mode
-
Running the prepare-for-upgrade command puts the workload side database into read-only mode. If you try to edit resources or jobs, or run jobs, in any virtual cluster under the CDE service for which the prepare-for-upgrade command was executed, the following error is displayed:
The MySQL server is running with the --read-only option so it cannot execute this statement
This means that all the APIs that perform write operations will fail for all virtual clusters. This is done to ensure that no changes are made to the data in the cluster after the prepare-for-upgrade command is executed, so that the new restored cluster is consistent with the old version.
- DOCS-17844: Logs are lost if the log lines are longer than 50000 characters in fluentd
-
This issue occurs when the Buffer_Chunk_Size parameter for fluent-bit is set to a value that is less than the size of the log line.
- DOCS-18585: Changes to the log retention configuration in the existing virtual cluster do not reflect the new configuration
-
When you edit the log retention policy configuration for an existing virtual cluster, the configuration changes are not applied.
- DEX-11231: In OpenShift, the Spark 3.3 virtual cluster creation fails due to Airflow pods crashing
-
This is an intermittent issue during virtual cluster installation in the OCP cluster where the airflow-scheduler and airflow-webserver pods are stuck in the CrashLoopBackOff state. This leads to virtual cluster installation failure.
- DEX-10576: Builder job does not start automatically when the resource is restored from an archive
-
For the Airflow Python environment resource, restoration does not work as intended. Though the resource is restored, the build process is not triggered. Even if the resource was activated during backup, it is not reactivated automatically. This leads to job failure during restoration or creation if there is a dependency on this resource.
- DEX-10147: Grafana issue if the same VC name is used under different CDE services which share same environment
- In CDE 1.5.1, when you have two different CDE services with the same name under the same environment, and you click the Grafana charts for the second CDE service, metrics for the Virtual Cluster in the first CDE service will display.
- DEX-10116: Virtual Cluster installation fails when Ozone S3 Gateway proxy is enabled
- Virtual Cluster installation fails when the Ozone S3 gateway proxy is enabled. The Ozone S3 gateway proxy gets enabled when more than one Ozone S3 Gateway is configured in the CDP Private Cloud Base cluster.
- DEX-10052: Logs are not available for python environment resource builder in CDP Private Cloud
- When you create a Python environment resource and upload the requirements.txt file, the Python environment is built using a Kubernetes job that runs in the cluster. The logs of this job currently cannot be viewed for debugging purposes using the CDE CLI or UI. However, you can view the events of the job.
- DEX-10051: Spark Sessions hang in the Preparing state if started without running the cde-utils.sh script
- You might run into an issue when creating a Spark Session without initializing the CDE virtual cluster, and the UI might hang in the Preparing state.
- DEX-9783: While creating the new VC, it shows wrong CPU and Memory values
- When you click the Virtual Cluster details for a Virtual Cluster that is in the Installing state, the configured CPU and Memory values that are displayed are inaccurate until the VC is created.
- DEX-9692: CDE UI does not work when the port 80 is blocked on the k8s cluster
- If your environment has blocked port 80 at the ingress level, the CDE UI does not work.
- DEX-11294: Spark UI does not render in CDE UI
- Spark UI will not work for job runs that are using a Git repository as a resource.
- DEX-9961: CDE Service installation is failing when retrieving aws_key_id
- CDE Service installation fails when retrieving aws_key_id, with the following error:
Could not add shared cluster overrides, error: unable to retrieve aws_key_id from the env service
- DEX-8996: CDE service stuck at the initializing state when a user who does not have the correct permission tries to create it
- When a CDE user tries to create a CDE service, it gets stuck at the initializing state and does not fail. Additionally, cleanup cannot be done from the UI and must be done on the backend.
- DEX-8682: CDE PvC 1.5.0 : CDP upgrade to 1.5.0 with OCP upgrade (4.8 to 4.10) Jobs UI is not opening
- Upgrading the OCP version from 4.8 to 4.10 while the CDE service is enabled causes the Jobs UI to not open. This is because OCP 4.10 upgrades to Kubernetes version 1.23, which removes the old ingress APIs used.
- DEX-8614: Sometimes Spark job is not getting killed even though its parent Airflow job gets killed
- Sometimes an issue is encountered while sending the request to kill a Spark batch to the Livy API, and the error is logged but not propagated properly to the Airflow job. In such cases, the underlying Spark job might still be running even though the Airflow job considers the job killed successfully.
- DEX-8601: ECS 1.4.x to 1.5.0 Upgrade: jobs fail after upgrade
- Upgrading the ECS version while the CDE service is enabled causes the jobs launched in the old CDE virtual cluster to fail. This is because ECS upgrades to Kubernetes version 1.23, which removes the old ingress APIs used.
- DEX-8226: Grafana Charts of new virtual clusters will not be accessible on upgraded clusters if virtual clusters are created on existing CDE service
- If you upgrade the cluster from 1.3.4 to 1.4.x and create new virtual clusters on the existing CDE service, Grafana charts are not displayed. This is due to broken APIs.
- DEX-7000: Parallel Airflow tasks triggered at exactly the same time by the user throw the 401:Unauthorized error
- The 401:Unauthorized error causes Airflow jobs to fail intermittently when parallel Airflow tasks using CDEJobRunOperator are triggered at the exact same time in an Airflow DAG, as in the sketch below.
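For illustration, a minimal DAG sketch of the triggering pattern; the DAG ID and job names are placeholders, and the import path is the one commonly shown in CDE Airflow examples, so adjust it to your environment:
from datetime import datetime

from airflow import DAG
from cloudera.cdp.airflow.operators.cde_operator import CDEJobRunOperator

with DAG(
    dag_id="parallel-cde-jobs-example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Two CDEJobRunOperator tasks with no dependency between them, so both
    # are triggered at the same instant. This is the pattern that can
    # intermittently hit the 401:Unauthorized error described above.
    task_a = CDEJobRunOperator(task_id="spark_job_a", job_name="spark-job-a")
    task_b = CDEJobRunOperator(task_id="spark_job_b", job_name="spark-job-b")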
- DEX-7001: When Airflow jobs are run, the privileges of the user who created the job are applied and not those of the user who submitted the job
- Irrespective of who submits the Airflow job, the Airflow job runs with the privileges of the user who created the job. This causes issues when the job submitter has fewer privileges than the job owner.
- Changing LDAP configuration after installing CDE breaks authentication
- If you change the LDAP configuration after installing CDE, as described in Configuring LDAP authentication for CDP Private Cloud, authentication no longer works.
- HDFS is the default filesystem for all resource mounts
- For any jobs that use local filesystem paths as arguments to a Spark job, explicitly specify file:// as the scheme. For example, if your job uses a mounted resource called test-resource.txt, in the job definition you would typically refer to it as /app/mount/test-resource.txt. In CDP Private Cloud, this should be specified as file:///app/mount/test-resource.txt.
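For illustration, a minimal PySpark sketch that reads the mounted resource from the example above; the application name and read logic are illustrative only:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-resource-example").getOrCreate()

# Explicit file:// scheme so Spark reads the mounted resource from the local
# filesystem instead of resolving the path against HDFS (the default).
df = spark.read.text("file:///app/mount/test-resource.txt")
df.show(truncate=False)

spark.stop()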
- Scheduling jobs with URL references does not work
- Scheduling a job that specifies a URL reference does not work.
Limitations
- Access key-based authentication will not be enabled in upgraded clusters prior to CDP PVC 1.3.4 release
- After you upgrade to the PVC 1.3.4 version from earlier versions, you must create the CDE Base service and Virtual Cluster again to use the new Access Key feature. Otherwise, the Access Key feature is not supported in a CDE Base service created prior to the 1.3.4 upgrade.