Known issues and limitations in Cloudera Data Engineering on CDP Private Cloud

This page lists the current known issues and limitations that you might run into while using the Cloudera Data Engineering (CDE) service.

DEX-8682: CDE PvC 1.5.0: Jobs UI does not open after a CDP upgrade to 1.5.0 with an OCP upgrade (4.8 to 4.10)
Upgrading OCP from 4.8 to 4.10 while the CDE service is enabled causes the Jobs UI to fail to open. OCP 4.10 upgrades Kubernetes to version 1.23, which removes the old ingress APIs that the service uses.
Back up the CDE jobs in the CDE virtual cluster, and then delete the CDE service and CDE virtual cluster. Restore the jobs after the upgrade. For more information about backing up and restoring CDE jobs, see Backing up and restoring CDE jobs.
DOCS-17844: Logs are lost if the log lines are longer than 50000 characters in fluentd

This issue occurs when the Buffer_Chunk_Size parameter for fluent-bit is set to a value that is smaller than the size of the log line.

The values that are currently set are:
Buffer_Chunk_Size=50000
Buffer_Max_Size=50000
If required, you can set higher values for these parameters in the fluent-bit configuration map, which is present in the dex-app-xxxx namespace.
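To judge whether higher values are needed, it can help to measure the longest lines in a sample of job logs. The following is a minimal sketch and not part of CDE: the helper name find_oversized_lines and the in-memory sample are hypothetical, and the sketch assumes the limit is compared against the UTF-8 encoded size of each line.

```python
from typing import Iterable, List, Tuple

# Value currently set for Buffer_Chunk_Size and Buffer_Max_Size (from this page).
BUFFER_CHUNK_SIZE = 50000


def find_oversized_lines(
    lines: Iterable[str], limit: int = BUFFER_CHUNK_SIZE
) -> List[Tuple[int, int]]:
    """Return (line_number, size_in_bytes) for every line larger than limit.

    Hypothetical helper; assumes the limit applies to the UTF-8 encoded
    size of the line without its trailing newline.
    """
    oversized = []
    for number, line in enumerate(lines, start=1):
        size = len(line.rstrip("\n").encode("utf-8"))
        if size > limit:
            oversized.append((number, size))
    return oversized


# Example with an in-memory sample instead of a real log file.
sample = ["short line\n", "x" * 60000 + "\n", "y" * 50000 + "\n"]
print(find_oversized_lines(sample))
```

If the function reports any lines, the largest reported size gives a rough lower bound for the new buffer values.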
DEX-8614: A Spark job is sometimes not killed even though its parent Airflow job is killed
Sometimes an error occurs while the request to kill a Spark batch is sent to the Livy API; the error is logged but not propagated properly to the Airflow job. In such cases, the underlying Spark job might still be running, even though the Airflow job considers it to have been killed successfully.
Kill the Spark job manually using the CDE user interface, CLI, or API.
DEX-9237: Job fails with the “Permission Denied” error after updating the virtual cluster resource quota
Whenever the virtual cluster resource quota is updated, the newly launched jobs on the virtual cluster fail with the Permission Denied error. This error can be seen in various stages of the job life cycle, in submitters, drivers or executors, and Airflow workers.
Restart all the virtual cluster pods manually every time you update the virtual cluster resource quota.
DEX-8601: ECS 1.4.x to 1.5.0 Upgrade: jobs fail after upgrade
Upgrading the ECS version while the CDE service is enabled causes the jobs launched in the old CDE virtual cluster to fail. The ECS upgrade moves Kubernetes to version 1.23, which removes the old ingress APIs that the service uses.
Back up the CDE jobs in the CDE virtual cluster, and then delete the CDE service and CDE virtual cluster. Restore the jobs after the upgrade. For more information about backing up and restoring CDE jobs, see Backing up and restoring CDE jobs.
DEX-8600: ECS 1.4.x to 1.5.0 Upgrade: Virtual cluster creation and deletion fails
Upgrading the ECS version while the CDE service is enabled causes creation and deletion of the old CDE service and virtual cluster to fail. The ECS upgrade moves Kubernetes to version 1.23, which removes the old ingress APIs that the service uses.
Back up the CDE jobs in the CDE virtual cluster, and then delete the CDE service and CDE virtual cluster. Restore the jobs after the upgrade. For more information about backing up and restoring CDE jobs, see Backing up and restoring CDE jobs.
DEX-8226: Grafana Charts of new virtual clusters are not accessible on upgraded clusters if the virtual clusters are created on an existing CDE service
If you upgrade the cluster from 1.3.4 to 1.4.x and create a new virtual cluster on the existing CDE service, Grafana Charts are not displayed. This is due to broken APIs.
Create a new CDE Service and a new virtual cluster on that service. Grafana Charts of the virtual cluster will be displayed.
DEX-7000: Parallel Airflow tasks triggered at exactly the same time by the user throw the 401:Unauthorized error
The 401:Unauthorized error causes Airflow jobs to fail intermittently when parallel Airflow tasks using CDEJobRunOperator are triggered at exactly the same time in an Airflow DAG.
Use the following steps to create a workaround BashOperator job that prevents this error. As part of the workaround, this job keeps running indefinitely and must not be killed.
  1. Navigate to the Cloudera Data Engineering Overview page by clicking the Data Engineering tile in the Cloudera Data Platform (CDP) console.
  2. In the CDE Services column, select the service containing the virtual cluster where you want to create the job.
  3. In the Virtual Clusters column on the right, click the View Jobs icon on the virtual cluster where you want to create the job.
  4. In the left-hand menu, click Jobs.
  5. Click Create Job.
  6. Provide the job details:
    1. Select Airflow for the job type.
    2. Specify the job name as bashoperator-job.
    3. Save the following Python script to attach it as a DAG file.
      from dateutil import parser
      from airflow import DAG
      from airflow.utils import timezone
      from airflow.operators.bash_operator import BashOperator

      default_args = {
          'depends_on_past': False,
      }

      with DAG(
          'bashoperator-job',
          default_args=default_args,
          start_date=parser.isoparse('2022-06-17T23:52:00.123Z').replace(tzinfo=timezone.utc),
          schedule_interval=None,
          is_paused_upon_creation=False,
      ) as dag:
          # Two parallel tasks that sleep indefinitely; do not kill this job.
          [
              BashOperator(task_id='task1', bash_command='sleep infinity'),
              BashOperator(task_id='task2', bash_command='sleep infinity'),
          ]
    4. Select File, click Select a file to upload the Python script above, and select a file from an existing resource.
  7. Select the Python Version, and optionally select a Python Environment.
  8. Click Create and Run.
DEX-7001: When Airflow jobs are run, the privileges of the user who created the job are applied, not those of the user who submitted the job
Irrespective of who submits the Airflow job, the job runs with the privileges of the user who created it. This causes issues when the job submitter has fewer privileges than the job owner.
Spark and Airflow jobs must be created and run by the same user.
Changing LDAP configuration after installing CDE breaks authentication
If you change the LDAP configuration after installing CDE, as described in Configuring LDAP authentication for CDP Private Cloud, authentication no longer works.
Re-install CDE after making any necessary changes to the LDAP configuration.
HDFS is the default filesystem for all resource mounts
For any jobs that use local filesystem paths as arguments to a Spark job, explicitly specify file:// as the scheme. For example, if your job uses a mounted resource called test-resource.txt, in the job definition, you would typically refer to it as /app/mount/test-resource.txt. In CDP Private Cloud, this should be specified as file:///app/mount/test-resource.txt.
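When job arguments are generated by a script, the scheme can be added programmatically. This is a minimal sketch using only the Python standard library; the helper name to_local_uri is hypothetical and not part of CDE.

```python
from pathlib import PurePosixPath


def to_local_uri(path: str) -> str:
    """Return a file:// URI for an absolute local path.

    Hypothetical helper: strings that already start with file://
    are returned unchanged.
    """
    if path.startswith("file://"):
        return path
    # as_uri() raises ValueError for relative paths, which avoids
    # silently producing an ambiguous job argument.
    return PurePosixPath(path).as_uri()


print(to_local_uri("/app/mount/test-resource.txt"))
```

Note that as_uri() percent-encodes special characters such as spaces, so the result can differ from simple string concatenation for unusual file names.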
Apache Ozone is supported only for log files
Apache Ozone is supported only for log files. It is not supported for job configurations, resources, and so on.
Scheduling jobs with URL references does not work
Scheduling a job that specifies a URL reference does not work.
Use a file reference, or create a resource and specify it.

Limitations

Access key-based authentication is not enabled in clusters upgraded from releases earlier than CDP PVC 1.3.4.
After you upgrade to PVC 1.3.4 from an earlier version, you must create the CDE Base service and virtual cluster again to use the new Access Key feature. Otherwise, the Access Key feature is not supported in a CDE Base service that was created before the 1.3.4 upgrade.