Known issues and limitations in Cloudera Data Engineering on CDP Private Cloud

This page lists the current known issues and limitations that you might run into while using the Cloudera Data Engineering (CDE) service.

DEX-6998: Airflow-launched Spark job logs are unavailable.
Logs of Spark jobs launched from Airflow may not be available for viewing or downloading.
Workaround: None
DEX-7000: Parallel Airflow tasks triggered at exactly the same time by the user throw a 401: Unauthorized error.
Error 401: Unauthorized is displayed in the logs when parallel Airflow tasks in an Airflow job are triggered or launched by the user at exactly the same time.
Workaround: None
DEX-7001: When Airflow jobs are run, the privileges of the user who created the job are applied, not those of the user who submitted the job.
Irrespective of who submits the Airflow job, the job runs with the privileges of the user who created it. This causes issues when the job submitter has fewer privileges than the job owner.
Workaround: Spark and Airflow jobs must be created and run by the same user.
DEX-7022: Virtual Cluster does not accept Spark or Airflow jobs if the tzinfo library is used for the start date.
If you use the tzinfo library for start_date, the Virtual Cluster may not complete execution of Spark or Airflow jobs launched later. For example:
from datetime import timezone

from airflow import DAG
from dateutil import parser

example_dag = DAG(
    'bashoperator-parameter-job',
    default_args=default_args,  # default_args defined elsewhere in the DAG file
    start_date=parser.isoparse("2020-11-11T20:20:04.268Z").replace(tzinfo=timezone.utc),
    schedule_interval='@once',
    is_paused_upon_creation=False
)
Workaround: Use start_date=pendulum.datetime(2017, 1, 1, tz="UTC") instead of attaching tzinfo to the start date. For more information about time zones, see the Airflow time zone aware DAGs documentation.
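As a minimal sketch of this workaround, assuming the same DAG name and default_args as in the example above, the start date can be defined with pendulum instead:

import pendulum
from airflow import DAG

# Use a pendulum datetime with an explicit time zone rather than
# attaching tzinfo to a parsed timestamp.
example_dag = DAG(
    'bashoperator-parameter-job',
    default_args=default_args,  # default_args defined elsewhere in the DAG file
    start_date=pendulum.datetime(2017, 1, 1, tz="UTC"),
    schedule_interval='@once',
    is_paused_upon_creation=False
)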
Changing LDAP configuration after installing CDE breaks authentication
If you change the LDAP configuration after installing CDE, as described in Configuring LDAP authentication for CDP Private Cloud, authentication no longer works.
Workaround: Re-install CDE after making any necessary changes to the LDAP configuration.
Gang scheduling is not supported
Gang scheduling is not currently supported for CDE on CDP Private Cloud.
HDFS is the default filesystem for all resource mounts
Workaround: For any jobs that use local filesystem paths as arguments to a Spark job, explicitly specify file:// as the scheme. For example, if your job uses a mounted resource called test-resource.txt, in the job definition, you would typically refer to it as /app/mount/test-resource.txt. In CDP Private Cloud, this should be specified as file:///app/mount/test-resource.txt.
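For instance, the following is a minimal PySpark sketch that reads the mounted resource with an explicit file:// scheme (the resource name test-resource.txt is taken from the example above; the job itself is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-mounted-resource").getOrCreate()

# On CDP Private Cloud, the scheme must be given explicitly; a bare
# /app/mount/test-resource.txt path would resolve against HDFS.
lines = spark.read.text("file:///app/mount/test-resource.txt")
lines.show(truncate=False)

spark.stop()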
The CDE virtual cluster quota is hard-coded to 100 CPUs and 10240 GB memory
Each CDE virtual cluster created is hard-coded to have a maximum of 100 CPU cores and 10240 GB memory.
Workaround: None.
Apache Ozone is supported only for log files
Apache Ozone is supported only for log files. It is not supported for job configurations, resources, and so on.
Scheduling jobs with URL references does not work
Scheduling a job that specifies a URL reference does not work.
Workaround: Use a file reference, or create a resource and specify it.

Limitations

Access key-based authentication is not enabled in clusters upgraded from releases prior to CDP Private Cloud (PVC) 1.3.4.
Workaround: After you upgrade from an earlier version to PVC 1.3.4, you must create the CDE Base service and Virtual Cluster again to use the new Access Key feature. Otherwise, the Access Key feature is not supported in CDE Base services created before the 1.3.4 upgrade.