Fixed issues in 1.5.5 SP3

The fixes in Cloudera Data Engineering on premises 1.5.5 Service Pack 3 resolve bugs and functional discrepancies that were previously logged through Cloudera Support or identified as Known Issues by customers and Cloudera Quality Engineering teams.

DEX-20436: Job run fails with start TGT generation failure if stop TGT generation is being processed: Previously, in Cloudera Data Engineering job runs, if a stop TGT generation request from one job and a start TGT generation request from another job for the same user reached the TGT service simultaneously, the second job run could fail with an AlreadyExists error. This issue is now fixed.
DEX-19498: Users or machine users with roles assigned through group memberships lose access temporarily: Previously, for Embedded Container Service, users who get roles such as VC User, VC Admin, Service User, Service Admin, and DEAdmin through group membership might lose access temporarily. This issue is now fixed.
DEX-17361: Restore Cloudera Data Engineering job fails if the job is created with the schedule having start time or end time in a format other than the pre-defined time format: Previously, scheduled jobs created using the Jobs API were not restored from a backup if their start or end times were not in the RFC3339Milli or RFC3339Nano format. This issue is now fixed.
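For illustration only, the following Python sketch shows what an RFC 3339 timestamp with millisecond precision looks like. The helper name to_rfc3339_milli is hypothetical, and the exact layout that the Jobs API treats as RFC3339Milli is not stated in this note, so the output shape is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339_milli(dt: datetime) -> str:
    """Format a timezone-aware datetime as RFC 3339 in UTC with millisecond precision."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so truncate microseconds by hand.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


start = datetime(2025, 1, 15, 8, 30, 0, 123456, tzinfo=timezone.utc)
end = datetime(2025, 1, 15, 10, 30, 0, 5000, tzinfo=timezone(timedelta(hours=2)))

print(to_rfc3339_milli(start))  # 2025-01-15T08:30:00.123Z
print(to_rfc3339_milli(end))    # 2025-01-15T08:30:00.005Z
```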
DEX-20335: Erroneous LDAPGroupsMapping lookups in Cloudera Data Engineering Ozone-based Spark or Hive jobs: Previously, some Cloudera Data Engineering Ozone-based Spark or Hive external table jobs might emit repeated LDAP warnings and incur unnecessary LDAP authentication attempts when the base cluster was configured with LDAPGroupsMapping in Cloudera Manager. Due to a Cloudera Manager API/client configuration change in 1.5.5, the LDAP bind user password was not propagated to the Cloudera Data Engineering client configurations, so the job’s LDAP group lookup was attempted with a null password and failed. This led to excessive failed LDAP binds (visible in job logs as the WARN LdapGroupsMapping... javax.naming.NamingException... successful bind must be completed message) and, in some environments, could overwhelm the LDAP or AD server or even trigger bind account lockouts under security policies. This issue is now fixed.
DEX-7459: The numExecutors parameter is missing in job configuration for new jobs: Previously, for jobs configured to run with a fixed number of executors by setting spark.dynamicAllocation.enabled to false, the Cloudera Data Engineering UI failed to preserve the numExecutors parameter for new or cloned jobs. This issue is now fixed.
DEX-15232: Cloudera Data Engineering configuration profiles failed to load on Windows while using CDE CLI: Previously, the CDE CLI failed to correctly identify the user's home directory (for example, %USERPROFILE%) on Windows systems. As a result, the CLI ignored configuration profiles defined in the config.yaml file.
This issue is now fixed. The CDE CLI now properly evaluates the Windows home directory, ensuring that all defined configuration profiles are successfully processed. Additionally, profile discovery is now correctly enabled when the CDE_CONFIG environment variable is set.
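The lookup order described above can be sketched in Python as follows. This is not the CDE CLI implementation. The function name resolve_config_path, the .cde directory name, the fallback to HOME, and the treatment of CDE_CONFIG as a path to the configuration file are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_config_path(env=None):
    """Return the config.yaml location for a given environment mapping (hypothetical logic)."""
    env = os.environ if env is None else env

    # Assumption: CDE_CONFIG, when set, points directly at the configuration file.
    explicit = env.get("CDE_CONFIG")
    if explicit:
        return Path(explicit)

    # Prefer the Windows profile directory, then HOME, then the platform default.
    home = env.get("USERPROFILE") or env.get("HOME") or str(Path.home())

    # Assumption: the file lives in a ".cde" directory under the home directory.
    return Path(home) / ".cde" / "config.yaml"


print(resolve_config_path({"USERPROFILE": "profile_dir"}))
print(resolve_config_path({"CDE_CONFIG": "custom.yaml"}))
```

Passing the environment as a plain mapping keeps the sketch easy to exercise on any operating system without changing real environment variables.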
DEX-5444: Cloudera Data Engineering on premises is not able to distinguish between stdout and stderr when forwarding logs: Previously, all Spark job driver and executor output from both stderr and stdout was incorrectly redirected to the stderr log file. This issue is now resolved. Cloudera Data Engineering Spark logging now correctly directs output to the appropriate stdout and stderr streams, ensuring accurate log visibility in the Cloudera Data Engineering interface.
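As a generic illustration of the distinction, and not of the Cloudera Data Engineering log forwarder itself, the following Python sketch starts a child process that writes to both streams and captures each one separately. The message texts are made up for the example.

```python
import subprocess
import sys

# Child program: one line to stdout, one line to stderr.
child_code = (
    "import sys\n"
    "print('result row')\n"
    "print('warning: retrying', file=sys.stderr)\n"
)

# capture_output=True collects stdout and stderr through two separate pipes,
# so the streams are not merged into one log.
proc = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

print("stdout:", proc.stdout.strip())
print("stderr:", proc.stderr.strip())
```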
DEX-20650: Spark job pod startup timeouts caused by recursive volume permission checks: Previously, pods accessing shared Persistent Volume Claims (PVCs) occasionally had inconsistent fsGroup settings. This inconsistency triggered recursive permission checks on the volume contents during mounting, which extended pod initialization times and delayed the execution of Spark jobs.
This issue is now resolved by ensuring consistent fsGroup configuration across all pods accessing shared volumes. This change eliminates the resulting startup delays and improves job scheduling reliability.
DEX-21166: Cloudera Data Engineering CLI for Windows panics at startup on non-FIPS hosts: Previously, the Cloudera Data Engineering CLI binary (cde.exe) panicked at startup when run on Windows hosts that did not have FIPS mode enabled at the operating system level.
This issue is now resolved. The Cloudera Data Engineering CLI initializes correctly on all Windows hosts, regardless of whether FIPS mode is enabled.
DEX-20562: UI timeouts due to unoptimized backend calls: Previously, unoptimized backend calls could cause UI timeouts. This issue is now resolved by optimizing the Virtual Cluster details API in Cloudera Data Engineering on premises and removing redundant authorization calls, resulting in faster response times.
DEX-20992: The spark.addArtifact API fails to upload large JAR files in Spark Connect sessions: Previously, attempts to add large JAR files (typically exceeding 100 MB) to a Spark Connect session using the spark.addArtifact API failed with a gRPC error. This was due to the default request body size limit in the NGINX proxy configuration. This issue is now fixed, and the proxy configuration is updated to support artifacts up to 256 MB. You can now successfully upload and use large dependencies in your Spark Connect-enabled applications.
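Before uploading, a client could check a JAR file against the documented limit. The following Python sketch assumes that 256 MB means 256 * 1024 * 1024 bytes and ignores any protocol overhead added during upload. The helper name fits_artifact_limit is hypothetical and is not part of the Spark or Cloudera Data Engineering APIs.

```python
import os
import tempfile

# Assumption: the 256 MB limit is interpreted as 256 * 1024 * 1024 bytes.
MAX_ARTIFACT_BYTES = 256 * 1024 * 1024


def fits_artifact_limit(path, limit_bytes=MAX_ARTIFACT_BYTES):
    """Return True if the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes


# Demonstration with a small placeholder file standing in for a JAR.
with tempfile.NamedTemporaryFile(suffix=".jar", delete=False) as handle:
    handle.write(b"\0" * 1024)
    sample_path = handle.name

small_ok = fits_artifact_limit(sample_path)
too_big = fits_artifact_limit(sample_path, limit_bytes=512)
os.remove(sample_path)

print(small_ok, too_big)  # True False
```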
DEX-20776: Unable to unpause or manually run an Airflow DAG without a schedule: Previously, Cloudera Data Engineering prevented users from unpausing or manually triggering Airflow DAGs that did not have a defined schedule, for example, schedule=None. This was due to an overly strict validation check that incorrectly flagged these DAGs as having no valid schedule. This issue is now resolved. Pause and unpause operations are allowed for all valid Airflow DAGs, regardless of their schedule interval, so manual execution functions as expected.