Known issues for Cloudera Data Services on premises 1.5.5 SP2

Review the known issues for Cloudera Data Services on premises, the impact or changes to the functionality, and the workaround.

For more information on 1.5.5 SP1 known issues, see Known Issues in 1.5.5 SP1.

Known issues identified in 1.5.5 SP2 release

OPSX-6950: DRS Restore fails due to ClusterIP allocation conflict: During a DRS restore, the restore operation can fail with a Kubernetes error indicating that a service ClusterIP is already allocated. This occurs when the restore process attempts to recreate a service using a ClusterIP that is currently in use by another existing service in the cluster.
The following is a typical error message:
service "cdp-release-cert-manager-cainjector" is invalid: spec.clusterIPs: failed to allocate IP <IP_ADDRESS>: provided IP is already allocated

To resolve this issue, perform the following actions:

Identify the service using the conflicting IP address by using the following command:
kubectl get svc -A -o wide | grep <IP_ADDRESS>

Retry the DRS restore operation. On retry, the process cleans up existing conflicting resources, including the service holding the conflicting IP, before proceeding with the restore. If you cannot access the Cloudera Control Plane UI to retry the restore operation, contact Cloudera Support for assistance.
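
The lookup step above can be sketched as a pair of small shell helpers. This is a minimal sketch, not Cloudera tooling: the function names extract_conflicting_ip and find_service_by_ip are hypothetical, and the sketch assumes the error message has the "failed to allocate IP" form shown above and that the ClusterIP is an IPv4 address.

```shell
# Sketch only: function names are hypothetical, not part of any Cloudera tool.

# Print the IPv4 address that follows "failed to allocate IP" in an error
# message passed as the first argument. Prints nothing if no address is found.
extract_conflicting_ip() {
  printf '%s\n' "$1" |
    grep -oE 'failed to allocate IP [0-9]+(\.[0-9]+){3}' |
    awk '{ print $NF }' |
    head -n 1
}

# List the services, in any namespace, whose row contains exactly this IP.
# -F treats the dots literally; -w keeps 10.43.0.1 from matching 10.43.0.15.
find_service_by_ip() {
  kubectl get svc -A -o wide | grep -wF -- "$1"
}
```

To use it, pass the error text to extract_conflicting_ip and the resulting address to find_service_by_ip. The fixed-string, whole-word match is a small refinement over a plain grep, which would also match longer addresses that merely start with the same digits.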

OPSX-6867: Post-upgrade validation fails due to longhorn-system pods in CrashLoopBackOff: During an upgrade to 1.5.5 SP2, some Longhorn CSI plugin pods remain in the Terminating state, causing the upgrade to fail. This issue occurs when Longhorn is not configured to use dedicated disks, which leads to instability in the storage components and prevents proper pod shutdown and restart.

To address this issue:

For ECS customers, Longhorn must be deployed only on dedicated disks. Shared disks can cause pods to shut down during the upgrade.

Delete the Longhorn CSI plugin pods that are in the terminating state. Removing these pods allows the system to clear the stuck resources and unblocks the Longhorn components.

Retry the upgrade. With the stuck pods removed, the upgrade is expected to proceed.
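
The pod cleanup step above might look like the following sketch. Several details are assumptions rather than facts from this page: the helper names are hypothetical, the CSI plugin pods are assumed to be named longhorn-csi-plugin-* in the longhorn-system namespace, and the forced delete is an assumption for pods that never finish terminating on their own.

```shell
# Sketch only: helper names are hypothetical. Assumes the CSI plugin pods are
# named longhorn-csi-plugin-* and run in the longhorn-system namespace.

# Read `kubectl get pods --no-headers` rows on stdin and print the names of
# CSI plugin pods whose STATUS column (third field) is Terminating.
filter_terminating_csi_pods() {
  awk '$1 ~ /^longhorn-csi-plugin-/ && $3 == "Terminating" { print $1 }'
}

# Delete every CSI plugin pod reported as Terminating.
delete_terminating_csi_pods() {
  kubectl get pods -n longhorn-system --no-headers |
    filter_terminating_csi_pods |
    while read -r pod; do
      # --force --grace-period=0 is an assumption for pods that stay stuck;
      # drop both flags if a plain delete is enough in your cluster.
      kubectl delete pod "$pod" -n longhorn-system --force --grace-period=0
    done
}
```

Running the kubectl get pods command with filter_terminating_csi_pods alone first shows which pods would be deleted before delete_terminating_csi_pods is called.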

OPSX-6858: Cloudera Embedded Container Service first run fails at install-cp step due to mke2fs failure

During some Cloudera Embedded Container Service installations, the first-run process fails at the install-cp step because certain pods remain in Creating state. The underlying cause is a failure to mount the associated Longhorn volume, which leads to an error when Kubernetes tries to format the block device. The pod event shows the following message:

Warning FailedMount ... MountVolume.MountDevice failed for volume "pvc-…"
rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-…" failed:
… mke2fs … /dev/longhorn/pvc-… is apparently in use by the system; will not make a filesystem here!

This happens when the Longhorn PVC block device still contains stale filesystem or partition metadata from previous use. Because the device appears to be in use, the mke2fs command cannot create a new filesystem, which blocks the pod from starting.

To resolve this issue, you can manually clear the remaining metadata on the Longhorn block device on the node where the pod is running. For example, run the wipefs command on the affected device:

wipefs -a /dev/longhorn/pvc-2e2dc23b-82d6-45cd-9348-b40eba0fb4e1

This removes existing filesystem signatures so that Longhorn (and Kubernetes) can format and mount the volume successfully. After cleaning up the metadata, retry the Cloudera Embedded Container Service installation. The pods are expected to proceed past the install-cp step.
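
Because erasing signatures is destructive, a guarded wrapper can reduce the risk of pointing wipefs at the wrong device. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the guard only checks that the path looks like a Longhorn PVC device and is a block device on the current node. Note that wipefs with no options only lists the signatures it finds, while the --all option (short form -a) erases them.

```shell
# Sketch only: function names are hypothetical.

# Succeed only for paths of the form /dev/longhorn/pvc-<id>.
is_longhorn_pvc_device() {
  case "$1" in
    *..*) return 1 ;;                     # reject path traversal
    /dev/longhorn/pvc-?*) return 0 ;;
    *) return 1 ;;
  esac
}

# List, then erase, the signatures on a Longhorn PVC block device.
wipe_longhorn_device() {
  dev="$1"
  is_longhorn_pvc_device "$dev" || {
    echo "refusing: $dev is not a Longhorn PVC device path" >&2
    return 1
  }
  [ -b "$dev" ] || {
    echo "refusing: $dev is not a block device on this node" >&2
    return 1
  }
  wipefs "$dev"         # no options: only lists the signatures found
  wipefs --all "$dev"   # erases all signatures that wipefs detected
}
```

Run wipe_longhorn_device as root on the node that hosts the affected pod, passing the device path from the pod event.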

OPSX-5239: Updating the External Docker Registry Certificate command fails when existing Pods are restarted: If a wrong certificate is applied using the path ECS -> admin -> certificates, it cannot be corrected afterward by running the Cloudera Manager Update External Docker Certificate command. Replacing the external Docker certificate with an invalid certificate and then running the Update External Docker Certificate command to correct it is not a supported workflow.

For example:

1. Install PVC with an external Docker registry.

2. Update the wrong certificate in the ECS configurations and run the Update External Docker Registry Certificate command.

3. Restart all the Pods in the cdp namespace. The Pods enter the ImagePullBackOff error state.

4. Update the correct certificate in the ECS configurations and run the Update External Docker Registry Certificate command.

Running step 4 in Cloudera Manager does not recover from the wrong certificate.

Workaround: None
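
Because a wrong certificate cannot be corrected afterward, it may help to inspect the certificate file before applying it. The following is a minimal sketch, not a Cloudera procedure: the function names are hypothetical, and the check only confirms that the file parses as an X.509 certificate and has not expired. It does not prove that the certificate is the right one for your registry, so compare the printed subject and issuer with what the registry actually presents.

```shell
# Sketch only: function names are hypothetical.

# Succeed only if the file parses as an X.509 certificate that is not expired.
cert_is_parseable_and_current() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1
}

# Print the subject, issuer, and expiry date for a manual comparison.
show_cert_summary() {
  openssl x509 -in "$1" -noout -subject -issuer -enddate
}
```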

OPSX-7123: Unable to stop or restart the ECS service: The Kubelet service is not stopped when the ECS role is stopped from the Cloudera Manager instance.

Perform the following steps:

After the stop command fails, identify all the ECS roles that are not in the Stopped state.

Select all the ECS roles that are not stopped and click the Stop action.

CDPUI-2643: User option to switch between the Legacy and new landing pages is not available: If you have upgraded to Cloudera Data Services on premises 1.5.5 Service Pack 2 release and accessed the new landing page, you cannot switch back to the Legacy UI page. The toggle option is not available to perform this action.

Workaround: None

OPSX-6984: Certificate creation fails with an error message due to an integration issue between cert-manager and Venafi TPP: Certificate issuance fails with a key modulus mismatch between the certificate request and the existing certificate in Venafi TPP. The following error message appears:
vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus

Review and apply the recommended fix from the community resource, and then restart Venafi TPP and IIS.