Known issues for Cloudera Data Services on premises 1.5.5 SP2

Learn about the known issues for Cloudera Data Services on premises, the impact or changes to the functionality, and the workaround.

For more information on 1.5.5 SP1 known issues, see Known Issues in 1.5.5 SP1.

Known issues identified in 1.5.5 SP2

OPSX-6950 - DRS Restore fails due to ClusterIP allocation conflict
During a DRS restore, the restore operation can fail with a Kubernetes error indicating that a service ClusterIP is already allocated. This occurs when the restore process attempts to recreate a service using a ClusterIP that is currently in use by another existing service in the cluster.

The following is a typical error message:

service "cdp-release-cert-manager-cainjector" is invalid:
spec.clusterIPs: failed to allocate IP <IP_ADDRESS>: provided IP is already allocated
To resolve this issue, perform the following actions:
  1. Identify the service using the conflicting IP address by using the following command:
    kubectl get svc -A -o wide | grep <IP_ADDRESS>
  2. Retry the DRS restore operation. On retry, the process cleans up existing conflicting resources, including the service holding the conflicting IP address, before proceeding with the restore. If you cannot access the Cloudera Control Plane UI to retry the restore operation, contact Cloudera Support for assistance.
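
A plain grep also matches longer addresses that merely begin with the same digits (for example, searching for 10.43.0.1 also returns 10.43.0.10). The following is a minimal sketch of an exact match on the CLUSTER-IP column. The function name find_svc_by_cluster_ip is hypothetical, and the sketch assumes the default column layout of kubectl get svc -A -o wide, where CLUSTER-IP is the fourth column.

```shell
# Hypothetical helper (not part of the product): prints NAMESPACE/NAME for
# every service whose CLUSTER-IP column is exactly the address given as $1.
# Expects the output of `kubectl get svc -A -o wide` on stdin, where the
# columns are assumed to be: NAMESPACE NAME TYPE CLUSTER-IP ...
find_svc_by_cluster_ip() {
  awk -v ip="$1" 'NR > 1 && $4 == ip { print $1 "/" $2 }'
}

# Usage (replace <IP_ADDRESS> with the address from the error message):
#   kubectl get svc -A -o wide | find_svc_by_cluster_ip <IP_ADDRESS>
```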
OPSX-6867 - Post-upgrade validation fails due to longhorn-system pods in CrashLoopBackOff
During an upgrade to 1.5.5 SP2, some Longhorn CSI plugin pods remain in the Terminating state, causing the upgrade to fail. This issue occurs when Longhorn is not configured to use dedicated disks, which leads to instability in the storage components and prevents pods from shutting down and restarting properly.
To address this issue:
  • For ECS customers, Longhorn must be deployed only on dedicated disks. Shared disks can cause pods to shut down during the upgrade.
  • Delete the Longhorn CSI plugin pods that are in the Terminating state. Removing these pods allows the system to clear the stuck resources and unblocks the Longhorn components.
  • Retry the upgrade. After the stuck pods are removed, the upgrade is expected to proceed successfully.
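
The pod cleanup described above can be sketched as follows. This is an illustration under stated assumptions, not a supported procedure: the function name list_terminating_csi_pods is hypothetical, and the sketch assumes that the CSI plugin pod names start with longhorn-csi-plugin- and that the pods run in the longhorn-system namespace.

```shell
# Hypothetical helper: prints the names of Longhorn CSI plugin pods whose
# STATUS column is "Terminating". Expects the output of
# `kubectl get pods -n longhorn-system --no-headers` on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
# Assumption: CSI plugin pod names start with "longhorn-csi-plugin-".
list_terminating_csi_pods() {
  awk '$1 ~ /^longhorn-csi-plugin-/ && $3 == "Terminating" { print $1 }'
}

# Usage: review the list first, then delete the listed pods.
#   kubectl get pods -n longhorn-system --no-headers | list_terminating_csi_pods
#   kubectl get pods -n longhorn-system --no-headers \
#     | list_terminating_csi_pods \
#     | xargs -r kubectl delete pod -n longhorn-system
```

If a pod stays in the Terminating state after a normal delete, kubectl delete pod accepts --grace-period=0 --force. Whether a forced delete is appropriate in this situation is an assumption that is best confirmed with Cloudera Support.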
OPSX-6858 - Cloudera Embedded Container Service first run fails at install-cp step due to mke2fs failure
During some Cloudera Embedded Container Service installations, the first-run process fails at the install-cp step because certain pods remain in Creating state. The underlying cause is a failure to mount the associated Longhorn volume, which leads to an error when Kubernetes tries to format the block device. The pod event shows the following message:
Warning FailedMount ... MountVolume.MountDevice failed for volume "pvc-…"
rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-…" failed:
… mke2fs … /dev/longhorn/pvc-… is apparently in use by the system; will not make a filesystem here!
This happens when the Longhorn PVC block device still contains stale filesystem or partition metadata from a previous use. Because the device appears to be in use, the mke2fs command cannot create a new filesystem, which blocks the pod from starting.
To resolve this issue, manually clear the remaining metadata on the Longhorn block device on the node where the pod is running. For example, run the wipefs command with the -a option on the affected device (without options, wipefs only lists the signatures it finds and does not erase them):
wipefs -a /dev/longhorn/pvc-2e2dc23b-82d6-45cd-9348-b40eba0fb4e1
This removes the existing filesystem signatures so that Longhorn (and Kubernetes) can format and mount the volume successfully. After cleaning up the metadata, retry the Cloudera Embedded Container Service installation. The pods are expected to proceed past the install-cp step.
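
Because erasing signatures is destructive, it can help to guard the device path and inspect the device before wiping it. The following is a minimal sketch under that assumption. The function names is_longhorn_pvc_device and wipe_longhorn_device are hypothetical. The sketch relies on two wipefs behaviors: run without options, it only lists the signatures it finds, and run with --all, it erases them.

```shell
# Hypothetical guard: succeeds only for paths of the form
# /dev/longhorn/pvc-<id> with no further path components.
is_longhorn_pvc_device() {
  case "$1" in
    /dev/longhorn/pvc-*/*) return 1 ;;  # reject extra path components
    /dev/longhorn/pvc-?*)  return 0 ;;
    *)                     return 1 ;;
  esac
}

# Hypothetical wrapper: lists the signatures on the device, then erases them.
# Run as root on the node where the affected pod is scheduled.
wipe_longhorn_device() {
  if ! is_longhorn_pvc_device "$1"; then
    echo "refusing to wipe non-Longhorn device: $1" >&2
    return 1
  fi
  wipefs "$1" &&       # no options: only prints the signatures found
  wipefs --all "$1"    # erases all detected signatures
}

# Usage, with the PVC device name taken from the pod event:
#   wipe_longhorn_device /dev/longhorn/pvc-2e2dc23b-82d6-45cd-9348-b40eba0fb4e1
```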
OPSX-6984 - Certificate creation fails due to an integration issue between cert-manager and Venafi TPP
Certificate issuance fails because of a key modulus mismatch between the certificate request and the existing certificate in Venafi TPP. The following error message appears:
vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus
Review and apply the fix recommended in the community resource, and then restart Venafi TPP and IIS.