
Known Issues in Cloudera Data Services on premises 1.5.4 SP2

The following are the known issues in the 1.5.4 SP2 release of Cloudera Data Services on premises.

OPSX-5776 and OPSX-5747 - ECS - Some rke2-canal DaemonSet pods in the kube-system namespace are stuck in the Init state, causing Longhorn volume attach issues
In a few cases, after upgrading from Cloudera Data Services on premises 1.5.3 or 1.5.3-CHF1 to 1.5.4 SP2, a pod that belongs to the rke2-canal DaemonSet is stuck in the Init status. As a result, some pods in the kube-system and longhorn-system namespaces are in the Init or CrashLoopBackOff status. This manifests as a volume attach failure in the embedded-db-0 pod in the CDP namespace and causes some pods in the CDP namespace to be in the CrashLoopBackOff state.
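To confirm the issue, you can list the rke2-canal pods and check for any stuck in the Init state (a minimal diagnostic sketch; the grep pattern is an assumption, adjust as needed):
kubectl get pods -n kube-system -o wide | grep rke2-canal
Then perform the following workaround steps: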
  1. Perform a rolling restart of rke2-canal DaemonSet by running the following command:
    kubectl rollout restart ds rke2-canal -n kube-system
  2. Monitor the DaemonSet restart status by running the following command:
    kubectl get ds -n kube-system
  3. After the rke2-canal DaemonSet restart is complete, if any pods in DaemonSets within the longhorn-system namespace remain in the Init or CrashLoopBackOff state, perform a rolling restart of those DaemonSets (a command to identify them is sketched after these steps). Choose the appropriate command based on the specific DaemonSet that is failing. If more than one DaemonSet requires a restart, restart them sequentially, one at a time.
    kubectl rollout restart ds longhorn-csi-plugin -n longhorn-system
    kubectl rollout restart ds longhorn-manager -n longhorn-system
    kubectl rollout restart ds engine-image-ei-6b4330bf -n longhorn-system
    kubectl rollout restart ds engine-image-ei-ea8e2e58 -n longhorn-system 
  4. Monitor the DaemonSet restart status by running the following command:
    kubectl get ds -n longhorn-system
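To identify which DaemonSets in the longhorn-system namespace still have failing pods (step 3), you can filter on the pod status (a minimal sketch; the grep pattern is an assumption):
kubectl get pods -n longhorn-system | grep -E 'Init|CrashLoopBackOff'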
OPSX-5903 - 1.5.3 to 1.5.4 SP2 ECS - The upgrade fails when the rke2-ingress-nginx-controller deployment exceeds its progress deadline
The upgrade fails while running the following command:
kubectl rollout status deployment/rke2-ingress-nginx-controller -n kube-system --timeout=5m
To work around the issue, perform the following steps to refresh ECS:
  1. Find the number of replicas for the deployment:
    kubectl get deployment -n kube-system rke2-ingress-nginx-controller
  2. Scale it down to 0:
    kubectl scale deployment rke2-ingress-nginx-controller -n kube-system --replicas=0
  3. Once all associated pods are terminated, scale the deployment back to the original number of replicas (a command to check that the pods have terminated is sketched after these steps):
    kubectl scale deployment rke2-ingress-nginx-controller -n kube-system --replicas=<n>
  4. Resume the upgrade.
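To check whether all controller pods have terminated before scaling back up in step 3, you can list them (a minimal sketch; the grep pattern is an assumption):
kubectl get pods -n kube-system | grep rke2-ingress-nginx-controller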
OPSAPS-72270 - Start ECS command fails at the uncordon nodes step

In an ECS HA cluster, a server node sometimes restarts during startup, which causes the uncordon step to fail.

Run the following command on the restarted server node to verify whether kube-apiserver is ready:
kubectl get pods -n kube-system | grep kube-apiserver
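Once kube-apiserver is ready, you can also confirm that the nodes have been uncordoned; cordoned nodes show SchedulingDisabled in their status (a minimal sketch):
kubectl get nodes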

Resume the command from the Cloudera Manager UI.

OPSAPS-72964 and OPSAPS-72769: Unseal Vault command fails after restarting the ECS service
The Unseal Vault command sometimes fails after the ECS service is restarted.

It may take some time for the ECS cluster to be up and running after a restart operation. If the Unseal Vault command fails after the restart, perform the following steps:

  1. Verify that the vault-0 pod in the vault-system namespace is running (a command sketch follows these steps).
  2. Once the pod is in the Running state, initiate the Unseal Vault command from the ECS service's Actions menu.
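One way to check the pod, and Vault's seal status, from an ECS server host (a minimal sketch; it assumes the vault CLI is available inside the vault-0 pod, as in standard Vault images):
kubectl get pod vault-0 -n vault-system
kubectl exec vault-0 -n vault-system -- vault status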
OPSX-5986: ECS fresh install fails because the helm-install-rke2-ingress-nginx pod does not reach the Completed state
ECS fresh install fails at the "Execute command Reapply All Settings to Cluster on service ECS" step due to a timeout waiting for helm-install. To confirm the issue, run the following kubectl command on the ECS server host to check whether the pod is stuck in the Running state:
kubectl get pods -n kube-system | grep helm-install-rke2-ingress-nginx
To resolve the issue, manually delete the pod by running the following command:
kubectl delete pod <helm-install-rke2-ingress-nginx-pod-name> -n kube-system
Then, in the Cloudera Manager UI, click Resume to proceed with the fresh install.
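After the pod is deleted, the helm-install job recreates it; you can watch until the new pod reaches the Completed state (a minimal sketch; the grep pattern is an assumption):
kubectl get pods -n kube-system --watch | grep helm-install-rke2-ingress-nginx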