Testing Longhorn health post Cloudera Embedded Container Service upgrade
After a Cloudera Embedded Container Service upgrade, the Longhorn health test fails and the helm-install-longhorn pod enters a crash loop (CrashLoopBackOff) state.
To fix this issue, run the following commands:
#Get the history of the Longhorn Helm release to identify the revision for which installation is failing.#
helm history longhorn -n longhorn-system
REVISION  UPDATED                   STATUS        CHART           APP VERSION  DESCRIPTION
1         Wed Feb 28 05:32:47 2024  deployed      longhorn-1.4.2  v1.4.2       Install complete
2         Wed Feb 28 09:28:39 2024  uninstalling  longhorn-1.5.4  v1.5.4       Deletion in progress (or silently failed)
#Helm stores each release revision as a Kubernetes secret. List the secrets in the longhorn-system namespace to find the Longhorn release secrets.#
kubectl get secrets -n longhorn-system
NAME                            TYPE                DATA  AGE
basic-auth                      Opaque              1     15h
chart-values-longhorn           Opaque              0     10h
longhorn-webhook-ca             kubernetes.io/tls   2     15h
longhorn-webhook-tls            kubernetes.io/tls   2     15h
sh.helm.release.v1.longhorn.v1  helm.sh/release.v1  1     15h
sh.helm.release.v1.longhorn.v2  helm.sh/release.v1  1     21m
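When a release has many revisions, the newest release secret can be picked out programmatically rather than by eye. The following is a minimal sketch and is not part of the documented procedure. The function name latest_release_secret is hypothetical, and the sketch assumes that the secret names follow the sh.helm.release.v1.<release>.v<revision> pattern seen in the listing above.

```shell
# Hypothetical helper: read `kubectl get secrets` output on stdin and print
# the Helm release secret with the highest revision for the given release.
latest_release_secret() {
  awk -v prefix="sh.helm.release.v1.$1.v" '
    index($1, prefix) == 1 {
      rev = substr($1, length(prefix) + 1)
      # Compare revisions numerically so that v10 ranks above v9.
      if (rev ~ /^[0-9]+$/ && rev + 0 > max) { max = rev + 0; name = $1 }
    }
    END { if (name != "") print name }
  '
}

# Example usage (requires cluster access):
#   kubectl get secrets -n longhorn-system | latest_release_secret longhorn
```

The helper only reads the first column, so it works on the default table output, header line included.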
#The latest revision, sh.helm.release.v1.longhorn.v2, must be deleted. Save a backup of the secret as YAML before deleting it.#
kubectl get secrets sh.helm.release.v1.longhorn.v2 -n longhorn-system -o yaml > sh.helm.release.v1.longhorn.v2.yaml
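To confirm which release the secret actually holds, its payload can be decoded. This is a sketch under stated assumptions: Helm 3 stores the release as gzip-compressed JSON that it base64-encodes, and Kubernetes base64-encodes the secret data once more. The function name decode_helm_release is hypothetical, and the sketch assumes a base64 that accepts the -d flag (as on typical Linux hosts).

```shell
# Hypothetical helper: decode a Helm 3 release payload read from stdin.
# The payload is base64-encoded twice (once by Helm, once by Kubernetes)
# and gzip-compressed underneath.
decode_helm_release() {
  base64 -d | base64 -d | gzip -dc
}

# Example usage (requires cluster access):
#   kubectl get secret sh.helm.release.v1.longhorn.v2 -n longhorn-system \
#     -o jsonpath='{.data.release}' | decode_helm_release | head -c 300
```

The decoded output is a JSON document, so printing only the first few hundred bytes is usually enough to see the release name and version.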
#Save a backup of the values that were passed to the Helm chart during installation.#
helm get values --revision=2 longhorn -n longhorn-system > defaultSettings.yaml
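Because the following steps delete the release secret, it may be worth checking that both backup files were actually written before continuing. The guard below is a minimal sketch; the function name require_backups is hypothetical and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical guard: succeed only if every named file exists and is non-empty.
require_backups() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty backup: $f" >&2
      return 1
    fi
  done
}

# Example usage:
#   require_backups sh.helm.release.v1.longhorn.v2.yaml defaultSettings.yaml \
#     && echo "backups present"
```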
#List all jobs in the longhorn-system namespace. These jobs must be deleted; they are re-triggered as part of the manual patch.#
kubectl get jobs -n longhorn-system
NAME                   COMPLETIONS  DURATION  AGE
helm-install-longhorn  0/1          9h        9h
longhorn-post-upgrade  1/1          11m       10h
longhorn-uninstall     0/1          10h       10h
#Delete the jobs, then delete the secret that holds the latest Longhorn release revision.#
kubectl delete job helm-install-longhorn longhorn-uninstall longhorn-post-upgrade -n longhorn-system
kubectl delete secret sh.helm.release.v1.longhorn.v2 -n longhorn-system
#Apply the longhorn chart from the parcel directory.#
kubectl patch HelmChart longhorn -n longhorn-system --type=merge --patch-file /opt/cloudera/parcels/ECS/longhorn/longhorn.yaml
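After the patch, one way to confirm that the installation recovered is to wait for the re-triggered job to complete. The following is a sketch, not part of the documented procedure: it assumes that the patch re-creates a job named helm-install-longhorn, as in the job listing above. The function name wait_for_longhorn_install, the KUBECTL variable, and the 600s default timeout are all hypothetical choices.

```shell
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: block until the re-triggered helm-install-longhorn job
# completes, or until the timeout (default 600s) expires.
wait_for_longhorn_install() {
  "$KUBECTL" wait --for=condition=complete job/helm-install-longhorn \
    -n longhorn-system --timeout="${1:-600s}"
}

# Example usage (requires cluster access):
#   wait_for_longhorn_install 900s && helm history longhorn -n longhorn-system
```

If the job has not been re-created yet, kubectl wait returns an error immediately, so the call may need to be repeated after a short delay.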
