Testing Longhorn health post ECS upgrade
After an ECS upgrade, the Longhorn health test fails and the helm-install-longhorn pod goes into a crash loop (CrashLoopBackOff) state.
To fix this issue, run the following commands in order:
# Get the history of the longhorn Helm chart to identify the chart revision for which installation is failing.
helm history longhorn -n longhorn-system

REVISION  UPDATED                   STATUS        CHART           APP VERSION  DESCRIPTION
1         Wed Feb 28 05:32:47 2024  deployed      longhorn-1.4.2  v1.4.2       Install complete
2         Wed Feb 28 09:28:39 2024  uninstalling  longhorn-1.5.4  v1.5.4       Deletion in progress (or silently failed)

# The actual chart is saved as a Kubernetes secret. List the longhorn Helm chart revisions saved as secrets.
kubectl get secrets -n longhorn-system

NAME                            TYPE                DATA  AGE
basic-auth                      Opaque              1     15h
chart-values-longhorn           Opaque              0     10h
longhorn-webhook-ca             kubernetes.io/tls   2     15h
longhorn-webhook-tls            kubernetes.io/tls   2     15h
sh.helm.release.v1.longhorn.v1  helm.sh/release.v1  1     15h
sh.helm.release.v1.longhorn.v2  helm.sh/release.v1  1     21m

# The secret to delete is the latest chart revision, that is, sh.helm.release.v1.longhorn.v2.
# Save a backup of the secret as YAML before deleting it.
kubectl get secrets sh.helm.release.v1.longhorn.v2 -n longhorn-system -o yaml > sh.helm.release.v1.longhorn.v2.yaml

# Save a backup of the default values that were passed to the Helm chart during installation.
helm get values --revision=2 longhorn -n longhorn-system > defaultSettings.yaml

# Find all jobs in longhorn-system. These jobs are deleted in the next step and are re-triggered as part of the manual patch.
kubectl get jobs -n longhorn-system

NAME                   COMPLETIONS  DURATION  AGE
helm-install-longhorn  0/1          9h        9h
longhorn-post-upgrade  1/1          11m       10h
longhorn-uninstall     0/1          10h       10h

# Delete the jobs, then delete the latest longhorn chart secret.
kubectl delete job helm-install-longhorn longhorn-uninstall longhorn-post-upgrade -n longhorn-system
kubectl delete secret sh.helm.release.v1.longhorn.v2 -n longhorn-system

# Apply the longhorn chart from the parcel directory.
kubectl patch HelmChart longhorn -n longhorn-system --type=merge --patch-file /opt/cloudera/parcels/ECS/longhorn/longhorn.yaml
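
The revision number in the example above (2) will differ from cluster to cluster. As a minimal sketch, the following shell function derives the name of the latest release secret from the text output of helm history instead of reading it off by eye. The function name latest_release_secret is hypothetical and not part of Helm, kubectl, or ECS. It assumes the default table output, where each data row starts with the numeric revision, and it relies on the sh.helm.release.v1.<release>.v<revision> naming that is visible in the secret list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Helm or ECS): reads `helm history` table
# output on stdin and prints the release secret name for the highest revision.
# $1: Helm release name, for example "longhorn".
latest_release_secret() {
  awk -v rel="$1" '
    # Data rows start with a numeric revision; the header row does not.
    $1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 }
    # Print nothing if no revision row was found.
    END { if (max > 0) print "sh.helm.release.v1." rel ".v" max }
  '
}

# Example use on a live cluster (left commented out so nothing runs by accident):
# helm history longhorn -n longhorn-system | latest_release_secret longhorn
```

Check the printed name against the kubectl get secrets output, and take the YAML backup, before passing it to kubectl delete secret.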