Post Upgrade Steps for CDP PVC 1.4.1 in OCP Clusters
export KUBECONFIG=<absolute path to kube config for the OCP cluster>
- Ensure that the cluster master nodes have all the correct labels
and taints. Run the following command:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints,LABELS:.metadata.labels
- Ensure that all the master nodes have the following label(s)
(both key and value):
node-role.kubernetes.io/master:
Note that the value for the label is empty.
- Ensure that all the master nodes have the following taint(s)
(both key and effect):
map[effect:NoSchedule key:node-role.kubernetes.io/master]
Note that the value for the taint is empty.
- If any of the labels and/or taints are wrong or completely missing, apply
them using the following steps:
- Identify the names of all master nodes, for example
master-01, master-02, master-03, and run
the following commands:
kubectl label nodes master-01 master-02 master-03 node-role.kubernetes.io/master= --overwrite=true
kubectl taint nodes master-01 master-02 master-03 node-role.kubernetes.io/master=:NoSchedule --overwrite=true
Alternatively, instead of supplying the name of every master node in the command, you can use a label selector in the labeling and tainting commands if all of your master nodes match a common filtering criterion. For example, if all your master nodes have the label
my-master-label-key=my-master-label-value
then run:
kubectl label nodes --selector my-master-label-key=my-master-label-value node-role.kubernetes.io/master= --overwrite=true
kubectl taint nodes --selector my-master-label-key=my-master-label-value node-role.kubernetes.io/master=:NoSchedule --overwrite=true
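The label and taint check described above can also be done programmatically. The following is a minimal sketch, not part of the official procedure: it assumes you have saved the output of `kubectl get nodes -o json` and that you know the names of your master nodes. The helper names `check_master_node` and `check_nodes` and the sample data are hypothetical.

```python
import json

MASTER_KEY = "node-role.kubernetes.io/master"


def check_master_node(node):
    """Return a list of problems found on one node object (empty if none)."""
    problems = []
    labels = node.get("metadata", {}).get("labels") or {}
    # The label must exist and its value must be empty.
    if labels.get(MASTER_KEY) != "":
        problems.append("label missing or has a non-empty value")
    taints = node.get("spec", {}).get("taints") or []
    # A taint with an empty value may omit the "value" field entirely.
    has_taint = any(
        t.get("key") == MASTER_KEY
        and t.get("effect") == "NoSchedule"
        and not t.get("value")
        for t in taints
    )
    if not has_taint:
        problems.append("NoSchedule taint missing")
    return problems


def check_nodes(nodes_json, master_names):
    """Map each named master node to its list of problems."""
    items = json.loads(nodes_json)["items"]
    by_name = {n["metadata"]["name"]: n for n in items}
    report = {}
    for name in master_names:
        if name not in by_name:
            report[name] = ["node not found"]
        else:
            report[name] = check_master_node(by_name[name])
    return report


# Hypothetical sample that mimics the shape of `kubectl get nodes -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "master-01", "labels": {MASTER_KEY: ""}},
     "spec": {"taints": [{"key": MASTER_KEY, "effect": "NoSchedule"}]}},
    {"metadata": {"name": "master-02", "labels": {}}, "spec": {}},
]})
report = check_nodes(sample, ["master-01", "master-02"])
print(report)
```

Any node that maps to a non-empty list needs the label and taint commands shown earlier.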
- Ensure that you have the cdp-cli command line tool set up with the latest version available at the time (0.9.71+), with a CDP Private profile that has adequate privileges. Ensure that your profile has:
- Correct form factor: private
- Correct CDP endpoint URL: base URL to your CDP Private dashboard
- Correct access key and private key: generated using CDP Private console
- See cdpcli · PyPI for more detailed information
- Identify the name of this CDP Private profile for later use
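To find the profile name for later use, you can list the profiles defined in `${HOME}/.cdp/credentials`. The sketch below is illustrative only: it assumes the file is INI-style with one section per profile, and the key names and placeholder values in the sample are assumptions rather than confirmed cdp-cli details. The function name `list_profiles` is hypothetical.

```python
import configparser


def list_profiles(credentials_text):
    """Return the profile (section) names found in INI-style credentials text."""
    # Interpolation is disabled so '%' characters in key values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return parser.sections()


# Hypothetical sample content; key names and values are placeholders.
sample_credentials = """
[default]
cdp_access_key_id = <access-key>
cdp_private_key = <private-key>

[my-private-profile]
cdp_access_key_id = <access-key>
cdp_private_key = <private-key>
"""

profiles = list_profiles(sample_credentials)
print(profiles)
```

To inspect a real file, read it with `open(os.path.expanduser("~/.cdp/credentials")).read()` and pass the text to `list_profiles`.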
- Ensure that you have Python 3 installed and updated to a supported version.
- Create a file named post_upgrade_hook.py with the following contents:
###########################################################
# This script upgrades YuniKorn for CDP PVT OCP clusters. #
# Ensure that all prerequisites have been duly fulfilled. #
# Please read the full documentation before use.          #
###########################################################

import subprocess
import json
import argparse
import sys
import time

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--profile", default="", help="CDP Profile name as specified in ${HOME}/.cdp/credentials")
parser.add_argument("-e", "--endpoint", default="", help="CDP Private base endpoint URL")
args = parser.parse_args()

cdpProfileName = args.profile
controlPlanePublicEndpoint = args.endpoint

print('**************************')
print('**************************')
if cdpProfileName != "":
    print("CDP Private profile:", cdpProfileName)
if controlPlanePublicEndpoint != "":
    print("CDP Private base endpoint URL:", controlPlanePublicEndpoint)


def get_command(cmd_list_suffix):
    cmd_list = ['cdp', '--no-verify-tls', '--form-factor', 'private', '--output', 'json']
    if cdpProfileName != "":
        cmd_list = cmd_list + ['--profile', cdpProfileName]
    if controlPlanePublicEndpoint != "":
        cmd_list = cmd_list + ['--endpoint-url', controlPlanePublicEndpoint]
    cmd_list = cmd_list + cmd_list_suffix
    return cmd_list


envNames, envCrns = [], []
process = subprocess.Popen(get_command(['environments', 'list-environments']), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = process.communicate()
try:
    data = json.loads(stdout)
except ValueError:
    print('While list environments: Something is wrong with output, Output JSON:', stdout)
    print('____ERROR__WHILE__CALLING__LIST__ENVIRONMENTS__COMMAND____', stderr)
    sys.exit()
for en in data['environments']:
    if en['status'] == 'AVAILABLE':
        envNames.append(en['environmentName'])
        envCrns.append(en['crn'])
print('**************************')
print('**************************')
print('Environment names:', envNames)
print('Environment CRNs:', envCrns)

clusterIds, clusterCrns = [], []
for crn in envCrns:
    process = subprocess.Popen(get_command(['compute', 'list-clusters', '--env-name-or-crn', crn]), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = process.communicate()
    try:
        data = json.loads(stdout)
    except ValueError:
        print('While list clusters: Something is wrong with output for environment:', crn, ', Output JSON:', stdout)
        print('____ERROR__WHILE__CALLING__LIST__CLUSTER__COMMAND____', stderr)
        continue
    for en in data['clusters']:
        if en['status'] == 'REGISTERED':
            clusterIds.append(en['clusterId'])
            clusterCrns.append(en['clusterCrn'])
print('**************************')
print('**************************')
print('Cluster IDs:', clusterIds)
print('Cluster CRNs:', clusterCrns)

upgradeErrs = {}
for crn in clusterCrns:
    print('**************************')
    print('**************************')
    tryErr = ''
    for i in range(0, 10):
        time.sleep(60)
        print('Cluster:', crn, 'Try:', i)
        process = subprocess.Popen(get_command(['compute', 'upgrade-deployment', '--cluster-crn', crn, '--namespace', 'yunikorn', '--name', 'yunikorn']), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
        stdout, stderr = process.communicate()
        try:
            data = json.loads(stdout)
        except ValueError:
            print('While upgrade deployment: Something is wrong with output for cluster:', crn, ', Output JSON:', stdout)
            print('____ERROR__WHILE__CALLING__UPGRADE__DEPLOYMENT__COMMAND____', stderr)
            if i == 9:
                print('Failed upgrade deployment due to JSON error for cluster:', crn, ', Tries exhausted')
                tryErr = 'Error'
                break
            continue
        break
    if tryErr == 'Error':
        upgradeErrs[crn] = 'Error'
    else:
        print('Response status for cluster:', crn, 'from upgrade deployment command:', data['status'])
        upgradeErrs[crn] = ''

for crn in clusterCrns:
    print('**************************')
    print('**************************')
    if upgradeErrs[crn] == 'Error':
        print('Skipping failed cluster:', crn)
        continue
    pollErr = ''
    for i in range(0, 100):
        time.sleep(5)
        print('Cluster:', crn, 'Try:', i)
        process = subprocess.Popen(get_command(['compute', 'describe-deployment', '--cluster-crn', crn, '--namespace', 'yunikorn', '--name', 'yunikorn']), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
        stdout, stderr = process.communicate()
        try:
            data = json.loads(stdout)
        except ValueError:
            print('While describe deployment: Something is wrong with output for cluster:', crn, ', Output JSON:', stdout)
            print('____ERROR__WHILE__CALLING__DESCRIBE__DEPLOYMENT__COMMAND____', stderr)
            if i == 99:
                print('Failed describe deployment due to JSON error for cluster:', crn, ', Tries exhausted')
                pollErr = 'Error'
                break
            continue
        if data['deployment']['status'] == 'DEPLOYED':
            break
        print('Response status for cluster:', crn, 'from describe deployment command:', data['deployment']['status'])
        if i == 99:
            print('Failed deployment upgrade due to timeout for cluster:', crn, ', Tries exhausted')
            pollErr = 'Error'
    if pollErr == 'Error':
        print('Upgrade deployment failed for cluster:', crn)
    else:
        print('Upgrade deployment completed for cluster:', crn)
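To see which cdp invocation the script issues for a given profile and endpoint, the command-building step can be examined on its own. The sketch below restates the script's `get_command` logic as a standalone function with explicit parameters. The name `build_cdp_command`, the profile name, and the endpoint URL are hypothetical and used only for illustration.

```python
def build_cdp_command(suffix, profile="", endpoint=""):
    """Assemble the cdp argument list the same way the script's get_command does."""
    cmd = ['cdp', '--no-verify-tls', '--form-factor', 'private', '--output', 'json']
    if profile:
        # Only pass --profile when a profile name was supplied.
        cmd += ['--profile', profile]
    if endpoint:
        # Only pass --endpoint-url when an endpoint was supplied.
        cmd += ['--endpoint-url', endpoint]
    return cmd + list(suffix)


# Hypothetical profile name and endpoint, for illustration only.
example = build_cdp_command(
    ['environments', 'list-environments'],
    profile='my-private-profile',
    endpoint='https://cdp.example.com',
)
print(' '.join(example))
```

Printing the joined list gives a command line that you can run by hand to confirm that the profile and endpoint work before running the full script.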
- Run the script as follows:
python3 post_upgrade_hook.py --profile <your-CDP-Private-profile> --endpoint <your-CDP-Private-base-endpoint>
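While the script runs, its final phase polls each cluster until the yunikorn deployment reports the DEPLOYED status, making up to 100 attempts 5 seconds apart. The following is a simplified sketch of that polling pattern, not a replacement for the script. The function name `wait_until_deployed` and the injectable `fetch_status` and `sleep` parameters are hypothetical, added so that the loop can be exercised without a cluster.

```python
import time


def wait_until_deployed(fetch_status, attempts=100, delay_seconds=5, sleep=time.sleep):
    """Poll fetch_status() until it returns 'DEPLOYED' or attempts run out."""
    for attempt in range(attempts):
        sleep(delay_seconds)
        try:
            status = fetch_status()
        except ValueError:
            # As in the script, output that cannot be parsed is simply retried.
            continue
        if status == 'DEPLOYED':
            return True
    # All attempts were used without seeing DEPLOYED.
    return False


# Simulated status sequence; the no-op sleep avoids real waiting.
statuses = iter(['PENDING', 'PENDING', 'DEPLOYED'])
result = wait_until_deployed(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

With the default values, the wait for a single cluster is bounded at roughly 500 seconds of sleep time plus the time taken by the cdp calls themselves.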