Post Upgrade Steps for CDP PVC 1.4.1 in OCP Clusters

Ensure that the Control Plane has successfully upgraded to version 1.4.1.
Perform the steps below after completing the upgrade. Run all commands from the command line of a local machine that has access to the kubeconfig file for the cluster. Point kubectl at that file by setting the KUBECONFIG environment variable:
export KUBECONFIG=<absolute path to kube config for the OCP cluster>
  1. Ensure that the cluster master nodes have all the correct labels and taints. Run the following command:
    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints,LABELS:.metadata.labels
  2. Ensure that all the master nodes have the following label(s) (both key and value):
    node-role.kubernetes.io/master:

    Note that the value for the label is empty.

  3. Ensure that all the master nodes have the following taint(s) (both key and effect):
    map[effect:NoSchedule key:node-role.kubernetes.io/master]

    Note that the value for the taint is empty.

  4. If any of the labels and/or taints are wrong or completely missing, apply them using the following steps:
    1. Identify the names of all master nodes, for example master-01, master-02, master-03, and run the following commands:
      kubectl label nodes master-01 master-02 master-03 node-role.kubernetes.io/master= --overwrite=true
      kubectl taint nodes master-01 master-02 master-03 node-role.kubernetes.io/master=:NoSchedule --overwrite=true
      Alternatively, if all of your master nodes match a common label selector, you can use that selector instead of listing every node name in the labeling and tainting commands. For example, if all your master nodes have the label my-master-label-key=my-master-label-value, run:
      kubectl label nodes --selector my-master-label-key=my-master-label-value node-role.kubernetes.io/master= --overwrite=true
      kubectl taint nodes --selector my-master-label-key=my-master-label-value node-role.kubernetes.io/master=:NoSchedule --overwrite=true
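The label and taint checks in steps 2 and 3 can also be scripted. The following is a minimal sketch rather than part of the documented procedure. It assumes kubectl is on the PATH and KUBECONFIG is exported as described above, and that you supply the master node names yourself. The function names find_problems and load_nodes are illustrative.

```python
import json
import subprocess

MASTER_KEY = "node-role.kubernetes.io/master"
TAINT_EFFECT = "NoSchedule"


def load_nodes():
    """Run `kubectl get nodes -o json` and return the parsed node list."""
    result = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                            check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return json.loads(result.stdout)


def find_problems(node_list, master_names):
    """Return {node_name: [problem, ...]} for the given master nodes.

    master_names is supplied by the caller (hypothetical input), because a
    node that lost its master label cannot be found by that label.
    """
    by_name = {n["metadata"]["name"]: n for n in node_list.get("items", [])}
    problems = {}
    for name in master_names:
        node = by_name.get(name)
        if node is None:
            problems[name] = ["node not found"]
            continue
        issues = []
        labels = node["metadata"].get("labels") or {}
        # The label must exist and its value must be empty.
        if labels.get(MASTER_KEY) != "":
            issues.append("label missing or value not empty")
        taints = node.get("spec", {}).get("taints") or []
        # The taint must match on key and effect, with an empty value.
        has_taint = any(t.get("key") == MASTER_KEY
                        and t.get("effect") == TAINT_EFFECT
                        and not t.get("value")
                        for t in taints)
        if not has_taint:
            issues.append("taint missing or wrong")
        if issues:
            problems[name] = issues
    return problems
```

For example, find_problems(load_nodes(), ["master-01", "master-02", "master-03"]) returns an empty dictionary when every listed node carries the expected label and taint. Any node that does appear in the result is a candidate for the label and taint commands in step 4.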
  1. Ensure that the cdp-cli command line tool is set up at the latest available version (0.9.71 or later), with a CDP Private profile that has adequate privileges. Ensure that your profile has:
    • Correct form factor: private

    • Correct CDP endpoint URL: base URL to your CDP Private dashboard

    • Correct access key and private key: generated using CDP Private console

    • See the cdpcli page on PyPI for more detailed information

    • Identify the name of this CDP Private profile for later use

  2. Ensure that you have Python 3 installed and updated to a supported version.
  3. Create a file named post_upgrade_hook.py with the following contents:
    
    ###########################################################
    # This script upgrades YuniKorn for CDP PVT OCP clusters. #
    # Ensure that all prerequisites have been duly fulfilled. #
    # Please read the full documentation before use.          #
    ###########################################################
    
    import subprocess
    import json
    import argparse
    import sys
    import time
    
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--profile", default="", help="CDP Profile name as specified in ${HOME}/.cdp/credentials")
    parser.add_argument("-e", "--endpoint", default="", help="CDP Private base endpoint URL")
    args = parser.parse_args()
    
    cdpProfileName = args.profile
    controlPlanePublicEndpoint = args.endpoint
    
    print('**************************')
    print('**************************')
    if cdpProfileName != "":
        print("CDP Private profile:", cdpProfileName)
    if controlPlanePublicEndpoint != "":
        print("CDP Private base endpoint URL:", controlPlanePublicEndpoint)
    
    
    def get_command(cmd_list_suffix):
        cmd_list = ['cdp', '--no-verify-tls',
                    '--form-factor', 'private',
                    '--output', 'json']
        if cdpProfileName != "":
            cmd_list = cmd_list + ['--profile', cdpProfileName]
        if controlPlanePublicEndpoint != "":
            cmd_list = cmd_list + ['--endpoint-url', controlPlanePublicEndpoint]
        cmd_list = cmd_list + cmd_list_suffix
        return cmd_list
    
    
    envNames, envCrns = [], []
    
    process = subprocess.Popen(get_command(['environments',
                                            'list-environments']),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    stdout, stderr = process.communicate()
    try:
        data = json.loads(stdout)
    except ValueError:
        print('While list environments: Something is wrong with output, Output JSON:', stdout)
        print('____ERROR__WHILE__CALLING__LIST__ENVIRONMENTS__COMMAND____', stderr)
        sys.exit(1)  # exit with a non-zero status so callers can detect the failure
    for en in data['environments']:
        if en['status'] == 'AVAILABLE':
            envNames.append(en['environmentName'])
            envCrns.append(en['crn'])
    
    print('**************************')
    print('**************************')
    print('Environment names:', envNames)
    print('Environment CRNs:', envCrns)
    
    clusterIds, clusterCrns = [], []
    
    for crn in envCrns:
        process = subprocess.Popen(get_command(['compute',
                                                'list-clusters',
                                                '--env-name-or-crn', crn]),
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE,
                                   universal_newlines=True)
        stdout, stderr = process.communicate()
        try:
            data = json.loads(stdout)
        except ValueError:
            print('While list clusters: Something is wrong with output for environment:', crn, ', Output JSON:', stdout)
            print('____ERROR__WHILE__CALLING__LIST__CLUSTER__COMMAND____', stderr)
            continue
        for en in data['clusters']:
            if en['status'] == 'REGISTERED':
                clusterIds.append(en['clusterId'])
                clusterCrns.append(en['clusterCrn'])
    
    print('**************************')
    print('**************************')
    print('Cluster IDs:', clusterIds)
    print('Cluster CRNs:', clusterCrns)
    
    upgradeErrs = {}
    
    for crn in clusterCrns:
        print('**************************')
        print('**************************')
        tryErr = ''
        for i in range(0, 10):
            time.sleep(60)
            print('Cluster:', crn, 'Try:', i)
            process = subprocess.Popen(get_command(['compute',
                                                    'upgrade-deployment',
                                                    '--cluster-crn', crn,
                                                    '--namespace', 'yunikorn',
                                                    '--name', 'yunikorn']),
                                       stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       universal_newlines=True)
            stdout, stderr = process.communicate()
            try:
                data = json.loads(stdout)
            except ValueError:
                print('While upgrade deployment: Something is wrong with output for cluster:', crn, ', Output JSON:', stdout)
                print('____ERROR__WHILE__CALLING__UPGRADE__DEPLOYMENT__COMMAND____', stderr)
                if i == 9:
                    print('Failed upgrade deployment due to JSON error for cluster:', crn, ', Tries exhausted')
                    tryErr = 'Error'
                    break
                continue
            break
        if tryErr == 'Error':
            upgradeErrs[crn] = 'Error'
        else:
            print('Response status for cluster:', crn, 'from upgrade deployment command:', data['status'])
            upgradeErrs[crn] = ''
    
    for crn in clusterCrns:
        print('**************************')
        print('**************************')
        if upgradeErrs[crn] == 'Error':
            print('Skipping failed cluster:', crn)
            continue
        pollErr = ''
        for i in range(0, 100):
            time.sleep(5)
            print('Cluster:', crn, 'Try:', i)
            process = subprocess.Popen(get_command(['compute',
                                                    'describe-deployment',
                                                    '--cluster-crn', crn,
                                                    '--namespace', 'yunikorn',
                                                    '--name', 'yunikorn']),
                                       stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       universal_newlines=True)
            stdout, stderr = process.communicate()
            try:
                data = json.loads(stdout)
            except ValueError:
                print('While describe deployment: Something is wrong with output for cluster:', crn, ', Output JSON:',
                      stdout)
                print('____ERROR__WHILE__CALLING__DESCRIBE__DEPLOYMENT__COMMAND____', stderr)
                if i == 99:
                    print('Failed describe deployment due to JSON error for cluster:', crn, ', Tries exhausted')
                    pollErr = 'Error'
                    break
                continue
            if data['deployment']['status'] == 'DEPLOYED':
                break
            print('Response status for cluster:', crn, 'from describe deployment command:', data['deployment']['status'])
            if i == 99:
                print('Failed deployment upgrade due to timeout for cluster:', crn, ', Tries exhausted')
                pollErr = 'Error'
        if pollErr == 'Error':
            print('Upgrade deployment failed for cluster:', crn)
        else:
            print('Upgrade deployment completed for cluster:', crn)
    
  4. Run the script as follows:
     python3 post_upgrade_hook.py --profile <your-CDP-Private-profile> --endpoint <your-CDP-Private-base-endpoint>
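Before running the script, the prerequisites in steps 1 and 2 can be sanity-checked with a few lines of Python. This is a minimal sketch under stated assumptions, not part of the official tooling. It assumes the cdp executable is on the PATH and that ${HOME}/.cdp/credentials is an INI-style file with one section per profile. The function name preflight and its parameters are illustrative.

```python
import configparser
import os
import shutil
import sys


def preflight(profile, credentials_path=None, which=shutil.which):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # The hook script targets Python 3.
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    # The hook script shells out to the `cdp` command.
    if which("cdp") is None:
        problems.append("cdp executable not found on PATH")
    if credentials_path is None:
        credentials_path = os.path.join(os.path.expanduser("~"),
                                        ".cdp", "credentials")
    # Assumption: the credentials file is INI-style, one section per profile.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        problems.append("profile %r not found in %s"
                        % (profile, credentials_path))
    return problems
```

Calling preflight("<your-CDP-Private-profile>") and printing the result gives a quick indication of whether the environment is ready. It does not verify the profile's privileges, form factor, or endpoint URL, so those still need to be confirmed manually.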