Procedure to Downgrade from CDP 7.3.1

Use the following procedure to downgrade your cluster from CDP 7.3.1 to 7.1.9 SP1, 7.1.9, CDP 7.1.8 cumulative hotfix 17, or 7.1.7 SP3.

Downgrade

Downgrade restores the software to the prior release version while leaving data intact. Service interruptions are still expected, but using Rolling Restart minimizes them. After HDFS and/or Ozone is finalized, neither Downgrade nor Rollback is possible.

The following components do not support downgrade. If these components are in use, rollback may be necessary. Work with Cloudera Support to devise a plan based on the deployed components.
  • Ranger
  • Ranger RMS
  • Ranger KMS
  • KTS
  • YARN Queue Manager
  • Schema Registry
  • Solr
  • Atlas

Pre-downgrade steps

Ozone

This procedure is applicable only if you are downgrading from CDP 7.3.1 to CDP 7.1.8.

  1. Stop the Ozone Recon Web UI. Within Cloudera Manager UI, navigate to the Ozone service > Ozone Recon > Actions > Stop this Ozone Recon.
  2. Navigate to Configuration within the Ozone service and collect the value of ozone.recon.db.dir (default value is /var/lib/hadoop-ozone/recon/data).
  3. SSH to the Ozone Recon Web UI host and move the ozone.recon.db.dir parent directory to a backup location: mv /var/lib/hadoop-ozone/recon /var/lib/hadoop-ozone/recon-backup-CDP.
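As a minimal sketch of step 3, the backup target can be derived from whatever value ozone.recon.db.dir has in your deployment instead of hard-coding the default. The helper name recon_backup_cmd is hypothetical, and it only prints the mv command so you can review it before running it on the Recon host.

```shell
# Hypothetical helper: print the mv command that backs up the parent
# directory of ozone.recon.db.dir. Nothing is moved by this function.
recon_backup_cmd() {
  # $1 = value of ozone.recon.db.dir collected in step 2
  recon_parent="$(dirname "$1")"
  printf 'mv %s %s-backup-CDP\n' "$recon_parent" "$recon_parent"
}
```

For the default value, recon_backup_cmd /var/lib/hadoop-ozone/recon/data prints the same mv command shown in step 3.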
HBase

Stop the HBase Master(s). Run kinit as the hbase user if Kerberos is enabled.

  1. Stop Omid within the Cloudera Manager UI.
  2. Navigate to the HBase service > Instances within the Cloudera Manager UI and note the hostname of the HBase Master instance(s). Log in to the host(s) and run the following:
    hbase master stop --shutDownCluster
  3. Stop the remaining HBase components. Navigate to the HBase service within the Cloudera Manager UI > Actions > Stop.

The following steps must be performed when downgrading to 7.1.7 SP2 from 7.3.1. You will need to kinit as the hbase user if Kerberos is enabled.

  1. Contact support for the appropriate hbck2 jar.
  2. Execute a dry run of the shortenTableinfo command and validate that the appropriate files have been identified:
    hbase --config /etc/hbase/conf hbck -j hbase-hbck2-X.Y.Z.jar shortenTableinfo
  3. Run the shortenTableinfo -fix command to fix the file format:
    hbase --config /etc/hbase/conf hbck -j hbase-hbck2-X.Y.Z.jar shortenTableinfo -fix
Cruise Control

The following steps apply only if you downgrade from 7.3.1 to 7.1.7 SP2, not from 7.3.1 to 7.1.8. You can skip this section if the Cruise Control Goal configurations were left at their default values before the upgrade. If any Cruise Control Goal configuration values were changed before the upgrade, you must complete this section.

During the downgrade process, you must rename every occurrence of com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal in Cloudera Manager > Clusters > Cruise Control > Configurations tab, as described below.

In Cruise Control, from 7.1.8 and higher, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal is renamed to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal.

Perform the below steps:

  1. Check whether RackAwareDistributionGoal is present in the following goal sets (Cloudera Manager > Clusters > Cruise Control > Configurations tab):
    1. default.goals
    2. goals
    3. self.healing.goals
    4. hard.goals
    5. anomaly.detection.goals
  2. Note which goal sets contained RackAwareDistributionGoal.
  3. Remove RackAwareDistributionGoal from all of the goal lists.
  4. Perform the runtime downgrade process.
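The check-and-remove steps above can be sketched on the command line if you copy each goal list out of Cloudera Manager as a comma-separated string. The helper names has_goal and remove_goal are hypothetical, and the sketch assumes the list contains no spaces around the commas.

```shell
OLD_GOAL="com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal"

# Succeeds when the comma-separated goal list in $1 contains OLD_GOAL.
has_goal() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -qxF "$OLD_GOAL"
}

# Prints the goal list in $1 with every OLD_GOAL entry removed.
remove_goal() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -vxF "$OLD_GOAL" | paste -sd, -
}
```

Run has_goal once per goal set (default.goals, goals, self.healing.goals, hard.goals, anomaly.detection.goals) and keep the original strings of the sets that match, since they are needed again after the downgrade.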

Downgrading the Runtime parcel

  1. Navigate to Parcels within Cloudera Manager.
  2. Locate the CDP Private Cloud Base 7.1.7 SP3/7.1.8/7.1.9 SP1 parcel and click Activate.
  3. Follow the wizard and address any issues from the various inspectors.
  4. When the parcel is activated, click the Actions menu next to the cluster name and select Post Cloudera Runtime Upgrade.

The upgrade activates the parcel and restarts services.

Restart Services with Stale Configuration

  1. Navigate to Clusters and find the cluster being downgraded.
  2. Inspect the list of services for the stale configuration indicator (yellow power button). If you find one, click the indicator.
  3. Follow the wizard to restart services with stale configuration.

Restore CDH Databases

Restore the following databases from the CDH backups. Follow the order below and restore only a single service at a time. Stop the service before restoring its database, and start the service again after the restore completes.
  • Ranger
  • Ranger KMS
  • Stream Messaging Manager
  • Schema Registry
  • Hue (only if you are downgrading to 7.1.7 SP2)

The steps for backing up and restoring databases differ depending on the database vendor and version you select for your cluster and are beyond the scope of this document.

Rollback Cloudera Navigator Encryption Components

If you are rolling back any encryption components (Key Trustee KMS, HSM KMS, Key HSM, or Navigator Encrypt), first refer to:

Start the Key Management Server

Restart the Key Management Server. Open the Cloudera Manager Admin Console, go to the KMS service page, and select Actions > Start.

Rollback Key HSM

To roll back Key HSM:
  1. Install the version of Navigator Key HSM to which you wish to roll back
    Install the Navigator Key HSM package using yum:
    sudo yum downgrade keytrustee-keyhsm

    Cloudera Navigator Key HSM is installed to the /usr/share/keytrustee-server-keyhsm directory by default.

  2. Rename Previously-Created Configuration Files

    For Key HSM major version rollbacks, previously-created configuration files do not authenticate with the HSM and Key Trustee Server, so you must recreate these files by re-executing the setup and trust commands. First, navigate to the Key HSM installation directory and rename the application.properties, keystore, and truststore files:

    cd /usr/share/keytrustee-server-keyhsm/
    mv application.properties application.properties.bak
    mv keystore keystore.bak
    mv truststore truststore.bak
  3. Initialize Key HSM
    Run the service keyhsm setup command in conjunction with the name of the target HSM distribution:
    sudo service keyhsm setup [keysecure|thales|luna]

    For more details, see Initializing Navigator Key HSM.

  4. Establish Trust Between Key HSM and the Key Trustee Server
    The Key HSM service must explicitly trust the Key Trustee Server certificate (presented during TLS handshake). To establish this trust, run the following command:
    sudo keyhsm trust /path/to/key_trustee_server/cert

    For more details, see Establish Trust from Key HSM to Key Trustee Server.

  5. Start the Key HSM Service
    Start the Key HSM service:
    sudo service keyhsm start
  6. Establish Trust Between Key Trustee Server and Key HSM
    Establish trust between the Key Trustee Server and the Key HSM by specifying the path to the private key and certificate:
    sudo ktadmin keyhsm --server https://keyhsm01.example.com:9090 \
            --client-certfile /etc/pki/cloudera/certs/mycert.crt \
            --client-keyfile /etc/pki/cloudera/certs/mykey.key --trust
    For a password-protected Key Trustee Server private key, add the --passphrase argument to the command (enter the password when prompted):
    sudo ktadmin keyhsm --passphrase \
              --server https://keyhsm01.example.com:9090 \
              --client-certfile /etc/pki/cloudera/certs/mycert.crt \
              --client-keyfile /etc/pki/cloudera/certs/mykey.key --trust

    For additional details, see Integrate Key HSM and Key Trustee Server.

  7. Remove Configuration Files From Previous Installation
    After completing the rollback, remove the saved configuration files from the previous installation:
    cd /usr/share/keytrustee-server-keyhsm/
    rm application.properties.bak
    rm keystore.bak
    rm truststore.bak
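
Steps 2 and 7 act on the same three files, so they can be sketched as a pair of small helpers that take the installation directory as an argument. The function names are hypothetical; the file names and the default directory come from the steps above.

```shell
KEYHSM_FILES="application.properties keystore truststore"

# Step 2: rename each existing configuration file in directory $1 to <name>.bak.
backup_keyhsm_config() {
  for keyhsm_file in $KEYHSM_FILES; do
    if [ -e "$1/$keyhsm_file" ]; then
      mv "$1/$keyhsm_file" "$1/$keyhsm_file.bak"
    fi
  done
}

# Step 7: remove the .bak copies from directory $1 once the rollback is verified.
cleanup_keyhsm_backups() {
  for keyhsm_file in $KEYHSM_FILES; do
    rm -f "$1/$keyhsm_file.bak"
  done
}
```

On a default installation, the directory argument would be /usr/share/keytrustee-server-keyhsm, and both helpers would likely need to run with sufficient privileges to modify that directory.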

Rollback Navigator Encrypt

To roll back Cloudera Navigator Encrypt:
  1. If you have configured and are using an RSA master key file with OAEP padding, then you must revert this setting to its original value:
    navencrypt key --change
  2. Stop the Navigator Encrypt mount service:
    sudo /etc/init.d/navencrypt-mount stop
  3. Confirm that the mount-stop command completed:
    sudo /etc/init.d/navencrypt-mount status
  4. If rolling back to a release lower than NavEncrypt 6.2:
    1. Print the existing ACL rules and save that output to a file:
      sudo navencrypt acl --print
      vim acls.txt
    2. Delete all existing ACLs. For example, if there are a total of 7 ACL rules, run:
      sudo navencrypt acl --del --line=1,2,3,4,5,6,7
  5. To fully downgrade Navigator Encrypt, manually downgrade all of the associated Navigator Encrypt packages (in the order listed):
    1. navencrypt
    2. (Only required for operating systems other than SLES) navencrypt-kernel-module
    3. (Only required for the SLES operating system) cloudera-navencryptfs-kmp-<kernel_flavor>
    Note: Replace kernel_flavor with the kernel flavor for your system. Navigator Encrypt supports the default, xen, and ec2 kernel flavors.
    4. libkeytrustee
  6. If rolling back to a release lower than NavEncrypt 6.2:
    1. Reapply the ACL rules:
      sudo navencrypt acl --add --file=acls.txt
  7. Recompute process signatures:
    sudo navencrypt acl --update
  8. Restart the Navigator Encrypt mount service:
    sudo /etc/init.d/navencrypt-mount start
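
The --line argument used when deleting ACLs is a comma-separated list of rule numbers, which is tedious to type for a large rule set. A minimal sketch that builds the list from a rule count follows; acl_line_list is a hypothetical helper, and the count is assumed to be read by the operator from the saved ACL output.

```shell
# Print "1,2,...,N" for a rule count N given in $1.
acl_line_list() {
  seq 1 "$1" | paste -sd, -
}

# Print the delete command for review; nothing is executed here.
acl_delete_cmd() {
  printf 'sudo navencrypt acl --del --line=%s\n' "$(acl_line_list "$1")"
}
```

With 7 rules, acl_delete_cmd 7 prints the same command shown in the deletion step above.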

Rollback Solr

Due to on-disk format changes, Solr collections need to be restored from backup. Rollback of HDFS and ZooKeeper is NOT necessary when restoring individual collections.

For more information, see the Restoring a Collection documentation.

Rollback Atlas

Rollback Atlas Solr Collections
Atlas has several collections in Solr that must be restored from the pre-upgrade backup: vertex_index, edge_index, and fulltext_index. These collections may already have been restored using the Rollback Solr documentation. If they have not been restored yet, restore them now using the Rollback Solr documentation.
Rollback Atlas HBase Tables
  1. From a client host, start the hbase shell:
    hbase shell
  2. Within the hbase shell, list the snapshots, which must include the pre-upgrade snapshots:
    list_snapshots
  3. Within the hbase shell, disable the atlas_janus table, restore the snapshot, and enable the table:
    disable 'atlas_janus'
    restore_snapshot '<name of atlas_janus snapshot from list_snapshots>'
    enable 'atlas_janus'
  4. Within the hbase shell, disable the ATLAS_ENTITY_AUDIT_EVENTS table, restore the snapshot, and enable the table:
    disable 'ATLAS_ENTITY_AUDIT_EVENTS'
    restore_snapshot '<name of ATLAS_ENTITY_AUDIT_EVENTS snapshot from list_snapshots>'
    enable 'ATLAS_ENTITY_AUDIT_EVENTS'
  5. Restart Atlas.
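Steps 3 and 4 follow the same disable, restore, enable sequence for two tables, so the commands can be generated and reviewed before they are fed to the hbase shell. This is a minimal sketch: restore_snapshot_cmds is a hypothetical helper, and the snapshot name must come from your own list_snapshots output.

```shell
# Print the hbase shell commands that restore table $1 from snapshot $2.
restore_snapshot_cmds() {
  printf "disable '%s'\n" "$1"
  printf "restore_snapshot '%s'\n" "$2"
  printf "enable '%s'\n" "$1"
}
```

One way to use it, assuming the hbase client is on the PATH, is to pipe the output into the shell's non-interactive mode: restore_snapshot_cmds atlas_janus <snapshot name> | hbase shell -n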

Rollback YARN Queue Manager

You can roll back YARN Queue Manager using the pre-upgrade backup of config-service.mv.db and config-service.trace.db.

  1. Navigate to the YARN Queue Manager service in Cloudera Manager and record the configuration value for config_service_db_loc (or queuemanager_user_home_dir if blank) and the host where the YARN Queue Manager Store is running.
  2. Stop the YARN Queue Manager service.
  3. SSH to the YARN Queue Manager Store host and copy the pre-upgrade config-service.mv.db and config-service.trace.db to the config_service_db_loc recorded in step 1.
  4. Start the YARN Queue Manager service.
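
The copy in step 3 can be sketched as a helper that stops at the first missing file, so a partial restore does not go unnoticed. The name restore_qm_db and both directory arguments are hypothetical; substitute your backup location and the config_service_db_loc value recorded in step 1.

```shell
# Copy both pre-upgrade database files from backup dir $1 into target dir $2.
# Returns non-zero as soon as one copy fails.
restore_qm_db() {
  for qm_file in config-service.mv.db config-service.trace.db; do
    cp -p "$1/$qm_file" "$2/$qm_file" || return 1
  done
}
```

Run it only while the YARN Queue Manager service is stopped, as the steps above require.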

Rollback Spark

You can roll back Spark if the upgrade to Cloudera 7.3.1 fails.

  1. Activate the 7.1.9 parcel on your cluster in Parcels > Activate.
  2. Close the Restart window with Close.
  3. Fix the following Spark 3 configuration issues after the parcel rollback:
    1. Delete the hdfs:// prefix in spark.eventLog.dir.
    2. Delete the hdfs:// prefix in spark.driver.log.dfsDir.
  4. Activate the SPARK3 parcel in Parcels > Activate.
  5. Restart the services in Clusters > Status: in the Actions menu, select Restart.
  6. Deploy the Client Configuration. (See below.)
  7. Perform a post-rollback check by running the pi job on a Spark 3 gateway host:

    spark3-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi /opt/cloudera/parcels/SPARK3/lib/spark3/examples/jars/spark-examples_2.12.jar 100
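
For step 3, the prefix removal itself is a plain string operation. The sketch below shows one reading of it, stripping a leading literal hdfs:// from a configuration value; strip_hdfs_prefix is a hypothetical helper, and the sample path in the usage note is an assumption, not a value taken from this procedure.

```shell
# Print $1 with a leading "hdfs://" removed; other values pass through unchanged.
strip_hdfs_prefix() {
  printf '%s\n' "${1#hdfs://}"
}
```

For example, strip_hdfs_prefix hdfs:///user/spark/spark3ApplicationHistory prints /user/spark/spark3ApplicationHistory. The result still has to be entered for spark.eventLog.dir and spark.driver.log.dfsDir in Cloudera Manager.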

Deploy the Client Configuration

  1. On the Cloudera Manager Home page, click the Actions menu and select Deploy Client Configuration.
  2. Click Deploy Client Configuration.

Post downgrade steps

Streams Replication Manager (SRM)
Reset the state of the internal Kafka Streams application. Run the following command on the hosts of the SRM Service role.
kafka-streams-application-reset \
         --bootstrap-servers [***SRM SERVICE HOST***] \
         --config-file [***PROPERTIES FILE***] \
         --application-id srm-service_v2

Replace [***PROPERTIES FILE***] with the location of a configuration file that contains all security properties required to connect to the Kafka service. This option is required only if your Kafka service is secured.
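
As an illustration of what such a properties file might contain, the sketch below writes one for a Kerberos and TLS secured Kafka service. Every value is an assumption to be replaced with your cluster's settings: the protocol, the truststore path, and the password are placeholders, and write_reset_props is a hypothetical helper. The property keys are standard Kafka client settings.

```shell
# Write an example client properties file to the path given in $1.
# All values below are placeholders, not settings taken from this procedure.
write_reset_props() {
  cat > "$1" <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit
EOF
  chmod 600 "$1"   # the file holds a password, so restrict access
}
```

The resulting file path is what you would pass to --config-file in the reset command above.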

Cruise Control
  1. Insert the removed goal back into the relevant goal sets, using the renamed goal name RackAwareGoal (not RackAwareDistributionGoal).
  2. Restart Cruise Control.
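
If you kept the original goal list strings from before the downgrade, the re-insertion with the renamed goal can be sketched as a text substitution, which also preserves the goal's original position in each list. The helper name rename_goal is hypothetical.

```shell
# Print the goal list in $1 with RackAwareDistributionGoal renamed to RackAwareGoal.
rename_goal() {
  printf '%s\n' "$1" | sed 's/RackAwareDistributionGoal/RackAwareGoal/g'
}
```

Paste the output back into the matching goal set in Cloudera Manager before restarting Cruise Control.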
Finalize the HDFS Upgrade
Perform this step only after all validation is completed. For more information, see the Finalize the HDFS upgrade documentation.
Knox
Following the downgrade to 7.1.7 SP2, Knox may be found in an unhealthy state. In this case, perform a Rolling Restart of the Knox service using the Cloudera Manager UI.