Rollback from Cloudera Base on premises 7.3.1 or 7.3.2

Perform the following instructions to rollback a Cloudera Runtime cluster from versions 7.3.2 or 7.3.1 to a previous stable release (such as 7.3.1, 7.1.9 SP1, 7.1.9, CDP 7.1.8 cumulative hotfix 17, or 7.1.7 SP3). The process involves reverting the Cloudera Manager and Cloudera Management Services, stopping cluster services, activating the prior Runtime parcel, and restoring service-specific databases from pre-upgrade backups.

Rollback

Rollback restores the software to the prior release and also restores the data and metadata to the pre-upgrade state. Service interruptions are expected as the cluster must be halted. After HDFS and/or Ozone is finalized, it is not possible to Rollback.

The following components do not support Rollback. If these components are in use, manual restore or recovery may be necessary. Work with Cloudera Support to devise a plan based on the deployed components.

Ozone

Rollback Stages and Recovery Paths

To ensure a successful rollback, identify your current status based on the following upgrade stages:

Stage 1 - Baseline (Original Cloudera Manager and Cloudera Runtime): The original state with the previous Cloudera Manager and Cloudera Runtime versions. Use Backup_1 (taken before Stage 2) to return to this state.
Stage 2 - Intermediate (New Cloudera Manager and Original Cloudera Runtime): The state after a successful Cloudera Manager upgrade but before the Cloudera Runtime upgrade. This uses the new Cloudera Manager version with the old Cloudera Runtime. Use Backup_2 (taken after the Cloudera Manager upgrade) to return to this state.
Stage 3 - Target State (New Cloudera Manager & New Cloudera Runtime) — Upgrade Failure Point: If the Cloudera Runtime Upgrade (Stage 3) fails:
- Runtime Rollback (Revert to Stage 2): Downgrade the Cloudera Runtime parcel bits and restore the Cloudera Manager/Cloudera Management Service databases using Backup_2. This maintains the new Cloudera Manager version.
- Full Rollback (Revert to Stage 1): If you decide to revert the Cloudera Manager version as well, perform a Cloudera Manager package downgrade and restore the databases using Backup_1.

Pre rollback steps

important

JDK Compatibility and Java Home Configuration

Cloudera Manager 7.13.2 and Cloudera Runtime 7.3.2 require JDK 17 to function. Before you begin the rollback to an earlier Cloudera Runtime version (such as 7.1.x), you must reset the Java configuration to the previously supported version (JDK 8 or JDK 11).

Follow these steps to ensure JDK compatibility:

Update Global Java Home: Navigate to Hosts > All Hosts > Configuration. Update the Java Home Directory property to point to the older JDK location (JDK 8 or JDK 11). This ensures that the Cloudera Runtime services use the correct Java version after the rollback.
Configure Cloudera Management Service Safety Valve (If retaining Cloudera Manager 7.13.2): If the environment keeps Cloudera Manager at version 7.13.2 while rolling back the Runtime components, Cloudera Management Services still requires JDK 17.
1. Navigate to Cloudera Management Service > Configuration.
2. Locate the Service Environment Advanced Configuration Snippet (Safety Valve).
3. Define the JDK 17 path specifically within this snippet. This setting allows Cloudera Management Service to remain compatible with Cloudera Manager 7.13.2 while the rest of the cluster uses an older JDK.
  
  warning
  You must align the JDK version with the specific requirements of each service. Incorrect Java Home settings will prevent services from starting during the final stages of the rollback.

Ozone

This procedure is applicable only if you are downgrading from Cloudera Base on premises 7.3.2 or 7.3.1 to CDP Private Cloud Base 7.1.8.

Stop the Ozone Recon Web UI. Within Cloudera Manager UI, navigate to the Ozone service > Ozone Recon > Actions > Stop this Ozone Recon.
Navigate to Configuration within the Ozone service and collect the value of ozone.recon.db.dir (default value is /var/lib/hadoop-ozone/recon/data).
SSH to the Ozone Recon Web UI host and move the ozone.recon.db.dir parent directory to a backup location: mv /var/lib/hadoop-ozone/recon /var/lib/hadoop-ozone/recon-backup-CDP.

HBase

Stop the HBase Master(s). Execute knit as the hbase user if Kerberos is enabled.

Stop Omid within Cloudera Manager UI
Navigate to the HBase service > Instances within Cloudera Manager UI and note the hostname of the HBase Master instance(s). Login to the host(s) and execute the following: hbase master stop --shutDownCluster
Stop the remaining HBase components. Navigate to the HBase service within Cloudera Manager UI > Actions > Stop

The following must be performed when downgrading to CDP Private Cloud Base 7.1.7 SP2 from Cloudera Base on premises 7.3.2 or 7.3.1. You will need to kinit as the hbase user if kerberos is enabled.

Contact support for the appropriate hbck2 jar
Execute a dry run of the shortenTableinfo command and validate the appropriate files have been identified hbase --config /etc/hbase/conf hbck -j hbase-hbck2-X.Y.Z.jar shortenTableinfo
Run the shortenTableinfo -fix command to fix the file format hbase --config /etc/hbase/conf hbck -j hbase-hbck2-X.Y.Z.jar shortenTableinfo -fix

Cruise Control

The following steps are applicable only if you downgrade from Cloudera Base on premises 7.3.2 or 7.3.1 to CDP Private Cloud Base 7.1.7 SP2 and not from Cloudera Base on premises 7.3.2 or 7.3.1 to CDP Private Cloud Base 7.1.8. You can skip this section if the Cruise Control Goal configurations were set to the default values before performing the upgrade. However, if the Cruise Control Goal configuration values were changed before performing the upgrade, then you must proceed with this section.

In Cruise Control, you must rename com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal in Cloudera Manager > Clusters > Cruise Control > Configurations tab in every occurrences as described below during downgrade process.

In Cruise Control, from CDP Private Cloud Base 7.1.8 and higher, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal is renamed to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal.

Perform the below steps:

Check the following goal sets if RackAwareDistributionGoal is present (Cloudera Manager > Clusters > Cruise Control > Configurations tab):
1. default.goals
2. goals
3. self.healing.goals
4. hard.goals
5. anomaly.detection.goals
Create a note for yourself about where RackAwareDistributionGoal were present
Remove RackAwareDistributionGoal from all of the goal lists
Perform the Cloudera Runtime downgrade process

Stop the Cluster

On the Home> Status tab, click the Actions menu and select Stop.
Click Stop on the confirmation screen. The Command Details window shows the progress of the stopping process.
When the All services successfully stopped message appears, the task is complete and you can close the Command Details window.

Rolling back the Cloudera Runtime parcel

Navigate to Parcels within Cloudera Manager.
Locate the CDP Private Cloud Base 7.1.7 SP3/7.1.8/7.1.9 SP1/7.3.1 parcel and click Activate.
Follow the wizard and address any issues from the various inspectors.
note
Warnings are expected for HDFS and Ozone due to the unfinalized rolling upgrade, validate no other warnings exist.

The upgrade activates the parcel and restarts services.

Restore Post-Cloudera Manager Upgrade Database Dumps

To restore all service and host configurations to the point prior to the Cloudera Runtime upgrade, Cloudera recommends that you restore the Cloudera Manager and Cloudera Management Service configurations.

Stop the Cloudera Management Service from the Cloudera Manager Home page, see Stopping the Cloudera Management Service.
SSH into the Cloudera Manager Server host and stop the Cloudera Manager Server by running the following command:
```
sudo systemctl stop cloudera-scm-server
```
SSH into the Cloudera Manager Agent hosts to stop the Cloudera Manager Agent and supervisord services, and then clean the Cloudera Manager Agents and the supervisord process by running the following commands:
```
sudo systemctl stop cloudera-scm-agent.service
sudo systemctl stop cloudera-scm-supervisord.service
sudo rm -rf /var/run/cloudera-scm-agent /var/lib/cloudera-scm-agent/response.avro
```
Optional: Restoring External Cloudera Manager Databases
1. Gather required Database information.
  
  To restore the Cloudera Manager and Cloudera Management Service databases, you must use the information gathered during the corresponding backup process (either the Initial Baseline or the Intermediate State) to identify the required hostnames, users, and passwords.
  
  For the Cloudera Manager Database, you can find this information in the /etc/cloudera-scm-server/db.properties file. For other databases, you can locate all details (excluding passwords) on the running Cloudera Manager cluster and service configuration pages.
2. View the Properties File.
  
  You can view the properties file by running the following command:
  cat /etc/cloudera-scm-server/db.properties
  
  com.cloudera.cmf.db.type=<type>
  
  com.cloudera.cmf.db.host=<hostname>:<port>
  
  com.cloudera.cmf.db.name=<database>
  
  com.cloudera.cmf.db.user=<user>
  
  com.cloudera.cmf.db.password=<password>
3. Restore the Specific Database Type.
  For PostgreSQL Databases
  
  Depending on your database security configuration, you might need to switch to the postgres user to restore the database:
  su - postgres psql < dumpfile.sql
  
  For MySQL Databases
  
  To restore a MySQL database, you must run the following command:
  mysql -u<username> -p <database_name> < dumpfile.sql
  
  For MariaDB Databases
  
  Since all MariaDB versions support mysqldump backups, you can restore them using the following command:
  mariadb -u<username> -p <database_name> < dumpfile.sql
  
  For Oracle Databases
  
  For Oracle, you must work with your database administrator to ensure they restore the databases properly.
SSH to the Cloudera Manager host and check the /etc/cloudera-scm-server/db.properties file to verify that the Cloudera Manager database credentials point to the previous database installation.

To restore the Cloudera Management Service, SSH to all Cloudera Management Service hosts and restore the Cloudera Management Server directories by running the following commands:

sudo cp -rp /var/lib/cloudera-service-monitor-`date +%F`-CM /var/lib/cloudera-service-monitor
sudo cp -rp /var/lib/cloudera-host-monitor-`date +%F`-CM /var/lib/cloudera-host-monitor 
sudo cp -rp /var/lib/cloudera-scm-eventserver-`date +%F`-CM /var/lib/cloudera-scm-eventserver

SSH to the Cloudera Manager Server host and start the Cloudera Manager Server by running the following command:
```
sudo systemctl start cloudera-scm-server
```
SSH to the Cloudera Manager Agent hosts and start the Cloudera Manager Agent by running the following command:
```
sudo systemctl start cloudera-scm-agent
```
Start the Cloudera Management Service from the Cloudera Manager Home page, see Starting the Cloudera Management Service.

Alternative (UI-based Rollback):

You can also revert configurations by navigating to Hosts > All Hosts > Configuration > History & Rollback or by iterating through each service's Configuration > History & Rollback tab.

Rollback ZooKeeper

Rollback of the data and service is not expected to be necessary unless dependent services are being rolled back. If it is determined that a ZooKeeper rollback is necessary, the steps are:

Stop ZooKeeper
Restore the data backup. For example: cp -rp /var/lib/zookeeper-backup-pre-upgrade-CM-CDH /var/lib/zookeeper/
Start ZooKeeper

Rollback HDFS

You cannot roll back HDFS while high availability is enabled. The rollback procedure in this topic creates a temporary configuration without high availability. Regardless of whether high availability is enabled, follow the steps in this section.

Roll back all of the NameNodes and JournalNodes. Use the NameNode backup directory and JournalNode backup directory you created before upgrading to Cloudera Base on premises. (/etc/hadoop/conf.rollback.namenode) to perform the following steps on all NameNode hosts:
1. Start the JournalNodes using Cloudera Manager:
  1. Go to the HDFS service.
  2. Select the Instances tab.
  3. Select all JournalNode roles from the list.
  4. Click Actions for Selected > Start.
2. (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file on all NameNode hosts (located in the temporary rollback directory) and update the keystore passwords with the actual cleartext passwords. The passwords will have values that look like this:
```
<property>
             <name>ssl.server.keystore.password</name>
             <value>********</value>
             </property>
             <property>
             <name>ssl.server.keystore.keypassword</name>
             <value>********</value>
             </property>
```
3. (TLS only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file and remove the hadoop.security.credential.provider.path property.
4. (TLS only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file and change the value of ssl.server.keystore.location to /etc/hadoop/conf.rollback.namenode/cm-auto-host_keystore.jks
5. Edit the /etc/hadoop/conf.rollback.namenode/core-site.xml and change the value of the net.topology.script.file.name property to /etc/hadoop/conf.rollback.namenode. For example:
```
# Original property
<property>
<name>net.topology.script.file.name</name>
<value>/var/run/cloudera-scm-agent/process/63-hdfs-NAMENODE/topology.py</value>
</property>
```
```
# New property
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf.rollback.namenode/topology.py</value>
</property>
```
6. Edit the /etc/hadoop/conf.rollback.namenode/topology.py file and change the value of DATA_FILE_NAME to /etc/hadoop/conf.rollback.namenode. For example:
```
DATA_FILE_NAME = '/etc/hadoop/conf.rollback.namenode/topology.map'
```
7. (Kerberos enabled clusters only) Run the following command:
```
sudo -u hdfs kinit hdfs/<NameNode Host name> -l 7d -kt /etc/hadoop/conf.rollback.namenode/hdfs.keytab
```
8. Start active Namenode with the following command:
```
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.namenode namenode -rollback
```
  or (if a rolling upgrade is performed)
```
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.namenode namenode -rollingUpgrade rollback
```
  important
  The NameNode must automatically exit after the rollback is complete.
  Sample output:
  26/03/17 08:05:11 INFO namenode.FSImage: Rolling back storage directory /dataroot/ycloud/dfs/nn. new LV = -64; new CTime = 1773032675908 26/03/17 08:05:11 INFO namenode.NNUpgradeUtil: Rollback of /dataroot/ycloud/dfs/nn is complete.
9. Start the NameNode that is rolled back just now and make sure that it reaches the Active state.
  1. Sign in to Cloudera Manager.
  2. In the left navigation, click Clusters and click the HDFS cluster.
  3. Click the Instances tab.
  4. Select the check box next to the rolled back NameNode.
  5. Click Actions for Selected dropdown button and click Start.
10. Log in to the other Namenode and start SBNN with the following command:
```
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.namenode namenode -bootstrapStandby
```
11. Press Yes when prompted. This also exits the process after it is done.
12. Select Ctrl + C for active Namenode to exit the process.
Rollback the DataNodes.
Use the DataNode rollback directory you created before upgrading to Cloudera Base on premises (/etc/hadoop/conf.rollback.datanode) to perform the following steps on all DataNode hosts:
1. (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.datanode/ssl-server.xml file on all DataNode hosts (Located in the temporary rollback directory.) and update the keystore passwords (ssl.server.keystore.password and ssl.server.keystore.keypassword) with the actual passwords.
  The passwords will have values that look like this:
```
<property>
         <name>ssl.server.keystore.password</name>
         <value>********</value>
         </property>
         <property>
         <name>ssl.server.keystore.keypassword</name>
         <value>********</value>
         </property>
```
2. (TLS only) Edit the /etc/hadoop/conf.rollback.datanode/ssl-server.xml file and remove the hadoop.security.credential.provider.path property and change the value of property.
```
ssl.server.keystore.location to /etc/hadoop/conf.rollback.datanode/cm-auto-host_keystore.jks
```
3. Edit the /etc/hadoop/conf.rollback.datanode/hdfs-site.xml file and remove the dfs.datanode.max.locked.memory property.
4. If you kerberos enabled cluster then make sure change the value of hdfs.keytab to the absolute path of conf.rollback.datanode folder in core-site.xmland hdfs-site.xml
5. Run one of the following commands:
  - If the DataNode is running with privileged ports (usually 1004 and 1006):
```
cd /etc/hadoop/conf.rollback.datanode
         export HADOOP_SECURE_DN_USER=hdfs
         export JSVC_HOME=/opt/cloudera/parcels/<parcel_filename>/lib/bigtop-utils
         hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
```
  - If the DataNode is not running on privileged ports:
```
sudo hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
```
  When the rolling back of the DataNodes is complete, terminate the console session by typing Control-C. Look for output from the command similar to the following that indicates when the DataNode rollback is complete:
```
Rollback of /dataroot/ycloud/dfs/dn/current/BP-<Block Group number> is complete
```
  You may see the following error after issuing these commands:
```
ERROR datanode.DataNode: Exception in secureMain java.io.IOException: 
         The path component: '/var/run/hdfs-sockets' in '/var/run/hdfs-sockets/dn' has permissions 0755 uid 39998 and gid 1006. 
         It is not protected because it is owned by a user who is not root and not the effective user: '0'.
```
  The error message will also include the following command to run:
```
chown root /var/run/hdfs-sockets 
```
  After running this command, the DataNode will restart successfully. Rerun the DataNode rollback command:
```
sudo hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
```
6. If High Availability for HDFS is enabled, restart the HDFS service. In the Cloudera Manager Admin Console, go to the HDFS service and select Actions > Restart.
7. If high availability is not enabled for HDFS, use the Cloudera Manager Admin Console to restart all NameNodes and DataNodes.
  1. Go to the HDFS service.
  2. Select the Instances tab
  3. Select all DataNode and NameNode roles from the list.
  4. Click Actions for Selected > Restart.
If high availability is not enabled for HDFS, roll back the Secondary NameNode.
1. (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file on all Secondary NameNode hosts (Located in the temporary rollback directory.) and update the keystore passwords with the actual cleartext passwords. The passwords will have values that look like this:
```
<property> 
         <name>ssl.server.keystore.password</name> 
         <value>********</value> 
         </property> 
         <property> 
         <name>ssl.server.keystore.keypassword</name> 
         <value>********</value> 
         </property> 
        
```
2. (TLS only) Edit the /etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file and remove the hadoop.security.credential.provider.path property and change the value of property ssl.server.keystore.location to
```
/etc/hadoop/conf.rollback.secondarynamenode/cm-auto-host_keystore.jks 
```
3. Log in to the Secondary NameNode host and run the following commands:
```
rm -rf /dfs/snn/*
        cd /etc/hadoop/conf.rollback.secondarynamenode/
        sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.secondarynamenode secondarynamenode -format
       
```
  When the rolling back of the Secondary NameNode is complete, terminate the console session by typing Control-C. Look for output from the command similar to the following that indicates when the Secondary NameNode rollback is complete:
```
2020-12-21 17:09:36,239 INFO namenode.SecondaryNameNode: Web server init done
        
```
Restart the HDFS service. Open the Cloudera Manager Admin Console, go to the HDFS service page, and select Actions > Restart.
The Restart Command page displays the progress of the restart. Wait for the page to display the Successfully restarted service message before continuing.

For more information on HDFS, see HDFS troubleshooting documentation.

Restore CDH Databases

Restore the following databases from the CDH backups in the specified order, one service at a time. Stop each service before restoring its database:

Ranger
Ranger KMS
Stream Messaging Manager
Schema Registry
Hue (only if you are rolling back to CDP Private Cloud Base 7.1.7 SP2)

Only start these services in their dependency order once HDFS and ZooKeeper are operational on the target rollback version.

The steps for backing up and restoring databases differ depending on the database vendor and version you select for your cluster and are beyond the scope of this document.

See the following vendor resources for more information:

MariaDB 10.5, 10.6, and 10.11: https://mariadb.com/kb/en/backup-and-restore-overview/
MySQL 8.0: https://dev.mysql.com/doc/refman/8.0/en/backup-and-recovery.html
MySQL 8.4: https://dev.mysql.com/doc/refman/8.4/en/backup-and-recovery.html
PostgreSQL 14: https://www.postgresql.org/docs/14/backup.html
PostgreSQL 15: https://www.postgresql.org/docs/15/backup.html
PostgreSQL 16: https://www.postgresql.org/docs/16/backup.html
PostgreSQL 17: https://www.postgresql.org/docs/17/backup.html
Oracle 21c: https://docs.oracle.com/en/database/oracle/oracle-database/21/index.html
Oracle 23c: https://docs.oracle.com/en/database/oracle/oracle-database/23/index.html

Roll Back Cloudera Navigator Encryption Components

If you are rolling back any encryption components (Key Trustee KMS, HSM KMS, Key HSM, or Navigator Encrypt), first refer to:

Start the Key Management Server

Restart the Key Management Server. Open the Cloudera Manager Admin Console, go to the KMS service page, and select Actions > Start.

Roll Back Key HSM

To roll back Key HSM:

Install the version of Navigator Key HSM to which you wish to roll back
Install the Navigator Key HSM package using yum:
```
sudo yum downgrade keytrustee-keyhsm
```
Cloudera Navigator Key HSM is installed to the /usr/share/keytrustee-server-keyhsm directory by default.
Rename Previously-Created Configuration Files
For Key HSM major version rollbacks, previously-created configuration files do not authenticate with the HSM and Key Trustee Server, so you must recreate these files by re-executing the setup and trust commands. First, navigate to the Key HSM installation directory and rename the applications.properties, keystore, and truststore files:
```
cd /usr/share/keytrustee-server-keyhsm/
mv application.properties application.properties.bak
mv keystore keystore.bak
mv truststore truststore.bak
```
Initialize Key HSM
Run the service keyhsm setup command in conjunction with the name of the target HSM distribution:
```
sudo service keyhsm setup [keysecure|thales|luna]
```
For more details, see Initializing Navigator Key HSM.
Establish Trust Between Key HSM and the Key Trustee Server
The Key HSM service must explicitly trust the Key Trustee Server certificate (presented during TLS handshake). To establish this trust, run the following command:
```
sudo keyhsm trust /path/to/key_trustee_server/cert
```
For more details, see Establish Trust from Key HSM to Key Trustee Server.
Start the Key HSM Service
Start the Key HSM service:
```
sudo service keyhsm start
```
Establish Trust Between Key Trustee Server and Key HSM
Establish trust between the Key Trustee Server and the Key HSM by specifying the path to the private key and certificate:
```
sudo ktadmin keyhsm --server https://keyhsm01.example.com:9090 \
--client-certfile /etc/pki/cloudera/certs/mycert.crt \
--client-keyfile /etc/pki/cloudera/certs/mykey.key --trust
```
For a password-protected Key Trustee Server private key, add the --passphrase argument to the command (enter the password when prompted):
```
sudo ktadmin keyhsm --passphrase \
--server https://keyhsm01.example.com:9090 \
--client-certfile /etc/pki/cloudera/certs/mycert.crt \
--client-keyfile /etc/pki/cloudera/certs/mykey.key --trust
```
For additional details, see Integrate Key HSM and Key Trustee Server.
Remove Configuration Files From Previous Installation
After completing the rollback, remove the saved configuration files from the previous installation:
```
cd /usr/share/keytrustee-server-keyhsm/
rm application.properties.bak
rm keystore.bak
rm truststore.bak
```

Roll Back Navigator Encrypt

To roll back Cloudera Navigator Encrypt:

If you have configured and are using an RSA master key file with OAEP padding, then you must revert this setting to its original value:
```
navencrypt key --change
```
Stop the Navigator Encrypt mount service:
```
sudo /etc/init.d/navencrypt-mount stop
```
Confirm that the mount-stop command completed:
```
sudo /etc/init.d/navencrypt-mount status
```
If rolling back to a release lower than NavEncrypt 6.2:
1. a. Print the existing ACL rules and save that output to a file:
```
sudo navencrypt acl --print+ vim acls.txt
```
2. b. Delete all existing ACLs, for example, if there are a total of 7 ACL rules run:
```
sudo navencrypt acl --del --line=1,2,3,4,5,6,7
```
To fully downgrade Navigator Encrypt, manually downgrade all of the associated Navigator Encrypt packages (in the order listed):
1. navencrypt
2. (Only required for operating systems other than SLES) navencrypt-kernel-module
3. (Only required for the SLES operating system) cloudera-navencryptfs-kmp-<kernel_flavor>
Note: Replace kernel_flavor with the kernel flavor for your system. Navigator Encrypt supports the default, xen, and ec2 kernel flavors.d. libkeytrustee
If rolling back to a release less than NavEncrypt 6.2
1. Reapply the ACL rules:
```
sudo navencrypt acl --add --file=acls.txt
```
Recompute process signatures:
```
sudo navencrypt acl --update
```
Restart the Navigator Encrypt mount service
```
sudo /etc/init.d/navencrypt-mount start
```

Start HBase

You might encounter other errors when starting HBase (for example, replication-related problems, region assignment-related issues, and meta region assignment problems). In this case, you must delete the znode in ZooKeeper and then start HBase again. (This deletes the replication peer information and you need to re-configure your replication schedules)

In Cloudera Manager, look up the value of the zookeeper.znode.parent property. The default value is /hbase.
Connect to the ZooKeeper ensemble by running the following command from any HBase gateway host.
```
zookeeper-client -server zookeeper_ensemble
```
To find the value to use for zookeeper_ensemble, open the /etc/hbase/conf.cloudera.<HBase service name>/hbase-site.xml file on any HBase gateway host. Use the value of the hbase.zookeeper.quorum property.
note
If you have deployed a secure cluster, you must connect to ZooKeeper using a client jaas.conf file. You can find such a file in an HBase process directory (/var/run/cloudera-scm-agent/process/).

Specify the jaas.conf using the JVM flags by running the following commands in the ZooKeeper client.

CLIENT_JVMFLAGS= 
 "-Djava.security.auth.login.config=/var/run/cloudera-scm 
agent/process/HBase_process_directory/jaas.conf" 
zookeeper-client -server <zookeeper_ensemble>

The ZooKeeper command-line interface opens.

Enter the following command.
```
rmr /hbase
```
If you have deployed a secure cluster, enter the following command:
```
deleteall /hbase
```
If you see the message Node not empty: /hbase/tokenauth, you must re-run the same command and restart the HBase service.
Restart the HBase service.

After HBase is healthy, ensure that you restore the states of the Balancer and Normalizer (enable them if they were enabled before the rollback). Also re-enable the Merge and Split operations you disabled before the rollback to avoid the Master Procedure incompatibility problem.

Run the following commands in HBase Shell:

balance_switch true
normalizer_switch true
splitormerge_switch 'SPLIT', true
splitormerge_switch 'MERGE', true

Rollback Solr

Start the HDFS and ZooKeeper services.
Restore the Solr-specific znodes in ZooKeeper using the snapshot you took before the upgrade.
note
This step is only necessary if Solr is rolled back independent of other components. If ZooKeeper is rolled back as well, skip this step.

Start the Solr service.

note

If the state of one or more Solr core is down and the Solr log contains a similar error message:

"org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
       org.apache.solr.store.hdfs.HdfsLockFactory"

it is necessary to clean up the HDFS locks in the index directories.

On all affected Solr nodes perform the following steps:

Stop the Solr node using Cloudera Manager.

Remove the HdfsDirectory@[***HEX ID***]-write.lock file from the index directory.

hdfs dfs -rm "/solr/[***COLLECTION NAME***]/[***CORE***]/data/[***INDEX DIRECTORY NAME***]/HdfsDirectory@[***HEX ID***] 
           lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@[***HEX ID***]-write.lock"

For example:

hdfs dfs -rm "/solr/testCollection/core_node1/data/index/HdfsDirectory@5d07feac
            lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@7df08aad-write.lock"

Start the Solr node using Cloudera Manager.

Restore the Solr collections using the backups you created before starting the upgrade. For more information, see Restoring a Solr collection.

note
Manually restoring collections on HDFS is only necessary if you roll back Solr independent of other components. If HDFS is rolled back as well, this is not necessary. You always need to manually restore collections that were stored on the local file system.
Restart Lily HBase Indexer (ks_indexer).

Rollback Atlas

Rollback Atlas Solr Collections

Atlas has several collections in Solr that must be restored from the pre-upgrade backup - vertex_index, edge_index, and fulltext_index. These collections may already have been restored using the Rollback Solr documentation. If the collections are not yet restored, you must restore collections now using the Rollback Solr documentation.

Rollback Atlas HBase Tables

From a client host, start the HBase shell hbase shell
Within the HBase shell, list the snapshots, that must contain the pre-upgrade snapshots list_snapshots
Within the HBase shell, disable the atlas_janus table, restore the snapshot, and enable the table
disable 'atlas_janus'
restore_snapshot '<name of atlas_janus snapshot from list_snapshots>'
enable 'atlas_janus'
Within the HBase shell, disable the ATLAS_ENTITY_AUDIT_EVENTS table, restore the snapshot, and enable the table
disable 'ATLAS_ENTITY_AUDIT_EVENTS'
restore_snapshot '<name of ATLAS_ENTITY_AUDIT_EVENTS snapshot from list_snapshots>'
enable 'ATLAS_ENTITY_AUDIT_EVENTS'
Restart Atlas.

Rollback Tez

After rolling back Tez, the tez.lib.uris property in tez-site.xml still points to libraries from a newer version. This causes issues with query processing.

To address this issue, you need to run "Upload Tez tar file to HDFS" from the Tez service's Actions menu. This updates the configuration and ensures tez.lib.uris points to the correct version.

Go to the TEZ service.
Select Actions > Upload Tez tar file to HDFS and click Upload Tez tar file to HDFS to confirm.
Restart the Hive on Tez service to apply the updated configuration

Rollback Kudu

Rollback depends on which backup method was used. There are two forms of backup/restore in Kudu. Spark job to create or restore a full/incremental backup or backup the entire Kudu node and restore it later. Restoring the Kudu data differs between the Spark job and full node backup approaches.

See the Kudu backup and restore docs for data recovery steps. The OS level restore is not recommended if there is a backup.

Rollback YARN Queue Manager

You can rollback YARN Queue Manager using the pre-upgrade backup of config-service.mv.db and config-service.trace.db.

Navigate to the YARN Queue Manager service in Cloudera Manager and record the configuration value for config_service_db_loc (or queuemanager_user_home_dir if blank) and the host where the YARN Queue Manager Store is running.
Stop the YARN Queue Manager service.
SSH to the YARN Queue Manager Store host and copy the pre-upgrade config-service.mv.db and config-service.trace.db to the config_service_db_loc obtained in the previous step.
Start the YARN Queue Manager service.

Rollback Kafka

You can rollback Kafka as long as the following criteria are met.

The log format Kafka properties inter.broker.protocol.version and log.message.format.version are set before the upgrade and are not removed or cleared after the upgrade.
There is a backup with the pre-upgrade state of the cluster from where you can restore the data, such as a remote cluster replicated with SRM. For guidance on restoring data from an SRM-replicated cluster, contact Cloudera support.

Rollback Spark

You can rollback Spark in case of any failure during upgrading to Cloudera Base on premises 7.3.2 or 7.3.1.

Activate the CDP Private Cloud Base 7.1.9 parcel on your cluster in Parcels > Activate.
Close the Restart window with Close.
The Spark 3 configuration issues have to be fixed after the parcel rollback:
1. Delete the hdfs:// prefix in spark.eventLog.dir
2. Delete the hdfs:// prefix in spark.driver.log.dfsDir
Activate the SPARK3 parcel in Clusters > Status and in the Actions menu select Restart.
Restart the services in Parcels > Activate.
Deploy the Client Configuration. (See below.)

Perform a post-rollback check by running the pi job on a Spark 3 gateway host:

spark3-submit  --master yarn  --deploy-mode client  --class org.apache.spark.examples.SparkPi /opt/cloudera/parcels/SPARK3/lib/spark3/examples/jars/spark-examples_2.12.jar 100

Deploy the Client Configuration

Go to the Home > Status tab.
Select Actions > Deploy Client Configuration.
Click Deploy Client Configuration. For more information about client configuration files, see Client Configuration Files.

Restart the Cluster

You must restart the cluster using the following steps.

On the Cloudera Manager Home page, click the Actions menu and select Restart.
Click Restart which appears on the next screen to confirm. If you have enabled high availability for HDFS, you can choose Rolling Restart instead to minimize cluster downtime. The Command Details window shows the progress of stopping services.
When All services successfully started appears, the task is complete and you can close the Command Details window.

Restore Cloudera Manager and Cloudera Management Service

Full Rollback (Revert to Initial Baseline State)

Revert your environment to the Stage 1: Initial Baseline State by downgrading Cloudera Manager packages, restoring the original database backups, and regenerating Kerberos credentials to ensure full system compatibility and stability.

Stop the cluster from the Cloudera Manager home page, see Stopping a Cluster
Stop the Cloudera Management Service from the Cloudera Manager home page, see Stopping the Cloudera Management Service.
SSH into the Cloudera Manager Server host and stop the Cloudera Manager Server by running the following command:
```
sudo systemctl stop cloudera-scm-server
```
SSH into the Cloudera Manager Agent hosts to stop the Cloudera Manager Agent and supervisord services, and then clean the Cloudera Manager Agents and the supervisord process by running the following commands:
```
sudo systemctl stop cloudera-scm-agent.service
sudo systemctl stop cloudera-scm-supervisord.service
sudo rm -rf /var/run/cloudera-scm-agent /var/lib/cloudera-scm-agent/response.avro
```
Restore the Cloudera Manager OS package repository setting by using SSH to log into the Cloudera Manager host, then edit the /etc/yum.repos.d/ file containing the Cloudera Manager repository to point to the pre-upgrade location.
Restoring the previous version of Cloudera Manager Server
important
Ensure that you install the required Cloudera Manager OS package version and reinstall the cloudera-scm-agent and cloudera-scm-daemons packages to match the server version.
1. Remove the current version by running the following command:
  sudo yum remove cloudera-scm-server
2. Install the specific pre-upgrade version:
  sudo yum install cloudera-scm-server-<pre_upgrade version>
  Example:
  - RHEL:
    sudo yum install cloudera-scm-server-7.11.3-12345678.el8
  - Ubuntu/Debian:
    sudo apt install cloudera-scm-server=7.11.3-12345678.el8
Restoring External Cloudera Manager Databases
1. Gather required Database information.
  
  To restore the Cloudera Manager and Cloudera Management Service databases, you must use the information gathered during the corresponding backup process (either the Initial Baseline or the Intermediate State) to identify the required hostnames, users, and passwords.
  
  For the Cloudera Manager Database, you can find this information in the /etc/cloudera-scm-server/db.properties file. For other databases, you can locate all details (excluding passwords) on the running Cloudera Manager cluster and service configuration pages.
2. View the Properties File.
  
  You can view the properties file by running the following command:
  cat /etc/cloudera-scm-server/db.properties
  
  com.cloudera.cmf.db.type=<type>
  
  com.cloudera.cmf.db.host=<hostname>:<port>
  
  com.cloudera.cmf.db.name=<database>
  
  com.cloudera.cmf.db.user=<user>
  
  com.cloudera.cmf.db.password=<password>
3. Restore the Specific Database Type.
  For PostgreSQL Databases
  
  Depending on your database security configuration, you might need to switch to the postgres user to restore the database:
  su - postgres psql < dumpfile.sql
  
  For MySQL Databases
  
  To restore a MySQL database, you must run the following command:
  mysql -u<username> -p <database_name> < dumpfile.sql
  
  For MariaDB Databases
  
  Since all MariaDB versions support mysqldump backups, you can restore them using the following command:
  mariadb -u<username> -p <database_name> < dumpfile.sql
  
  For Oracle Databases
  
  For Oracle, you must work with your database administrator to ensure they restore the databases properly.
SSH to the Cloudera Manager host and check the /etc/cloudera-scm-server/db.properties file to verify that the Cloudera Manager database credentials point to the previous database installation.

To restore the Cloudera Management Service, SSH to all Cloudera Management Service hosts and restore the Cloudera Management Server directories by running the following commands:

sudo cp -rp /var/lib/cloudera-service-monitor-`date +%F`-CM /var/lib/cloudera-service-monitor
sudo cp -rp /var/lib/cloudera-host-monitor-`date +%F`-CM /var/lib/cloudera-host-monitor 
sudo cp -rp /var/lib/cloudera-scm-eventserver-`date +%F`-CM /var/lib/cloudera-scm-eventserver

SSH to the Cloudera Manager Server host and start the Cloudera Manager Server by running the following command:
```
sudo systemctl start cloudera-scm-server
```
SSH to the Cloudera Manager Agent hosts and start the Cloudera Manager Agent by running the following command:
```
sudo systemctl start cloudera-scm-agent
```
Enable Kerberos Credential Deletion
1. Log in to the Cloudera Manager Server and navigate to Administration > Settings.
2. Select the Kerberos category.
3. Locate the Active Directory Delete Accounts on Credential Regeneration property and ensure you select it.
Regenerate Keytabs
1. Log in to the Cloudera Manager Server and navigate to Administration > Security.
2. Go to the Kerberos Credentials tab.
3. Select all principals by selecting the checkbox in the table header.
4. Click the link text All entries on this page are selected (select the remaining X) to ensure you select all principals across all pages.
5. Click Regenerate Selected.
Start the Cloudera Management Service from the Cloudera Manager home page, see Starting the Cloudera Management Service.
Deploy Client Configurations
1. Go to the Home > Status tab.
2. Select Actions > Deploy Client Configuration.
3. Click Deploy Client Configuration. For more information about client configuration files, see Client Configuration Files.
Start the Restored Cluster
You can now start the services in their original pre-upgrade state.
1. Go to the Home > Status tab.
2. Select Actions > Start, to start all services in the cluster.
3. Verify that all service roles (HDFS, YARN, Hive, etc.) start successfully and confirm that the displayed Cloudera Manager version matches your baseline version.

Post rollback steps

Streams Replication Manager (SRM)

Reset the state of the internal Kafka Streams application. Run the following command on the hosts of the SRM Service role.

kafka-streams-application-reset \
        --bootstrap-servers [***SRM SERVICE HOST***] \
        --config-file [***PROPERTIES FILE***] \
        --application-id srm-service_v2

Replace [***PROPERTIES FILE***] with the location of a configuration file that contains all necessary security properties that are required to establish a connection with the Kafka service. This option is only required if your Kafka service is secured.

Cruise Control

Insert the removed goal back to the relevant goal sets, but with the renamed goal name RackAwareGoal (Not RackAwareDistributionGoal)
Restart Cruise Control

Oozie

Execute the Install Oozie ShareLib action through Cloudera Manager:

Go to the Oozie service.
Select Actions > Install Oozie ShareLib.

Finalize the HDFS Upgrade

This step should be performed only after all validation is completed. For more information, see Finalize the HDFS upgrade documentation.

Runtime upgrade

When the parcel is activated, click the Actions menu next to the cluster name and select Post Cloudera Runtime Upgrade.