Procedure to Rollback from CDP 7.1.9
Perform the following procedure to roll back your cluster from 7.1.9 to 7.1.8, 7.1.7 SP3, 7.1.7 SP2, or 7.1.7 SP1.
Rollback
Rollback restores the software to the prior release and also restores the data and metadata to the pre-upgrade state. Expect service interruptions, because the cluster must be halted. After HDFS or Ozone is finalized, rollback is no longer possible.
Pre rollback steps
- Ozone
This procedure is applicable only if you are downgrading from CDP 7.1.9 to CDP 7.1.8.
- Stop the Ozone Recon Web UI from within the Cloudera Manager UI.
- Navigate to Configuration within the Ozone service and collect the value of ozone.recon.db.dir (default value is /var/lib/hadoop-ozone/recon/data).
- SSH to the Ozone Recon Web UI host and move the ozone.recon.db.dir parent directory to a backup location: mv /var/lib/hadoop-ozone/recon /var/lib/hadoop-ozone/recon-backup-CDP.
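The Ozone Recon backup step above can be sketched as a small shell helper. This is a minimal sketch, not a Cloudera tool: the function name backup_recon_parent is hypothetical, and it assumes you pass in the ozone.recon.db.dir value collected from Cloudera Manager. It refuses to overwrite an existing backup location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the parent directory of ozone.recon.db.dir
# to a sibling backup location named <parent>-backup-CDP.
backup_recon_parent() {
  local db_dir="${1%/}"   # value of ozone.recon.db.dir, trailing slash removed
  local parent backup
  parent="$(dirname "$db_dir")"     # for the default value: /var/lib/hadoop-ozone/recon
  backup="${parent}-backup-CDP"
  if [ ! -d "$parent" ]; then
    echo "Directory not found: $parent" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "Backup location already exists: $backup" >&2
    return 1
  fi
  # Print the backup path only if the move succeeded.
  mv "$parent" "$backup" && echo "$backup"
}

# Example invocation on the Recon host (default db dir assumed):
# backup_recon_parent /var/lib/hadoop-ozone/recon/data
```

Deriving the parent with dirname keeps the helper usable when ozone.recon.db.dir is not set to the default value.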
- HBase
- Stop the HBase Master(s). Execute kinit as the hbase user if Kerberos is enabled.
- Stop Omid within Cloudera Manager UI
- Navigate to the HBase service > Instances within Cloudera Manager UI and note the hostname of the HBase Master instance(s). Login to the host(s) and execute the following: hbase master stop --shutDownCluster
- Stop the remaining HBase components from the HBase service within Cloudera Manager UI.
- Cruise Control
The following steps are applicable only if you downgrade from 7.1.9 to 7.1.7 SP2 and not from 7.1.9 to 7.1.8. You can skip this section if the Cruise Control Goal configurations were set to the default values before performing the upgrade. However, if the Cruise Control Goal configuration values were changed before performing the upgrade, then you must proceed with this section.
During the downgrade process, you must rename every occurrence of com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal in the Cloudera Manager > Clusters > Cruise Control > Configurations tab, as described below.
In Cruise Control, from 7.1.8 and higher, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal is renamed to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal.
Perform the following steps:
- Check whether RackAwareDistributionGoal is present in the following goal sets (Cloudera Manager > Clusters > Cruise Control > Configurations tab):
- default.goals
- goals
- self.healing.goals
- hard.goals
- anomaly.detection.goals
- Make a note of the goal sets in which RackAwareDistributionGoal was present
- Remove RackAwareDistributionGoal from all of the goal lists
- Perform the runtime downgrade process
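The goal removal in the steps above can be rehearsed offline before you edit the values in Cloudera Manager. The following is a minimal sketch: remove_goal is a hypothetical helper, and it assumes the goal set is a comma-separated list of class names without surrounding whitespace, copied from the Configurations tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a comma-separated goal list with one goal removed.
# Usage: remove_goal "<comma-separated list>" "<goal class name>"
remove_goal() {
  local list="$1" goal="$2" item out=""
  local IFS=','            # split the list on commas only
  for item in $list; do
    if [ "$item" != "$goal" ]; then
      out="${out:+$out,}$item"   # re-join the kept items with commas
    fi
  done
  printf '%s\n' "$out"
}

# Example: paste the value of a goal set (for example hard.goals) as the first argument.
# remove_goal "<value copied from Cloudera Manager>" \
#   com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
```

If the output differs from the input, the goal was present in that goal set, which is the information the note in the steps above asks you to record.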
Stop the Cluster
- On the Home > Status tab, click the Actions menu and select Stop.
Click Stop in the confirmation screen. The Command Details window shows the progress of the stopping process.
When the All services successfully stopped message appears, the task is complete and you can close the Command Details window.
Rolling back the Runtime parcel
- Navigate to Parcels within Cloudera Manager.
- Locate the CDP Private Cloud Base 7.1.7 SP2/7.1.8 parcel and click Upgrade.
- Follow the wizard and address any issues from the various inspectors.
The upgrade activates the parcel and restarts services.
Restore CDH Databases
- Ranger
- Ranger KMS
- Stream Messaging Manager
- Schema Registry
- Hue (only if you are rolling back to 7.1.7 SP2)
The steps for backing up and restoring databases differ depending on the database vendor and version you select for your cluster and are beyond the scope of this document.
- MariaDB 10.2, 10.3, 10.4 and 10.5: https://mariadb.com/kb/en/backup-and-restore-overview/
- MySQL 5.7: https://dev.mysql.com/doc/refman/5.7/en/backup-and-recovery.html
- MySQL 8: https://dev.mysql.com/doc/refman/8.0/en/backup-and-recovery.html
- PostgreSQL 10: https://www.postgresql.org/docs/10/backup.html
- PostgreSQL 11: https://www.postgresql.org/docs/11/backup.html
- PostgreSQL 12: https://www.postgresql.org/docs/12/backup.html
- PostgreSQL 13: https://www.postgresql.org/docs/13/backup.html
- PostgreSQL 14: https://www.postgresql.org/docs/14/backup.html
- Oracle 19c: https://docs.oracle.com/en/database/oracle/oracle-database/19/index.html
Roll Back Cloudera Navigator Encryption Components
Roll Back Key Trustee Server
To roll back Key Trustee Server, replace the currently used parcel (for example, the parcel for version 7.1.9) with the parcel for the version to which you wish to roll back (for example, version 7.1.8). See Parcels for detailed instructions on using parcels.
- Open the Cloudera Manager Admin Console and go to the Key Trustee Server service. If you see that Key Trustee Server has stale configurations, click the yellow or blue button and follow the prompts.
- Make sure that the Keytrustee Server database roles are stopped. Then rename the folder containing Keytrustee Postgres database data (both on master and slave hosts):
mv /var/lib/keytrustee/db /var/lib/keytrustee/db-14_2
- Open the Cloudera Manager Admin Console and go to the Key Trustee Server service.
- Select the Instances tab.
- Select the Active Database role type.
- Click .
- Click Set Up the Key Trustee Server Database to confirm.
Cloudera Manager sets up the Key Trustee Server database.
- Start the PostgreSQL server:
sudo ktadmin db --start --pg-rootdir /var/lib/keytrustee/db --background
- On the master KTS node, running as user keytrustee, restore the keytrustee database on active hosts by running the following commands:
sudo -su keytrustee
export PATH=$PATH:/opt/cloudera/parcels/KEYTRUSTEE_SERVER/PG_DB/opt/postgres/12.1/bin/
export LD_LIBRARY_PATH=/opt/cloudera/parcels/KEYTRUSTEE_SERVER/PG_DB/opt/postgres/12.1/lib/
unzip -p keytrustee-db.zip | psql -p 11381 -d keytrustee
If the zip file is encrypted, you are prompted for the password to decrypt the file.
- Restore the Key Trustee Server configuration directory on active hosts:
su - keytrustee
cd /var/lib/keytrustee
unzip keytrustee-conf.zip
If the zip file is encrypted, you are prompted for the password to decrypt the file.
- Stop the PostgreSQL server. Change the login user to root and run the command:
sudo ktadmin db --stop --pg-rootdir /var/lib/keytrustee/db
- Remove the backup files (keytrustee-db.zip and keytrustee-conf.zip) from the Key Trustee Server’s host.
su - keytrustee
cd /var/lib/keytrustee
rm keytrustee-conf.zip
rm keytrustee-db.zip
- Start the Active Database role in Cloudera Manager by clicking .
- Click Start to confirm.
- Select the Active Database.
- Click .
- Start the Passive Database instance: select the Passive Database, click .
- In the Cloudera Manager Admin Console, start the active KTS instance.
- In the Cloudera Manager Admin Console, start the passive KTS instance.
Start the Key Management Server
Restart the Key Management Server from the KMS service page in the Cloudera Manager Admin Console.
Roll Back Key HSM
- Install the version of Navigator Key HSM to which you wish to roll back
Install the Navigator Key HSM package using yum:
sudo yum downgrade keytrustee-keyhsm
Cloudera Navigator Key HSM is installed to the /usr/share/keytrustee-server-keyhsm directory by default.
- Rename Previously-Created Configuration Files
For Key HSM major version rollbacks, previously-created configuration files do not authenticate with the HSM and Key Trustee Server, so you must recreate these files by re-executing the setup and trust commands. First, navigate to the Key HSM installation directory and rename the application.properties, keystore, and truststore files:
cd /usr/share/keytrustee-server-keyhsm/
mv application.properties application.properties.bak
mv keystore keystore.bak
mv truststore truststore.bak
- Initialize Key HSM
Run the service keyhsm setup command in conjunction with the name of the target HSM distribution:
sudo service keyhsm setup [keysecure|thales|luna]
For more details, see Initializing Navigator Key HSM.
- Establish Trust Between Key HSM and the Key Trustee Server
The Key HSM service must explicitly trust the Key Trustee Server certificate (presented during TLS handshake). To establish this trust, run the following command:
sudo keyhsm trust /path/to/key_trustee_server/cert
For more details, see Establish Trust from Key HSM to Key Trustee Server.
- Start the Key HSM Service
Start the Key HSM service:
sudo service keyhsm start
- Establish Trust Between Key Trustee Server and Key HSM
Establish trust between the Key Trustee Server and the Key HSM by specifying the path to the private key and certificate:
sudo ktadmin keyhsm --server https://keyhsm01.example.com:9090 \
--client-certfile /etc/pki/cloudera/certs/mycert.crt \
--client-keyfile /etc/pki/cloudera/certs/mykey.key --trust
For a password-protected Key Trustee Server private key, add the --passphrase argument to the command (enter the password when prompted):
sudo ktadmin keyhsm --passphrase \
--server https://keyhsm01.example.com:9090 \
--client-certfile /etc/pki/cloudera/certs/mycert.crt \
--client-keyfile /etc/pki/cloudera/certs/mykey.key --trust
For additional details, see Integrate Key HSM and Key Trustee Server.
- Remove Configuration Files From Previous Installation
After completing the rollback, remove the saved configuration files from the previous installation:
cd /usr/share/keytrustee-server-keyhsm/
rm application.properties.bak
rm keystore.bak
rm truststore.bak
Roll Back Navigator Encrypt
- If you have configured and are using an RSA master key file with OAEP padding, then you must revert this setting to its original value:
navencrypt key --change
- Stop the Navigator Encrypt mount service:
sudo /etc/init.d/navencrypt-mount stop
- Confirm that the mount-stop command completed:
sudo /etc/init.d/navencrypt-mount status
- If rolling back to a release lower than NavEncrypt 6.2:
- a. Print the existing ACL rules and save that output to a file:
sudo navencrypt acl --print
vim acls.txt
- b. Delete all existing ACLs. For example, if there are a total of 7 ACL rules, run:
sudo navencrypt acl --del --line=1,2,3,4,5,6,7
- To fully downgrade Navigator Encrypt, manually downgrade all of the associated Navigator Encrypt packages (in the order listed):
- navencrypt
- (Only required for operating systems other than SLES) navencrypt-kernel-module
- (Only required for the SLES operating system) cloudera-navencryptfs-kmp-<kernel_flavor>
- If rolling back to a release lower than NavEncrypt 6.2:
- Reapply the ACL rules:
sudo navencrypt acl --add --file=acls.txt
- Recompute process signatures:
sudo navencrypt acl --update
- Restart the Navigator Encrypt mount service:
sudo /etc/init.d/navencrypt-mount start
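For the ACL deletion step above, the --line argument needs every rule number from 1 to the total number of rules. The following minimal sketch builds that list for a given count. The helper name acl_line_list is hypothetical, and the rule count is assumed to be read manually from the saved acls.txt output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "1,2,...,N" for a given rule count N.
acl_line_list() {
  local count="$1" i=1 list=""
  while [ "$i" -le "$count" ]; do
    list="${list:+$list,}$i"   # append the next rule number, comma-separated
    i=$((i + 1))
  done
  printf '%s\n' "$list"
}

# Example: print (not run) the delete command for 7 rules.
# echo "sudo navencrypt acl --del --line=$(acl_line_list 7)"
```

Printing the command first, rather than running it directly, leaves room to compare the count against the saved rules before anything is deleted.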
Rollback ZooKeeper
- Stop ZooKeeper
- Restore the data backup. For example: cp -rp /var/lib/zookeeper-backup-pre-upgrade-CM-CDH /var/lib/zookeeper/
- Start ZooKeeper
Rollback HDFS
You cannot roll back HDFS while high availability is enabled. The rollback procedure in this topic creates a temporary configuration without high availability. Regardless of whether high availability is enabled, follow the steps in this section.
- Roll back all of the NameNodes. Use the NameNode backup directory you created before upgrading to Cloudera Private Cloud Base (/etc/hadoop/conf.rollback.namenode) to perform the following steps on all NameNode hosts:
- Start the JournalNodes using Cloudera Manager:
- Go to the HDFS service.
- Select the Instances tab.
- Select all JournalNode roles from the list.
- Click .
- (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file on all NameNode hosts (located in the temporary rollback directory) and update the keystore passwords with the actual cleartext passwords. The passwords will have values that look like this:
<property>
  <name>ssl.server.keystore.password</name>
  <value>********</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>********</value>
</property>
- (TLS only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file and remove the hadoop.security.credential.provider.path property.
- (TLS only) Edit the /etc/hadoop/conf.rollback.namenode/ssl-server.xml file and change the value of ssl.server.keystore.location to /etc/hadoop/conf.rollback.namenode/cm-auto-host_keystore.jks
- Edit the /etc/hadoop/conf.rollback.namenode/core-site.xml file and change the value of the net.topology.script.file.name property so that it points to the /etc/hadoop/conf.rollback.namenode directory. For example:
# Original property
<property>
  <name>net.topology.script.file.name</name>
  <value>/var/run/cloudera-scm-agent/process/63-hdfs-NAMENODE/topology.py</value>
</property>
# New property
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf.rollback.namenode/topology.py</value>
</property>
- Edit the /etc/hadoop/conf.rollback.namenode/topology.py file and change the value of DATA_FILE_NAME so that it points to the /etc/hadoop/conf.rollback.namenode directory. For example:
DATA_FILE_NAME = '/etc/hadoop/conf.rollback.namenode/topology.map'
- (Kerberos enabled clusters only) Run the following command:
sudo -u hdfs kinit hdfs/<NameNode Host name> -l 7d -kt /etc/hadoop/conf.rollback.namenode/hdfs.keytab
- Start the active NameNode with the following command:
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.namenode namenode -rollback
- Log in to the other NameNode and start the standby NameNode (SBNN) with the following command:
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.namenode namenode -bootstrapStandby
- Press Yes when prompted. This also exits the process after it is done.
- Press Ctrl + C on the active NameNode to exit the process.
- Roll back the DataNodes. Use the DataNode rollback directory you created before upgrading to Cloudera Private Cloud Base (/etc/hadoop/conf.rollback.datanode) to perform the following steps on all DataNode hosts:
- (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.datanode/ssl-server.xml file on all DataNode hosts (located in the temporary rollback directory) and update the keystore passwords (ssl.server.keystore.password and ssl.server.keystore.keypassword) with the actual passwords. The passwords will have values that look like this:
<property>
  <name>ssl.server.keystore.password</name>
  <value>********</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>********</value>
</property>
- (TLS only) Edit the /etc/hadoop/conf.rollback.datanode/ssl-server.xml file, remove the hadoop.security.credential.provider.path property, and change the value of the ssl.server.keystore.location property to /etc/hadoop/conf.rollback.datanode/cm-auto-host_keystore.jks
- Edit the /etc/hadoop/conf.rollback.datanode/hdfs-site.xml file and remove the dfs.datanode.max.locked.memory property.
- (Kerberos enabled clusters only) In core-site.xml and hdfs-site.xml, make sure that the hdfs.keytab value is changed to use the absolute path of the conf.rollback.datanode folder.
- Run one of the following commands:
- If the DataNode is running with privileged ports (usually 1004 and 1006):
cd /etc/hadoop/conf.rollback.datanode
export HADOOP_SECURE_DN_USER=hdfs
export JSVC_HOME=/opt/cloudera/parcels/<parcel_filename>/lib/bigtop-utils
hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
- If the DataNode is not running on privileged ports:
sudo hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
When the rolling back of the DataNodes is complete, terminate the console session by typing Control-C. Look for output from the command similar to the following that indicates when the DataNode rollback is complete:
Rollback of /dataroot/ycloud/dfs/dn/current/BP-<Block Group number> is complete
You may see the following error after issuing these commands:
ERROR datanode.DataNode: Exception in secureMain
java.io.IOException: The path component: '/var/run/hdfs-sockets' in '/var/run/hdfs-sockets/dn' has permissions 0755 uid 39998 and gid 1006. It is not protected because it is owned by a user who is not root and not the effective user: '0'.
The error message will also include the following command to run:
chown root /var/run/hdfs-sockets
After running this command, the DataNode will restart successfully. Rerun the DataNode rollback command:
sudo hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
- If High Availability for HDFS is enabled, restart the HDFS service from the HDFS service page in the Cloudera Manager Admin Console.
- If high availability is not enabled for HDFS, use the Cloudera Manager Admin Console to restart all NameNodes and DataNodes.
- Go to the HDFS service.
- Select the Instances tab
- Select all DataNode and NameNode roles from the list.
- Click .
- If high availability is not enabled for HDFS, roll back the Secondary NameNode.
- (Clusters with TLS enabled only) Edit the /etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file on all Secondary NameNode hosts (located in the temporary rollback directory) and update the keystore passwords with the actual cleartext passwords. The passwords will have values that look like this:
<property>
  <name>ssl.server.keystore.password</name>
  <value>********</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>********</value>
</property>
- (TLS only) Edit the /etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file, remove the hadoop.security.credential.provider.path property, and change the value of the ssl.server.keystore.location property to /etc/hadoop/conf.rollback.secondarynamenode/cm-auto-host_keystore.jks
- Log in to the Secondary NameNode host and run the following commands:
rm -rf /dfs/snn/*
cd /etc/hadoop/conf.rollback.secondarynamenode/
sudo -u hdfs hdfs --config /etc/hadoop/conf.rollback.secondarynamenode secondarynamenode -format
When the rolling back of the Secondary NameNode is complete, terminate the console session by typing Control-C. Look for output from the command similar to the following that indicates when the Secondary NameNode rollback is complete:
2020-12-21 17:09:36,239 INFO namenode.SecondaryNameNode: Web server init done
- Restart the HDFS service from the HDFS service page in the Cloudera Manager Admin Console.
The Restart Command page displays the progress of the restart. Wait for the page to display the Successfully restarted service message before continuing.
For more information on HDFS, see HDFS troubleshooting documentation.
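The core-site.xml edit in the NameNode rollback steps of this section can also be scripted. The following is a minimal sketch under stated assumptions: set_topology_script is a hypothetical helper, the <value> element is assumed to sit on the line directly after the matching <name> element, and the directory path is assumed to contain no | or & characters. Review the resulting file by hand before starting the NameNode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point net.topology.script.file.name in
# <conf_dir>/core-site.xml at <conf_dir>/topology.py.
set_topology_script() {
  local conf_dir="$1"
  local site="$conf_dir/core-site.xml"
  local tmp
  tmp="$(mktemp)" || return 1
  # On the line holding the property name, step to the next line (n)
  # and rewrite its <value> element. Leading indentation is preserved.
  sed '/<name>net\.topology\.script\.file\.name<\/name>/{
n
s|<value>.*</value>|<value>'"$conf_dir"'/topology.py</value>|
}' "$site" > "$tmp" && cat "$tmp" > "$site"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example invocation on a NameNode host:
# set_topology_script /etc/hadoop/conf.rollback.namenode
```

Writing the result back with cat, rather than moving the temporary file over the original, keeps the ownership and permissions of core-site.xml unchanged.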
Start HBase
You might encounter other errors when starting HBase (for example, replication-related problems, region assignment related issues, and meta region assignment problems). In this case, you must delete the znode in ZooKeeper and then start HBase again. (This deletes the replication peer information, and you need to re-configure your replication schedules.)
- In Cloudera Manager, look up the value of the zookeeper.znode.parent property. The default value is /hbase.
- Connect to the ZooKeeper ensemble by running the following command from any HBase gateway host.
zookeeper-client -server zookeeper_ensemble
To find the value to use for zookeeper_ensemble, open the /etc/hbase/conf.cloudera.<HBase service name>/hbase-site.xml file on any HBase gateway host. Use the value of the hbase.zookeeper.quorum property.
- Specify the jaas.conf using the JVM flags by running the following commands in the ZooKeeper client.
CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/HBase_process_directory/jaas.conf" zookeeper-client -server <zookeeper_ensemble>
The ZooKeeper command-line interface opens.
- Enter the following command.
rmr /hbase
If you have deployed a secure cluster, enter the following command:
deleteall /hbase
If you see the message Node not empty: /hbase/tokenauth, you must re-run the same command and restart the HBase service.
- Restart the HBase service.
After HBase is healthy, ensure that you restore the states of the Balancer and Normalizer (enable them if they were enabled before the rollback). Also re-enable the Merge and Split operations you disabled before the rollback to avoid the Master Procedure incompatibility problem.
Run the following commands in HBase Shell:
balance_switch true
normalizer_switch true
splitormerge_switch 'SPLIT', true
splitormerge_switch 'MERGE', true
Rollback Solr
- Start the HDFS, ZooKeeper and Ranger services.
- Restore the Solr-specific znodes in ZooKeeper using the snapshot you took before the upgrade.
- Start the Solr service.
- Restore the Solr collections using the backups you created before starting the upgrade. For more information, see Restoring a Solr collection.
- Restart Lily HBase Indexer (ks_indexer).
Rollback Atlas
- Rollback Atlas Solr Collections
- Atlas has several collections in Solr that must be restored from the pre-upgrade backup - vertex_index, edge_index, and fulltext_index. These collections may already have been restored using the Rollback Solr documentation. If the collections are not yet restored, you must restore collections now using the Rollback Solr documentation.
- Rollback Atlas HBase Tables
- From a client host, start the HBase shell:
hbase shell
- Within the HBase shell, list the snapshots. The list must contain the pre-upgrade snapshots:
list_snapshots
- Within the HBase shell, disable the atlas_janus table, restore the snapshot, and enable the table:
disable 'atlas_janus'
restore_snapshot '<name of atlas_janus snapshot from list_snapshots>'
enable 'atlas_janus'
- Within the HBase shell, disable the ATLAS_ENTITY_AUDIT_EVENTS table, restore the snapshot, and enable the table:
disable 'ATLAS_ENTITY_AUDIT_EVENTS'
restore_snapshot '<name of ATLAS_ENTITY_AUDIT_EVENTS snapshot from list_snapshots>'
enable 'ATLAS_ENTITY_AUDIT_EVENTS'
- Restart Atlas.
Rollback Kudu
Rollback depends on which backup method was used. There are two forms of backup and restore in Kudu: a Spark job that creates or restores a full or incremental backup, or a backup of the entire Kudu node that is restored later. Restoring the Kudu data differs between the Spark job and full node backup approaches.
Rollback YARN Queue Manager
You can rollback YARN Queue Manager using the pre-upgrade backup of config-service.mv.db and config-service.trace.db.
- Navigate to the YARN Queue Manager service in Cloudera Manager and record the configuration value for config_service_db_loc (or queuemanager_user_home_dir if blank) and the host where the YARN Queue Manager Store is running.
- Stop the YARN Queue Manager service.
- SSH to the YARN Queue Manager Store host and copy the pre-upgrade config-service.mv.db and config-service.trace.db to the config_service_db_loc obtained in the previous step.
- Start the YARN Queue Manager service.
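The copy in the steps above can be sketched as follows. This is a minimal sketch, assuming the pre-upgrade backup files are in a directory of your choosing on the YARN Queue Manager Store host. The helper name restore_qm_db and both arguments are hypothetical. It checks that both files exist before copying either one, so a partial restore is avoided.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the pre-upgrade Queue Manager database files
# into the directory recorded from config_service_db_loc.
# Usage: restore_qm_db <backup_dir> <config_service_db_loc>
restore_qm_db() {
  local backup_dir="$1" db_loc="$2" f
  # Verify both backup files are present before touching the target.
  for f in config-service.mv.db config-service.trace.db; do
    if [ ! -f "$backup_dir/$f" ]; then
      echo "Missing backup file: $backup_dir/$f" >&2
      return 1
    fi
  done
  # Copy with -p to preserve mode and timestamps of the backup files.
  for f in config-service.mv.db config-service.trace.db; do
    cp -p "$backup_dir/$f" "$db_loc/" || return 1
  done
}
```

Run it only while the YARN Queue Manager service is stopped, as the steps above require.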
Rollback Kafka
You can rollback Kafka as long as the following criteria are met.
- The log format Kafka properties inter.broker.protocol.version and log.message.format.version are set before the upgrade and are not removed or cleared after the upgrade.
- There is a backup with the pre-upgrade state of the cluster from where you can restore the data, such as a remote cluster replicated with SRM. For guidance on restoring data from an SRM-replicated cluster, contact Cloudera support.
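The first criterion above can be checked against a saved copy of the broker configuration. The following is a minimal sketch: kafka_rollback_props_set is a hypothetical helper, and it assumes you have the broker properties available as a key=value file. It only checks that both properties are present with a non-empty value, not that the values are correct for your target version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both log format properties are set
# (non-empty, not commented out) in the given key=value properties file.
kafka_rollback_props_set() {
  local file="$1" prop rc=0
  for prop in inter.broker.protocol.version log.message.format.version; do
    # Compare the trimmed key literally and require a non-empty value.
    if ! awk -F= -v p="$prop" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k)
          v = $2; gsub(/[ \t]/, "", v)
          if (k == p && v != "") found = 1 }
        END { exit !found }' "$file"; then
      echo "Not set: $prop" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (the file path is a placeholder):
# kafka_rollback_props_set /path/to/saved/kafka.properties && echo "both properties are set"
```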
Deploy the Client Configuration
- On the Cloudera Manager Home page, click the Actions menu and select Deploy Client Configuration.
- Click Deploy Client Configuration.
Restart the Cluster
You must restart the cluster using the following steps.
- In Cloudera Manager, click the Actions menu and select Restart.
- Click Restart that appears in the next screen to confirm. If you have enabled high availability for HDFS, you can choose Rolling Restart instead to minimize cluster downtime.
The Command Details window shows the progress of stopping services.
When All services successfully started appears, the task is complete and you can close the Command Details window.
Post rollback steps
- Streams Replication Manager (SRM)
- Reset the state of the internal Kafka Streams application. Run the following command on the hosts of the SRM Service role.
kafka-streams-application-reset \
--bootstrap-servers [***SRM SERVICE HOST***] \
--config-file [***PROPERTIES FILE***] \
--application-id srm-service_v2
Replace [***PROPERTIES FILE***] with the location of a configuration file that contains all necessary security properties that are required to establish a connection with the Kafka service. This option is only required if your Kafka service is secured.
- Cruise Control
- Insert the removed goal back into the relevant goal sets, but with the renamed goal name RackAwareGoal (not RackAwareDistributionGoal)
- Restart Cruise Control
- Oozie
- Execute the Install Oozie ShareLib action through Cloudera Manager:
- Go to the Oozie service.
- Select .
- Finalize the HDFS Upgrade
- This step should be performed only after all validation is completed. For more information, see Finalize the HDFS upgrade documentation.