Procedure to Rollback from Cloudera Base on premises 7.1.8 to
Cloudera Base on premises 7.1.7 SP1
You can roll back an upgrade from Cloudera Base on premises
7.1.8 to Cloudera Base on premises 7.1.7. The rollback restores your
Cloudera Base on premises cluster to the state it was in before
the upgrade, including Kerberos and TLS/SSL configurations.
Typically, you first upgrade Cloudera Manager, and then you use the upgraded version
of Cloudera Manager to upgrade Cloudera Base on premises 7.1.7 to
Cloudera Base on premises 7.1.8. (See Upgrading a Cluster.) If you want to roll back this
upgrade, follow these steps to roll back your cluster to its state prior to the upgrade.
You can roll back to CDH 6 after upgrading to Cloudera Base on premises 7 only if the HDFS upgrade has not been finalized.
Review Limitations🔗
The rollback procedure has the following limitations:
HDFS – If you have finalized the HDFS upgrade, you cannot roll back your
cluster.
Compute clusters – Rollback for Compute clusters is not currently
supported.
Configuration changes, including the addition of new services or roles after the
upgrade, are not retained after rolling back Cloudera Manager.
HBase – If your cluster is configured to use HBase replication, data written to
HBase after the upgrade might not be replicated to peers when you start your rollback.
This topic does not describe how to determine which, if any, peers have the replicated
data and how to roll back that data. For more information about HBase replication, see
HBase Replication.
Sqoop 1 – Because of the changes introduced in Sqoop metastore logic, the
metastore database that is created by the CDH 6.x version of Sqoop cannot be used by
earlier versions.
Sqoop 2 – As described in the upgrade process, Sqoop 2 had to be
stopped and deleted before the upgrade and is therefore not available after
the rollback.
Kafka – Once the Kafka log format and protocol version configurations
(the inter.broker.protocol.version and log.message.format.version properties) are set to
the new version (or left blank, which means to use the latest version), Kafka
rollback is not possible.
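As a hedged illustration only (the version strings below are placeholders, not values taken from this document), these are the broker properties in question; rollback remains possible only while they are still pinned to the pre-upgrade version:
# Hypothetical example: broker properties pinned to the pre-upgrade version
inter.broker.protocol.version=2.5
log.message.format.version=2.5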
Stop the Cluster🔗
On the Home > Status tab, click the Actions menu and select
Stop.
Click Stop in the confirmation screen. The Command Details
window shows the progress of the stopping process.
When the All services
successfully stopped message appears, the task is complete and you can close the
Command Details window.
Go to the YARN service and click Actions > Clean NodeManager
Recovery Directory. The CDH 6 NodeManager does not start after the downgrade if it
finds Cloudera Base on premises 7.x data in the recovery
directory. The format and content of the NodeManager's recovery state store were changed
between CDH 6.x and Cloudera Base on premises 7.x. The
recovery directory used by Cloudera Base on premises 7.x
must be cleaned up as part of the downgrade to CDH 6.
(Parcels) Downgrade the Software🔗
Follow these steps only if your cluster was upgraded using Cloudera parcels.
Log in to the Cloudera Manager Admin Console.
Select Hosts > Parcels.
A list of parcels displays.
Locate the Cloudera Base on premises 7.1.7 parcel
and click Upgrade. The upgrade option activates the parcel and
restarts the services. At this point the HBase service restart fails. However, you must
continue with the rollback steps for all the services (after performing the Cloudera Manager restore steps, if needed). After the services are
running again following a successful rollback, you can continue by clicking
Resume on the failed Upgrade action. (From the left navigation pane, click
Running Commands > All Recent Commands, adjust the
relevant Time Range, and select the failed Upgrade command.)
After the Upgrade step, if the cluster is running, stop the cluster.
Restore the Cloudera Manager databases from the backup of Cloudera Manager that was taken before upgrading the cluster to Cloudera Base on premises 7.1.8. See the procedures provided by
your database vendor.
Use the backup of CDH that was taken before the upgrade to restore
Cloudera Manager Server files and directories. Substitute the path to
your backup directory for cm7_cdh6
in the following steps:
On the host where the Event Server role is configured to run, restore the Events Server
directory from the Cloudera Manager 7/CDH 6 backup.
This
command may return a message similar to: rm: cannot remove
‘/var/run/cloudera-scm-agent/process’: Device or resource
busy. You can ignore this message.
On the host where the Service Monitor is running, restore the
Service Monitor directory:
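A hedged sketch of both restores, assuming the default Event Server and Service Monitor storage directories and a backup directory named cm7_cdh6 (verify the actual role directories in Cloudera Manager before running anything):
# On the Event Server host: replace the current data with the backed-up copy
sudo rm -rf /var/lib/cloudera-scm-eventserver/*
sudo cp -rp cm7_cdh6/cloudera-scm-eventserver/* /var/lib/cloudera-scm-eventserver/
# On the Service Monitor host: replace the current data with the backed-up copy
sudo rm -rf /var/lib/cloudera-service-monitor/*
sudo cp -rp cm7_cdh6/cloudera-service-monitor/* /var/lib/cloudera-service-monitor/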
At this point, rolling back Cloudera Manager is not required and is completely optional.
However, if you want to roll back Cloudera Manager as well, follow the steps in (Optional) Cloudera Manager Rollback Steps before proceeding to the next step, Start Cloudera Manager.
Start Cloudera Manager🔗
Log in to the Cloudera Manager server host.
Start the
Server.
sudo systemctl start cloudera-scm-server
Start the Cloudera Manager Agent.
Run the following
commands on all cluster
hosts:
sudo systemctl start cloudera-scm-agent
Start the Cloudera Management Service.
Log in to the Cloudera Manager Admin Console.
Select Clusters > Cloudera Management Service.
Select Actions > Start.
The cluster page may indicate that services are in bad health. This is normal.
Stop the cluster. In the Cloudera Manager Admin Console, click the
Actions menu for the cluster and select
Stop.
Roll Back ZooKeeper🔗
Using the backup of ZooKeeper that you created when backing up your CDH 6.x cluster,
restore the contents of the dataDir on each ZooKeeper
server. These files are located in a directory specified with the dataDir
property in the ZooKeeper configuration. The default location is
/var/lib/zookeeper. For
example:
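A minimal sketch, assuming the default dataDir and a backup directory named zookeeper_backup (both placeholders):
# Replace the ZooKeeper transaction/snapshot data with the backed-up copy
sudo rm -rf /var/lib/zookeeper/version-2
sudo cp -rp /var/lib/zookeeper_backup/version-2 /var/lib/zookeeper/version-2
sudo chown -R zookeeper:zookeeper /var/lib/zookeeper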
Make sure that the permissions of all the directories and files are
as they were before the upgrade.
Start ZooKeeper using Cloudera Manager.
Roll Back HDFS🔗
You cannot roll back HDFS while high availability is enabled. The
rollback procedure in this topic creates a temporary configuration
without high availability. Regardless of whether high availability is
enabled, follow the steps in this section.
Roll back all of the Journal Nodes. (Only required for clusters where high
availability is enabled for HDFS). Use the JournalNode backup you
created when you backed up HDFS before upgrading to Cloudera Base on premises.
Log in to each Journal Node host and run the following
commands:
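A hedged sketch, assuming a JournalNode edits directory of /dfs/jn/<nameservice> and a backup location of your choosing (both placeholders):
# Replace the current JournalNode state with the pre-upgrade state saved in the backup
rm -rf /dfs/jn/<nameservice>/current
cp -rp <journalnode_backup_dir>/<nameservice>/previous /dfs/jn/<nameservice>/current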
Roll back all of the NameNodes. Use the NameNode backup directory you
created before upgrading to Cloudera Base on premises
(/etc/hadoop/conf.rollback.namenode) to perform the following steps on
all NameNode hosts:
(Clusters with TLS enabled only) Edit the
/etc/hadoop/conf.rollback.namenode/ssl-server.xml file on all
NameNode hosts (located in the temporary rollback directory) and update the keystore
passwords with the actual cleartext passwords. The passwords will have values that
look like this:
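As an illustration (these are the standard Hadoop ssl-server.xml keys; the placeholder values stand in for your real cleartext passwords):
<!-- Replace the placeholder values with the actual cleartext passwords -->
<property>
  <name>ssl.server.keystore.password</name>
  <value>cleartext_keystore_password</value>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>cleartext_key_password</value>
</property>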
(TLS only) Edit the
/etc/hadoop/conf.rollback.namenode/ssl-server.xml file, remove
the hadoop.security.credential.provider.path property, and change the value of the
ssl.server.keystore.location property to /etc/hadoop/conf.rollback.namenode/cm-auto-host_keystore.jks.
Edit the /etc/hadoop/conf.rollback.namenode/core-site.xml and
change the value of the net.topology.script.file.name property so that it
points to the topology.py file in /etc/hadoop/conf.rollback.namenode. For example:
# Original property
<property>
<name>net.topology.script.file.name</name>
<value>/var/run/cloudera-scm-agent/process/63-hdfs-NAMENODE/topology.py</value>
</property>
# New property
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf.rollback.namenode/topology.py</value>
</property>
Edit the /etc/hadoop/conf.rollback.namenode/topology.py file and
change the value of DATA_FILE_NAME so that it points to the data file in
/etc/hadoop/conf.rollback.namenode. For example:
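A hedged sketch (the topology.map file name is an assumption; keep whatever file name your topology.py originally referenced):
# Original (under the Cloudera Manager process directory)
DATA_FILE_NAME = '/var/run/cloudera-scm-agent/process/63-hdfs-NAMENODE/topology.map'
# New (same file name, now under the rollback directory)
DATA_FILE_NAME = '/etc/hadoop/conf.rollback.namenode/topology.map'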
Restart the NameNodes and JournalNodes using Cloudera Manager:
Go to the HDFS service.
Select the Instances tab, and then select all Failover
Controller, NameNode, and JournalNode roles from the list.
Click Actions for Selected > Restart.
Roll back the DataNodes.
Use the DataNode rollback directory
you created before upgrading to Cloudera Base on premises
(/etc/hadoop/conf.rollback.datanode) to perform the following steps
on all DataNode hosts:
(Clusters with TLS enabled only) Edit the
/etc/hadoop/conf.rollback.datanode/ssl-server.xml file on all
DataNode hosts (located in the temporary rollback directory) and update the
keystore passwords (ssl.server.keystore.password and
ssl.server.keystore.keypassword) with the actual passwords.
The passwords will have values that look like
this:
(TLS only) Edit the
/etc/hadoop/conf.rollback.datanode/ssl-server.xml file, remove
the hadoop.security.credential.provider.path property, and change the value of the
ssl.server.keystore.location property to /etc/hadoop/conf.rollback.datanode/cm-auto-host_keystore.jks.
Edit the /etc/hadoop/conf.rollback.datanode/hdfs-site.xml file
and remove the dfs.datanode.max.locked.memory property.
If your cluster is Kerberos-enabled, make sure to change the value of
hdfs.keytab to the absolute path of the conf.rollback.datanode folder in
core-site.xml and hdfs-site.xml.
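As a hedged illustration (dfs.datanode.keytab.file is the standard HDFS property for the DataNode keytab; the document itself does not name the exact keys), an updated entry in hdfs-site.xml would look similar to:
<!-- Point the keytab at the copy under the rollback directory -->
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf.rollback.datanode/hdfs.keytab</value>
</property>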
Run one of the following commands:
If the DataNode is running with privileged ports (usually 1004 and 1006):
cd /etc/hadoop/conf.rollback.datanode
export HADOOP_SECURE_DN_USER=hdfs
export JSVC_HOME=/opt/cloudera/parcels/<parcel_filename>/lib/bigtop-utils
hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback
If the DataNode is not running on privileged
ports:
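A sketch only, assuming this case mirrors the privileged-port variant minus the secure-DataNode environment variables:
cd /etc/hadoop/conf.rollback.datanode
hdfs --config /etc/hadoop/conf.rollback.datanode datanode -rollback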
When the rolling back of the DataNodes is complete, terminate the console
session by typing Control-C. Look for output from the
command similar to the following that indicates when the DataNode rollback is
complete:
Rollback of /dataroot/ycloud/dfs/dn/current/BP-<Block Group number> is complete
You
may see the following error after issuing these
commands:
ERROR datanode.DataNode: Exception in secureMain java.io.IOException:
The path component: '/var/run/hdfs-sockets' in '/var/run/hdfs-sockets/dn' has permissions 0755 uid 39998 and gid 1006.
It is not protected because it is owned by a user who is not root and not the effective user: '0'.
The error message will also include the following command to
run:
chown root /var/run/hdfs-sockets
After
running this command, the DataNode will restart successfully. Rerun the DataNode
rollback command:
If High Availability for HDFS is enabled, restart the HDFS service. In the Cloudera Manager Admin Console, go to the HDFS service and select Actions > Restart.
If high availability is not enabled for HDFS, use the Cloudera Manager Admin Console to restart all NameNodes and
DataNodes.
Go to the HDFS service.
Select the Instances tab
Select all DataNode and NameNode roles from the list.
Click Actions for Selected > Restart.
If high availability is not enabled for HDFS, roll back the
Secondary NameNode.
(Clusters with TLS enabled only) Edit the
/etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file on
all Secondary NameNode hosts (located in the temporary rollback directory) and update
the keystore passwords with the actual cleartext passwords. The passwords will have
values that look like this:
(TLS only) Edit the
/etc/hadoop/conf.rollback.secondarynamenode/ssl-server.xml file,
remove the hadoop.security.credential.provider.path property, and
change the value of the ssl.server.keystore.location property to the
cm-auto-host_keystore.jks file under the rollback directory, following the same pattern as the NameNode and DataNode steps above.
When the rolling back of the Secondary
NameNode is complete, terminate the console session by typing
Control-C. Look for output from the command similar to the
following that indicates when the Secondary NameNode rollback is
complete:
2020-12-21 17:09:36,239 INFO namenode.SecondaryNameNode: Web server init done
Restart the HDFS service. Open the Cloudera Manager Admin Console,
go to the HDFS service page, and select
Actions > Restart.
The Restart Command page displays the
progress of the restart. Wait for the page to display the
Successfully restarted service message
before continuing.
Start the HBase Service🔗
Restart the HBase Service. Open the Cloudera Manager Admin Console, go
to the HBase service page, and select
Actions > Start.
If you have configured any HBase coprocessors, you must revert them to
the versions used before the upgrade.
If Cloudera Base on premises 7.x HBase Master was
started after the upgrade and there was any ongoing (or stuck) HBase Master Procedure
present in the HBase Master before stopping the Cloudera Base on premises 7 Cluster, it is expected for the Cloudera Base on premises 7.1.7 SP1 HBase Master to fail with
warnings and errors in the role log from the classes like 'ProcedureWALFormatReader' and
'WALProcedureStore' or 'TransitRegionStateProcedure'.
These errors mean that the HBase Master Write-Ahead Log files are incompatible
with the Cloudera Base on premises 7.1.7 SP1 HBase version.
The only way to fix this problem is to sideline the log files (all the files placed under
/hbase/MasterProcWALs by default), then restart the HBase Master. After the HBase Master
has started, use the HBCK command to find out if there are any
inconsistencies that need to be fixed manually.
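A hedged usage example (on HBase 2, hbck runs as a read-only report; run it as the HBase service user):
# Report-only consistency check; review the output for inconsistencies
sudo -u hbase hbase hbck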
You may encounter other errors when starting HBase (for example,
replication-related problems, region assignment related issues, and meta region assignment
problems). In this case, delete the znode in ZooKeeper and then start HBase
again (this deletes replication peer information, so you will need to reconfigure your
replication schedules):
In Cloudera Manager, look up the value of the
zookeeper.znode.parent property. The default
value is /hbase.
Connect to the ZooKeeper ensemble by running the following command from any HBase
gateway host:
zookeeper-client -server zookeeper_ensemble
To
find the value to use for zookeeper_ensemble,
open the /etc/hbase/conf.cloudera.<HBase service
name>/hbase-site.xml file on any HBase gateway host. Use the
value of the hbase.zookeeper.quorum property.
The ZooKeeper command-line interface opens.
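For example, using hypothetical host names taken from the hbase.zookeeper.quorum value:
zookeeper-client -server zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181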
Enter the following
command:
rmr /hbase
Restart the HBase service.
After HBase is healthy, make sure you restore the states of the
Balancer and Normalizer (enable them if they were enabled before the
rollback). Also re-enable the Merge and Split operations you
disabled before the rollback to avoid the Master Procedure
incompatibility problem. Run the following commands in HBase Shell:
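A hedged sketch, assuming the Balancer, Normalizer, Split, and Merge were all enabled before the upgrade, using the standard HBase shell switches:
# Re-enable the Balancer, Normalizer, and Split/Merge operations
balance_switch true
normalizer_switch true
splitormerge_switch 'SPLIT', true
splitormerge_switch 'MERGE', true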
When you are rolling back from Cloudera Base on premises 7.1.8 to CDH 6, a
change in the tableinfo file name format (the new tableinfo file name that was
created during the Cloudera Base on premises 7.1.8 upgrade)
can prevent HBase from functioning normally.
After the rollback, if the HDFS rollback was not successful and HBase is unable
to read the tableinfo files, use the HBCK2 tool to verify the list
of tableinfo files that need to be fixed.
Follow these steps to run the HBCK2 tool and fix the
tableinfo file format:
Contact Cloudera support to request the
latest version of HBCK2 tool.
Run the following HBCK2 command without the --fix
option:
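The exact command ships with the tool that Cloudera support provides; the general HBCK2 invocation pattern (the jar path and subcommand are placeholders) is:
# General HBCK2 invocation pattern; subcommand and options come with the tool from Cloudera support
hbase hbck -j /path/to/hbase-hbck2.jar <command> [<options>]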
Check the output and verify whether all the tableinfo files are fixed.
Restore Databases🔗
Restore the following databases from their Cloudera Base on premises 7.1.7 backups:
Hive Metastore
Hue
Oozie
Ranger
Ranger KMS
Schema Registry
Streams Messaging Manager
The rollback must use databases restored from the appropriate backed-up database. The steps
for backing up and restoring databases differ depending on the database vendor and version
you select for your cluster and are beyond the scope of this document.
See the following vendor resources for more information:
The following steps are applicable only if you downgrade from Cloudera Base on premises 7.1.8 to Cloudera Base on premises 7.1.7.
In Cruise Control, you must rename
com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
to com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal in
every occurrence on the Cloudera Manager > Clusters >
Cruise Control > Configurations tab
during the rollback process, as described below.
In Cruise Control, from Cloudera Base on premises 7.1.8 and
higher, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal
is renamed to
com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal.
Perform the following steps:
Check whether RackAwareDistributionGoal is present in the following goal sets
(Cloudera Manager > Clusters >
Cruise Control > Configurations tab):
default.goals
goals
self.healing.goals
hard.goals
anomaly.detection.goals
Make a note of where RackAwareDistributionGoal was
present
Remove RackAwareDistributionGoal from all of the goal lists
Perform the runtime rollback process
Insert the removed goal back into the relevant goal sets, but with the renamed goal name
RackAwareGoal (not RackAwareDistributionGoal), as shown in the example after these steps
Restart Cruise Control
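As an illustration of the rename (the surrounding goals are placeholders; only the RackAware entry changes), a default.goals value changes as follows:
# On 7.1.8, before removing it for the rollback
default.goals=...,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,...
# Re-added after the rollback to 7.1.7
default.goals=...,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,...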
Deploy the Client Configuration🔗
On the Cloudera Manager Home page, click the Actions menu and select Deploy Client Configuration.
Click Deploy Client Configuration.
Restart the Cluster🔗
On the Cloudera Manager Home page, click
the Actions menu and select
Restart.
Click Restart that appears in the next screen to confirm. If
you have enabled high availability for HDFS,
you can choose Rolling
Restart instead to minimize cluster downtime.
The Command Details window shows the progress of the
restart.
When All services successfully started appears,
the task is complete and you can close the Command Details
window.
(Optional) Cloudera Manager Rollback Steps🔗
After you complete the rollback steps, your cluster is using Cloudera Manager 7 to manage your CDH 6 cluster. You can continue
to use Cloudera Manager 7 to manage your CDH 6 cluster, or you can
downgrade to Cloudera Manager 6 by following these steps:
Back up the repository directory. You can create a top-level
backup directory and an environment variable to reference the
directory using the following commands. You can also substitute
another directory path in the backup commands
below:
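A minimal sketch, assuming a RHEL-style yum repository directory (adjust the path for SLES or Ubuntu) and a CM_BACKUP_DIR variable of your choosing:
# Create a dated backup directory and archive the package repository definitions
export CM_BACKUP_DIR="`date +%F`-CM"
mkdir -p $CM_BACKUP_DIR
sudo -E tar -cf $CM_BACKUP_DIR/repository.tar -C /etc yum.repos.d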
Run the following commands on the Cloudera Manager server
host:
RHEL:
sudo yum remove cloudera-manager-server
sudo yum install cloudera-manager-server
SLES:
sudo zypper remove cloudera-manager-server
sudo zypper install cloudera-manager-server
Ubuntu or Debian:
sudo apt-get purge cloudera-manager-server
sudo apt-get install cloudera-manager-server
Restore Cloudera Manager Databases🔗
Restore the Cloudera Manager databases from the backup of Cloudera Manager that was taken before upgrading to Cloudera Manager 7. See the procedures provided by your database vendor.
These databases include the following:
Cloudera Manager Server
Reports Manager
Navigator Audit Server
Navigator Metadata Server
Activity Monitor (Only used for MapReduce 1 monitoring).
Stopping Agents - To stop or restart Agents while leaving the managed processes running, run the following command on each host:
sudo systemctl stop cloudera-scm-agent
Starting Agents - To start Agents, the supervisor process, and all managed service
processes, run the following command on each host:
sudo systemctl start cloudera-scm-agent