Upgrading the CDH Cluster

The version of CDH you can upgrade to depends on the version of Cloudera Manager that is managing the cluster. You may need to upgrade Cloudera Manager before upgrading CDH.

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

My Environment

Fill in the following form to create a customized set of instructions for your environment.

Current Cloudera Manager Version

Install Method

Operating System

HDFS High Availability

Current CDH Version

New CDH Version

Refreshing ContentFill out the form above before you proceed. Content Updated

To share this environment with others, click the icon next to My Environment to copy a link specific for this environment to the clipboard.

6.3.4 6.3.3 6.3.1 6.3.0 6.2.1 6.2.0 6.1.1 6.1.0 6.0.1 6.0.0 5.16 5.15 5.14 5.13 5.12 5.11 5.10 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0 6.3.4 6.3.3 6.3.2 6.2.1 6.2.0 6.1.1 6.1.0 6.0.1 6.0.0 5.16 5.15 5.14 5.13 5.12 5.11 5.10 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0

Important: Upgrade and installation procedures for Cloudera Manager or CDH 6.x have changed. Please note the following:

A valid Cloudera Enterprise license file and a username and password are required to download and install the software. You can obtain the username and password from the Cloudera CDH Download page. See Your license file must be current and uploaded to Cloudera Manager.
To upload a license: Show Me How
1. Download the license file and save it locally.
2. In Cloudera Manager, go to the Home page.
3. Select Administration > License.
4. Click Upload License.
5. Browse to the license file you downloaded.
6. Click Upload.
If you are currently using a Cloudera trial license, you cannot upgrade to Cloudera Manager or CDH 6.3.3.
If you are using Cloudera Express, you cannot upgrade Cloudera Manager or CDH.
Several steps in the procedures have changed and now require the username and password.
Download URLs have changed.

Important:

This procedure is for CDH clusters only. Follow this procedure to upgrade a Key Trustee Server Cluster and this can be carried out independently.
The embedded PostgreSQL database is NOT supported in production environments.
If you use the Solr Search service in your cluster, there are significant manual steps you must follow both before and after upgrading CDH. See Migrating Cloudera Search Configuration Before Upgrading to CDH 6.
The following services are no longer supported as of Enterprise 6.0.0:
- Sqoop 2
- MapReduce 1
- Spark 1.6
- Record Service
Running Apache Accumulo on top of a CDH 6.0.0 cluster is not currently supported. If you try to upgrade to CDH 6.0.0 you will be asked to remove the Accumulo service from your cluster. Running Accumulo on top of CDH 6 will be supported in a future release.
The minor version of Cloudera Manager you use to perform the upgrade must be equal to or greater than the CDH minor version. To upgrade Cloudera Manager, see Upgrading Cloudera Manager. Show
- Supported:
  - Cloudera Manager 6.0.0 and CDH 5.14.0
  - Cloudera Manager 5.14.0 and CDH 5.13.0
  - Cloudera Manager 5.13.1 and CDH 5.13.3
- Not Supported:
  - Cloudera Manager 5.14.0 and CDH 6.0.0
  - Cloudera Manager 5.12 and CDH 5.13
  - Cloudera Manager 6.0.0 and CDH 5.6

Note:

When upgrading CDH using Rolling Restart (Minor Upgrade only):

Automatic failover does not affect the rolling restart operation.
After the upgrade has completed, do not remove the old parcels if there are MapReduce or Spark jobs currently running. These jobs still use the old parcels and must be restarted in order to use the newly upgraded parcel.
Ensure that Oozie jobs are idempotent.
Do not use Oozie Shell Actions to run Hadoop-related commands.
Rolling upgrade of Spark Streaming jobs is not supported. Restart the streaming job once the upgrade is complete, so that the newly deployed version starts being used.
Runtime libraries must be packaged as part of the Spark application.
You must use the distributed cache to propagate the job configuration files from the client gateway machines.
Do not build "uber" or "fat" JAR files that contain third-party dependencies or CDH classes as these can conflict with the classes that Yarn, Oozie, and other services automatically add to the CLASSPATH.
Build your Spark applications without bundling CDH JARs.

After you complete the steps to prepare your CDH upgrade and backup CDH components, continue with the following upgrade steps:

Back Up Cloudera Manager
Enter Maintenance Mode
Complete Pre-Upgrade Migration Steps
Establish Access to the Software
Run Hue Document Cleanup
Check Oracle Database Initialization
Stop the Cluster
Install CDH Packages
Download and Distribute Parcels
Run the Upgrade CDH Wizard
Remove the Previous CDH Version Packages and Refresh Symlinks
Complete the Cloudera Search Upgrade
Finalize the HDFS Upgrade
Finalize the HDFS Upgrade
For Sentry with an Oracle Database, Add the AUTHZ_PATH.AUTHZ_OBJ_ID Index
Complete Post-Upgrade Migration Steps
Exit Maintenance Mode

Back Up Cloudera Manager

Before you upgrade a CDH cluster, back up Cloudera Manager. Even if you just backed up Cloudera Manager before an upgrade, you should now back up your upgraded Cloudera Manager deployment. See Backing Up Cloudera Manager.

Enter Maintenance Mode

To avoid unnecessary alerts during the upgrade process, enter maintenance mode on your cluster before you start the upgrade. Show Me Why Entering maintenance mode stops email alerts and SNMP traps from being sent, but does not stop checks and configuration validations. Be sure to exit maintenance mode when you have finished the upgrade to re-enable Cloudera Manager alerts. More Information.

Show Me How

On the Home > Status tab, click next to the cluster name and select Enter Maintenance Mode.

Complete Pre-Upgrade Migration Steps

Complete the following steps when upgrading from CDH 5.x to CDH 6.x.

YARN
Decommission and recommission the YARN NodeManagers but do not start the NodeManagers.

A decommission is required so that the NodeManagers stop accepting new containers, kill any running containers, and then shutdown.
Show Me How
1. Ensure that new applications, such as MapReduce or Spark applications, will not be submitted to the cluster until the upgrade is complete.
2. In the Cloudera Manager Admin Console, navigate to the YARN service for the cluster you are upgrading.
3. On the Instances tab, select all the NodeManager roles. This can be done by filtering for the roles under Role Type.
4. Click Actions for Selected (number) > Decommission.
  If the cluster runs CDH 5.9 or higher and is managed by Cloudera Manager 5.9 or higher, and you configured graceful decommission, the countdown for the timeout starts.
  
  A Graceful Decommission provides a timeout before starting the decommission process. The timeout creates a window of time to drain already running workloads from the system and allow them to run to completion. Search for the Node Manager Graceful Decommission Timeout field on the Configuration tab for the YARN service, and set the property to a value greater than 0 to create a timeout.
5. Wait for the decommissioning to complete. The NodeManager State is Stopped and the Commission State is Decommissioned when decommissioning completes for each NodeManager.
6. With all the NodeManagers still selected, click Actions for Selected (number) > Recommission.
  Important: Do not start the NodeManagers.
Hive
There are changes to query syntax, DDL syntax, and the Hive API. You might need to edit the HiveQL code in your application workloads before upgrading.

See Incompatible Changes for Apache Hive/Hive on Spark/HCatalog.
Pig
DataFu is no longer supported. Your Pig scripts will require modification for use with CDH 6.x.

See Incompatible Changes for Apache Pig.
Sentry
If your cluster uses Sentry policy file authorization, you must migrate the policy files to the database-backed Sentry service before you upgrade to CDH 6.
See Migrating from Sentry Policy Files to the Sentry Service.
Cloudera Search
If your cluster uses Cloudera Search, you must migrate the configuration to Apache Solr 7.

See Migrating Cloudera Search Configuration Before Upgrading to CDH 6.
Spark
If your cluster uses Spark or Spark Standalone, there are several steps you must perform to ensure that the correct version is installed.

See Migrating Apache Spark Before Upgrading to CDH 6.
Kafka
In CDH 5.x, Kafka was delivered as a separate parcel and could be installed along with CDH 5.x using Cloudera Manager. Starting with CDH 6.0, Kafka is part of the CDH distribution and is deployed as part of the CDH 6.x parcel. Show Me How
1. Explicitly set the Kafka protocol version to match what's being used currently among the brokers and clients. Update server.properties on all brokers as follows:
  1. Log in to the Cloudera Manager Admin Console
  2. Choose the Kafka service.
  3. Click Configuration.
  4. Use the Search field to find the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties configuration property.
  5. Add the following properties to the snippet:
    - inter.broker.protocol.version = current_Kafka_version
    - log.message.format.version = current_Kafka_version
    Replace current_Kafka_version with the version of Apache Kafka currently being used. See the Product Compatibility Matrix for CDK Powered By Apache Kafka to find out which upstream version is used by which version of CDK. Make sure you enter full Apache Kafka version numbers with three values, such as 0.10.0. Otherwise, you will see an error message similar to the following:
```
2018-06-14 14:25:47,818 FATAL kafka.Kafka$:
java.lang.IllegalArgumentException: Version `0.10` is not a valid version
        at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
        at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
        at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
```
2. Save your changes. The information is automatically copied to each broker.
HBase
See Migrating Apache HBase Before Upgrading to CDH 6.
Hue
See Installing Dependencies for Hue.
Key Trustee KMS
See Pre-Upgrade Migration Steps for Upgrading Key Trustee KMS to CDH 6.
HSM KMS
See Pre-Upgrade Migration Steps for Upgrading HSM KMS to CDH 6.

Establish Access to the Software

When you upgrade CDH using packages, you can choose to access the Cloudera public repositories directly, or you can download those repositories and set up a local repository to access them from within your network. If your cluster hosts do not have connectivity to the Internet, you must set up a local repository.

Run the following commands on all cluster hosts to backup the repository directories and remove older files:

RHEL / CentOS

sudo cp -rpf /etc/yum.repos.d $HOME/yum.repos.d-`date +%F`-CM-CDH

sudo rm /etc/yum.repos.d/cloudera*cdh.repo*

SLES

sudo cp -rpf /etc/zypp/repos.d $HOME/repos.d-`date +%F`CM-CDH

sudo rm /etc/zypp/repos.d/cloudera*cdh.repo*

Debian / Ubuntu

sudo cp -rpf /etc/apt/sources.list.d $HOME/sources.list.d-`date +%F`-CM-CDH

sudo rm /etc/apt/sources.list.d/cloudera*cdh.list*

On all cluster hosts, do one of the following, depending on whether or not you are using a local package repository:
- Using a local package repository. (Required when cluster hosts do not have access to the internet.)
  1. Configure a local package repository hosted on your network.
  2. In the Package Repository URL field below, replace the entire URL with the URL for your local package repository. A username and password are not required to access local repositories.
  3. Click Apply.
- Using the Cloudera public repository
  1. In the Package Repository URL field below, substitute your USERNAME and PASSWORD where indicated in the URL.
  2. Click Apply
  Package Repository URL:
  
  Tip: If you have a mixed operating system environment, adjust the Operating System filter at the top of the page for each operating system. The guide will generate the repo file for you automatically here.

Create a file named /etc/yum.repos.d/cloudera-cdh.repo with the following content:Create a file named /etc/zypp/repos.d/cloudera-cdh.repo with the following content:Debian is not supported for CDH 6.x. Please select a supported operating system to continue.Create a file named /etc/apt/sources.list.d/cloudera-cdh.list with the following content:

[cloudera-cdh]
# Packages for Cloudera CDH
name=Cloudera CDH
baseurl=https://archive.cloudera.com/cdh6/CDH version/operating systemOS version/yum/ 
gpgkey=https://archive.cloudera.com/cdh6/CDH version/operating systemOS version/yum/RPM-GPG-KEY-cloudera
gpgcheck=1

# Packages for Cloudera CDH
deb https://archive.cloudera.com/cdh6/CDH version/ubuntuOS version/apt/ bionic-cdhCDH version contrib
deb-src https://archive.cloudera.com/cdh6/CDH version/ubuntuOS version/apt/ bionic-cdhCDH version contrib

RHEL / CentOS

Create a file named /etc/yum.repos.d/cloudera-cdh.repo with the following content:

[cloudera-cdh]
# Packages for Cloudera CDH
name=Cloudera CDH
baseurl=https://archive.cloudera.com/cdh6/6.3.3/redhat7/yum/
gpgkey=https://archive.cloudera.com/cdh6/6.3.3/redhat7/yum/RPM-GPG-KEY-cloudera
gpgcheck=1

SLES

Create a file named /etc/zypp/repos.d/cloudera-cdh.repo with the following content:

[cloudera-cdh]
# Packages for Cloudera CDH
name=Cloudera CDH
baseurl=https://archive.cloudera.com/cdh6/6.3.3/sles12/yum/
gpgkey=https://archive.cloudera.com/cdh6/6.3.3/sles12/yum/RPM-GPG-KEY-cloudera
gpgcheck=1

Debian / Ubuntu

Create a file named /etc/apt/sources.list.d/cloudera-cdh.list with the following content:

# Packages for Cloudera CDH
deb https://archive.cloudera.com/cdh6/6.3.3/debian8/apt/ jessie-cdh6.3.3 contrib
deb-src https://archive.cloudera.com/cdh6/6.3.3/debian8/apt/ jessie-cdh6.3.3 contrib

Make the following changes to the repository file:
1. Add /p after https://archive.cloudera.com.
2. Prepend your username and password to the referenced URLs.
For example:
```
https://username:password@archive.cloudera.com/p/...
```
Copy the file to all cluster hosts.

Run Hue Document Cleanup

If your cluster uses Hue, perform the following steps (not required for maintenance releases). These steps clean up the database tables used by Hue and can help improve performance after an upgrade. Show Me How

Backup the Hue database.
Connect to the Hue database. See Hue Custom Databases in the Hue component guide for information about connecting to your Hue database.

Check the size of the desktop_document, desktop_document2, oozie_job, beeswax_session, beeswax_savedquery and beeswax_queryhistory tables to have a reference point. If any of these have more than 100k rows, run the cleanup.

select 'desktop_document' as table_name, count(*) from desktop_document
union
select 'desktop_document2' as table_name, count(*) from desktop_document2
union
select 'beeswax_session' as table_name, count(*) from beeswax_session
union
select 'beeswax_savedquery' as table_name, count(*) from beeswax_savedquery
union
select 'beeswax_queryhistory' as table_name, count(*) from beeswax_queryhistory
union
select 'oozie_job' as table_name, count(*) from oozie_job
order by 1;

Pick a node with a running Hue instance, the script requires Hue to be running and uses the running configuration of Hue to run. Download the hue_scripts repo to any Hue node by following the git or wget steps. These scripts are a set of libraries and commands, the ENTIRE repo is required.
```
wget -qO- -O /tmp/hue_scripts.zip https://github.com/cmconner156/hue_scripts/archive/master.zip && unzip -d /tmp /tmp/hue_scripts.zip
mv /tmp/hue_scripts-master /opt/cloudera/hue_scripts
```
Change the permissions on the script runner to make it runnable:
```
chmod 700 /opt/cloudera/hue_scripts/script_runner
```
Run the script as root on that node, the command is all one line, DESKTOP_DEBUG=True is setup for the environment of the single run of this command so you don't have to tail a log:
```
DESKTOP_DEBUG=True /opt/cloudera/hue_scripts/script_runner hue_desktop_document_cleanup --keep-days 90
```
If you included the DESKTOP_DEBUG above, logging will be in the console. Otherwise check /var/log/hue/hue_desktop_document_cleanup.log.
- Note: The first run typically takes around 1 minute per 1000 entries in each table(but can take much longer depending on the size of the tables.

Check the size of the desktop_document, desktop_document2, oozie_job, beeswax_session, beeswax_savedquery and beeswax_queryhistory tables and confirm they are now smaller.

select count(*) from desktop_document;
select count(*) from desktop_document2;
select count(*) from beeswax_session;
select count(*) from beeswax_savedquery;
select count(*) from beeswax_queryhistory;
select count(*) from oozie_job;

If any of the tables are still above 100k in size, run the command again, keeping less days:
```
--keep-days 60 or --keep-days 30
```

Check Oracle Database Initialization

If your cluster uses Oracle for any databases, before upgrading from CDH 5 to CDH 6, check the value of the COMPATIBLE initialization parameter in the Oracle Database using the following SQL query:

SELECT name, value FROM v$parameter WHERE name = 'compatible'

The default value is 12.2.0. If the parameter has a different value, you can set it to the default as shown in the Oracle Database Upgrade Guide.

Stop the Cluster

Stop the cluster before proceeding to upgrade CDH using packages:

Open the Cloudera Manager Admin Console.
Click the drop-down list next to the cluster name and select Stop.

Install CDH Packages

Run the following command:

RHEL / CentOS

sudo yum clean all

sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper

sudo yum clean all

sudo yum remove hadoop-0.20\* hue-\* crunch llama mahout sqoop2 whirr sqoop2-client

sudo yum install avro-tools bigtop-jsvc bigtop-utils flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite keytrustee-keyprovider kudu oozie parquet parquet-format pig search sentry sentry-hdfs-plugin solr solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper

SLES

sudo zypper clean --all

sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper

sudo zypper clean --all

sudo zypper remove hadoop-0.20\* hue-\* crunch llama mahout sqoop2 whirr sqoop2-client

sudo zypper install avro-tools bigtop-jsvc bigtop-utils flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite keytrustee-keyprovider kudu oozie parquet parquet-format pig search sentry sentry-hdfs-plugin solr solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper

Debian / Ubuntu

sudo apt-get update
sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie parquet pig pig-udf-datafu search sentry solr solr-mapreduce spark-python sqoop sqoop2 whirr zookeeper

sudo apt-get update
sudo apt-get remove hadoop-0.20\* crunch llama mahout sqoop2 whirr sqoop2-client

sudo apt-get update
sudo apt-get install avro-tools bigtop-jsvc bigtop-utils flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hadoop-kms hbase hbase-solr hive-hbase hive-webhcat hue impala impala-shell kafka kite keytrustee-keyprovider kudu oozie parquet parquet-format pig search sentry sentry-hdfs-plugin solr solr-crunch solr-mapreduce spark-core spark-python sqoop zookeeper

Restart the Cloudera Manager Agent.
RHEL 7, SLES 12, Debian 8, Ubuntu 16.04 and higher
```
sudo systemctl restart cloudera-scm-agent
```
If the agent starts without errors, no response displays.
RHEL 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04 or 14.04
```
sudo service cloudera-scm-agent restart
```
You should see the following:
```
Starting cloudera-scm-agent: [ OK ]
```

Download and Distribute Parcels

Log in to the Cloudera Manager Admin Console.
Click Hosts > Parcels. The Parcels page displays.
Update the Parcel Repository for CDH using the following remote parcel repository URL:
```
https://username:password@archive.cloudera.com/p/cdh5/parcels/5.15/
```
```
https://username:password@archive.cloudera.com/p/cdh6/version/parcels/
```
Show Me How
1. Click the Configuration button.
2. In the Remote Parcel Repository URLs section, Click the + icon to add the parcel URL above. Click Save Changes. See Parcel Configuration Settings for more information.
3. Locate the row in the table that contains the new CDH parcel and click the Download button. If the parcel does not appear on the Parcels page, ensure that the Parcel URL you entered is correct.
4. After the parcel is downloaded, click the Distribute button.
If your cluster has GPLEXTRAS installed, update the version of the GPLEXTRAS parcel to match the CDH version using the following remote parcel repository URL:
```
https://archive.cloudera.com/gplextras5/parcels/5.15/
```
```
https://archive.cloudera.com/p/gplextras6/version/parcels/
```
Note:
If you are using Cloudera Manager 6.3.3 , your username and password are not required in the Remote Parcel Repository URL.
If you are using Cloudera Manager 6.3.0 or 6.3.1, add your username and password to the Remote Parcel Repository URL. For example:
```
https://username:password@archive.cloudera.com/p/gplextras6/version/parcels/
```
Show Me How
1. Click the Configuration button.
2. In the Remote Parcel Repository URLs section, Click the + icon to add the parcel URL above. Click Save Changes. See Parcel Configuration Settings for more information.
3. Locate the row in the table that contains the new CDH parcel and click the Download button. If the parcel does not appear on the Parcels page, ensure that the Parcel URL you entered is correct.
4. After the parcel is downloaded, click the Distribute button.
If your cluster has Spark 2.0, Spark 2.1, or Spark 2.2 installed, and you want to upgrade to CDH 5.13 or higher, you must download and install Spark 2.1 release 2, Spark 2.2 release 2, or a higher version.
Show Me How
To install these versions of Spark, do the following before running the CDH Upgrade Wizard:
1. Install the Custom Service Descriptor (CSD) file. See
  - Installing Spark 2.1 OR
  - Installing Spark 2.2
    Note:
    Spark 2.2 requires that JDK 1.8 be deployed throughout the cluster. JDK 1.7 is not supported for Spark 2.2.
    
    See Upgrading the JDK.
2. Download, distribute, and activate the Parcel for the version of Spark that you are installing:
  - Spark 2.1 release 2: The parcel name includes "cloudera2" in its name.
  - Spark 2.2 release 2: The parcel name includes "cloudera2" in its name.
  See Managing Parcels.
If your cluster has Kudu 1.4.0 or lower installed and you want to upgrade to CDH 5.13 or higher, deactivate the existing Kudu parcel. Show Me Why Starting with Kudu 1.5.0 / CDH 5.13, Kudu is part of the CDH parcel and does not need to be installed separately.
After all the parcels are distributed, click on the Upgrade button next to the chosen CDH. The chosen CDH should be selected automatically.

Run the Upgrade CDH Wizard

If you are using packages, or did not choose Upgrade from the parcels page, You can get to the Upgrade CDH page from the Home > Status tab, click next to the cluster name and select Upgrade Cluster.
Select the previously download/distributed CDH version. If no qualifying CDH parcels are pre-listed, or you want to upgrade to a different version of CDH: Show Me How
1. Click the Remote Parcel Repository URLs link and add the appropriate parcel URL. See Parcel Configuration Settings for more information.
2. Click the Cloudera Manager logo to return to the Home page.
3. From the Home > Status tab, click next to the cluster name and select Upgrade Cluster.
If you were previously using packages and would like to switch to using parcels, select Use Parcels.
Cloudera Manager 5.14 and lower: Show Me How
1. In the Choose CDH Version (Parcels) section, select the CDH version that you want to upgrade to.
2. Click Continue.
  A page displays the version you are upgrading to and asks you to confirm that you have completed some additional steps.
3. Click Yes, I have performed these steps.
4. Click Continue.
5. Cloudera Manager verifies that the agents are responsive and that the correct software is installed. When you see the No Errors Found message, click Continue.
  The selected parcels are downloaded, distributed, and unpacked.
6. Click Continue.
  The Host Inspector runs. Examine the output and correct any reported errors.
Cloudera Manager 5.15 and higher: Show Me How
1. In the Upgrade to CDH Version drop-down list, select the version of CDH you want to upgrade to.
  The Upgrade Wizard performs some checks on configurations, health, and compatibility and reports the results. Fix any reported issues before continuing.
2. Click Run Host Inspector.
  The Host Inspector runs. Click Show Inspector Results to view the Host Inspector report (opens in a new browser tab). Fix any reported issues before continuing.
3. Click Run Service Inspector. Click Show Inspector Results to view the output of the Service Inspector command (opens in a new browser tab). Fix any reported issues before continuing.
4. Read the notices for steps you must complete before upgrading, select Yes, I have performed theses steps. ... after completing the steps, and click Continue.
  The selected parcels are downloaded, distributed, and unpacked. The Continue button turns blue when this process finishes.
If you have a parcel that works with the existing CDH version, the Upgrade Wizard may display a message that this parcel conflicts with the new CDH version.
1. Configure and download the newer version of this parcel before proceeding. Show Me How
  1. Open the Cloudera Manager Admin Console from another browser tab, go to the parcels page, and configure the remote parcel repository for the newer version of this parcel.
  2. Download and distribute the newer version of this parcel.
2. Click the Run All Checks Again button.
3. Select the option to resolve the conflicts automatically.
4. Cloudera Manager deactivates the old version of the parcel, activates the new version and verifies that all hosts have the correct software installed.
Click Continue.
The Choose Upgrade Procedure screen displays. Select the upgrade procedure from the following options:
- Rolling Restart
  Note: This option is only available if you have enabled high availability for HDFS, the cluster uses an Enterprise license, and you are performing a minor or maintenance upgrade.
  
  Cloudera Manager upgrades services and performs a rolling restart. The Rolling Restart dialog box displays the impact of the restart on various services. Services that do not support rolling restart undergo a normal restart, and are not available during the restart process.
  
  Configure the following parameters for the rolling restart (optional):
  Roles to include
  Select which roles to restart as part of the rolling restart.
  Batch Size
  Number of roles to include in a batch. Cloudera Manager restarts the worker roles rack-by-rack, in alphabetical order, and within each rack, hosts are restarted in alphabetical order. If you use the default replication factor of 3, Hadoop tries to keep the replicas on at least 2 different racks. So if you have multiple racks, you can use a higher batch size than the default 1. However, using a batch size that is too high means that fewer worker roles are active at any time during the upgrade, which can cause temporary performance degradation. If you are using a single rack, restart one worker node at a time to ensure data availability during upgrade.
  Advanced Options > Sleep between batches
  Amount of time Cloudera Manager waits before starting the next batch. Applies only to services with worker roles.
  Advanced Options > Failed threshold
  The number of batch failures that cause the entire rolling restart to fail. For example, if you have a very large cluster, you can use this option to allow some failures when you are sure that the cluster will still be functional while some worker roles are down.
  
  Click the Rolling Restart button when you are ready to restart the cluster.
- Full Cluster Restart
  Cloudera Manager performs all service upgrades and restarts the cluster.
- Manual Upgrade
  Cloudera Manager configures the cluster to the specified CDH version but performs no upgrades or service restarts. Manually upgrading is difficult and for advanced users only. Manual upgrades allow you to selectively stop and restart services to prevent or mitigate downtime for services or clusters where rolling restarts are not available.
  
  To perform a manual upgrade: See Upgrading CDH Manually after an Upgrade Failure for the required steps.
Click Continue.
The Upgrade Cluster Command screen displays the result of the commands run by the wizard as it shuts down all services, activates the new parcels, upgrades services, deploys client configuration files, and restarts services and performs a rolling restart of the services that support it.

If any of the steps fail, correct any reported errors and click the Resume button. Cloudera Manager will skip restarting roles that have already successfully restarted. Alternately, return to the Home > Status tab and then perform the steps in Upgrading CDH Manually after an Upgrade Failure.

Note: If Cloudera Manager detects a failure while upgrading CDH, Cloudera Manager displays a dialog box where you can create a diagnostic bundle to send to Cloudera Support so they can help you recover from the failure. The cluster name and time duration fields are pre-populated to capture the correct data.
Click Continue.
If your cluster was previously installed or upgraded using packages, the wizard may indicate that some services cannot start because their parcels are not available. To download the required parcels:
1. In another browser tab, open the Cloudera Manager Admin Console.
2. Select Hosts > Parcels.
3. Locate the row containing the missing parcel and click the button to Download, Distribute, and then Activate the parcel.
4. Return to the upgrade wizard and click the Resume button.
  The Upgrade Wizard continues upgrading the cluster.
Click Finish to return to the Home page.

Remove the Previous CDH Version Packages and Refresh Symlinks

[Not required for CDH maintenance release upgrades.]

If your previous installation of CDH was done using packages, remove those packages on all hosts where you installed the parcels and refresh the symlinks so that clients will run the new software versions.

Skip this step if your previous installation or upgrade used parcels.

Show Me How

If your Hue service uses the embedded SQLite database, back up /var/lib/hue/desktop.db to a location that is not /var/lib/hue because this directory is removed when the packages are removed.

Uninstall the CDH packages on each host:

Not including Impala and Search

RHEL / CentOS

sudo yum remove bigtop-utils bigtop-jsvc bigtop-tomcat 'hue-*' sqoop2-client

SLES

sudo zypper remove bigtop-utils bigtop-jsvc bigtop-tomcat 'hue-*' sqoop2-client

Debian / Ubuntu

sudo apt-get purge bigtop-utils bigtop-jsvc bigtop-tomcat 'hue-*' sqoop2-client

Including Impala and Search

RHEL / CentOS

sudo yum remove 'bigtop-*' 'hue-*' impala-shell solr-server sqoop2-client hbase-solr-doc avro-libs crunch-doc avro-doc solr-doc

SLES

sudo zypper remove 'bigtop-*' 'hue-*' impala-shell solr-server sqoop2-client hbase-solr-doc avro-libs crunch-doc avro-doc solr-doc

Debian / Ubuntu

sudo apt-get purge 'bigtop-*' 'hue-*' impala-shell solr-server sqoop2-client hbase-solr-doc avro-libs crunch-doc avro-doc solr-doc

Restart all the Cloudera Manager Agents to force an update of the symlinks to point to the newly installed components on each host.
If your Hue service uses the embedded SQLite database, restore the database you backed up:
1. Stop the Hue service.
2. Copy the backup from the temporary location to the newly created Hue database directory, /var/lib/hue.
3. Start the Hue service.

Complete the Cloudera Search Upgrade

If the cluster you are upgrading has Cloudera Search (Solr) enabled, perform the following steps:

Log in to the Cloudera Manager Admin Console.
Go to the Solr service page.
Stop the Solr service and dependent services. Click Actions > Stop.
Click Actions > Reinitialize Solr State for Upgrade.
Click Actions > Bootstrap Solr Configuration.
Start the Solr and dependent services. Click Actions > Start.
Click Actions > Bootstrap Solr Collections.

Finalize the HDFS Upgrade

Follow the steps in this section if you are upgrading:

CDH 5.0 or 5.1 to 5.2 or higher
CDH 5.2 or 5.3 to 5.4 or higher

To determine if you can finalize the upgrade, run important workloads and ensure that they are successful. After you have finalized the upgrade, you cannot roll back to a previous version of HDFS without using backups. Verifying that you are ready to finalize the upgrade can take a long time.

Make sure you have enough free disk space, keeping in mind that the following behavior continues until the upgrade is finalized:

Deleting files does not free up disk space.
Using the balancer causes all moved replicas to be duplicated.
All on-disk data representing the NameNodes metadata is retained, which could more than double the amount of space required on the NameNode and JournalNode disks.

If you have Enabled high availability for HDFS, and you have performed a rolling upgrade:

Go to the HDFS service.
Select Actions > Finalize Rolling Upgrade and click Finalize Rolling Upgrade to confirm.

If you have not performed a rolling upgrade:

Go to the HDFS service.
Click the Instances tab.
Click the link for the NameNode instance. If you have enabled high availability for HDFS, click the link labeled NameNode (Active).
The NameNode instance page displays.
Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.

Finalize the HDFS Upgrade

Make sure you have enough free disk space, keeping in mind that the following behavior continues until the upgrade is finalized:

Deleting files does not free up disk space.
Using the balancer causes all moved replicas to be duplicated.
All on-disk data representing the NameNodes metadata is retained, which could more than double the amount of space required on the NameNode and JournalNode disks.

If you have Enabled high availability for HDFS, and you have performed a rolling upgrade:

Go to the HDFS service.
Select Actions > Finalize Rolling Upgrade and click Finalize Rolling Upgrade to confirm.

If you have not performed a rolling upgrade:

Go to the HDFS service.
Click the Instances tab.
Click the link for the NameNode instance. If you have enabled high availability for HDFS, click the link labeled NameNode (Active).
The NameNode instance page displays.
Select Actions > Finalize Metadata Upgrade and click Finalize Metadata Upgrade to confirm.

For Sentry with an Oracle Database, Add the AUTHZ_PATH.AUTHZ_OBJ_ID Index

If your cluster uses Sentry and an Oracle database, you must manually add the index on the AUTHZ_PATH.AUTHZ_OBJ_ID column if it does not already exist. Adding the index manually decreases the time Sentry takes to get a full snapshot for HDFS sync. Use the following command to add the index:

CREATE INDEX "AUTHZ_PATH_FK_IDX" ON "AUTHZ_PATH" ("AUTHZ_OBJ_ID");

Complete Post-Upgrade Migration Steps

Several components require additional migrations steps after you complete the CDH upgrade:

Impala – See Impala Upgrade Considerations
Cloudera Search
After upgrading to CDH 6, you must re-index your collections, see Re-Indexing Solr Collections After Upgrading to CDH 6.
Spark – See Apache Spark Post Upgrade Migration Steps.
MapReduce 1 to MapReduce 2 – See Migrating from MapReduce 1 (MRv1) to MapReduce 2 (MRv2)
Kudu – See Upgrade Notes for Kudu 1.10 / CDH 6.3
Kafka
Show Me How
1. Remove the following properties from the Kafka Broker Advanced Configuration Snippet (Safety Valve) configuration property.
  - Inter.broker.protocol.version
  - log.message.format.version
2. Save your changes.
3. Restart the cluster:
  1. On the Home > Status tab, click to the right of the cluster name and select Restart.
  2. Click Restart that appears in the next screen to confirm. If you have enabled high availability for HDFS, you can choose Rolling Restart instead to minimize cluster downtime. The Command Details window shows the progress of stopping services.
    When All services successfully started appears, the task is complete and you can close the Command Details window.
HBase - When upgrading to CDH 5.16.1, the hbase/thrift configuration gets broken and needs to be fixed.
Show Me How
1. Ensure that you have an HBase Thrift Server instance.
  If you do not have an HBase Thrift Server, do the following:
  1. Select the HBase service and click the Instances tab.
  2. Click the Add Role Instances tab.
  3. Follow the wizard to add an HBase Thrift Server Role Instance.
2. Select the Hue service and click the Configuration tab.
3. Search for hbase.
4. Ensure that HBase Service and HBase Thrift Server are set to other than none.
5. If you use Kerberos, do the following:
  1. Select the HBase service and click the Configuration tab.
  2. Search for hbase thrift authentication.
  3. Set HBase Thrift Authentication to one of the following options:
    auth-conf: authentication, integrity and confidentiality checking
    
    auth-int: authentication and integrity checking
    
    auth: authentication only
  4. If you use Impersonation, do the following:
    1. Search for hbase thrift.
    2. Ensure that both Enable HBase Thrift Http Server and Enable HBase Thrift Proxy User are checked.
  5. Verify that HBase allows proxy users:
    1. Navigate to the directory /var/run/cloudera-scm-agent/process/<id>-hbase-HBASETHRIFTSERVER.
    2. Check the core-site.xml and verify that HBase is authorized to impersonate someone:
      <property> <name>hadoop.proxyuser.hbase.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hbase.groups</name> <value>*</value> </property>
6. Select the Hue service and click the Configuration tab.
7. Search for hue_safety_valve.ini.
8. Find Hue Service Advance Configuration Snippet (Safety Valve) for hue_safety_valve.ini and add the following snippet:
```
[hbase]
hbase_conf_dir={{HBASE_CONF_DIR}}
```
9. If you are using CDH 5.15.0 or higher, add the following snippet in the above hbase section:
```
thrift_transport=buffered
```
10. Restart the HBase and Hue service by clicking the Stale Service Restart icon that is next to the service. It invokes the cluster restart wizard.

Exit Maintenance Mode

If you entered maintenance mode during this upgrade, exit maintenance mode.

On the Home > Status tab, click next to the cluster name and select Exit Maintenance Mode.

Categories: CDH | Cloudera Manager | Upgrading | All Categories

HSM KMS

CDH 6 Post-Upgrade Migration