Step 5: Complete Pre-Upgrade steps for upgrades to CDP Private Cloud Base

Steps to complete before upgrading CDH to CDP.


Minimum Required Role: Cluster Administrator (also provided by Full Administrator).

This feature is not available when using Cloudera Manager to manage Data Hub clusters.

Complete the following steps before upgrading from CDH 5.x to CDP Private Cloud Base 7.1.

  • Cloudera Search – See Transitioning Cloudera Search configuration before upgrading to Cloudera Runtime.
  • Flume – Flume is not supported in CDP Private Cloud Base. You must remove the Flume service before upgrading to CDP Private Cloud Base.
  • HBase – See Checking Apache HBase.
  • Hive – See Migrating Hive 1-2 to Hive 3.
  • Kafka – In CDH 5.x, Kafka was delivered as a separate parcel and could be installed along with CDH 5.x using Cloudera Manager. In Runtime 7.0.3 and later, Kafka is part of the Cloudera Runtime distribution and is deployed as part of the Cloudera Runtime parcels. To upgrade Kafka successfully, you must set the protocol version to match the version currently in use by the brokers and clients.
    1. Explicitly set the Kafka protocol version in kafka.properties on all brokers as follows:
      1. Log in to the Cloudera Manager Admin Console.
      2. Choose the Kafka service.
      3. Click Configuration.
      4. Use the Search field to find the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties configuration property.
      5. Add the following properties to the snippet:
        • inter.broker.protocol.version = [***CURRENT KAFKA VERSION***]
        • log.message.format.version = [***CURRENT KAFKA VERSION***]
        Replace [***CURRENT KAFKA VERSION***] with the version of Apache Kafka currently being used. See the Product Compatibility Matrix for CDK Powered By Apache Kafka to find out which upstream version is used by which version of CDK. Make sure you enter full Apache Kafka version numbers with three values, such as 0.10.0. Otherwise, you will see an error message similar to the following:
        2018-06-14 14:25:47,818 FATAL kafka.Kafka$:
        java.lang.IllegalArgumentException: Version `0.10` is not a valid version
                at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
                at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
                at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    2. Save your changes. The information is automatically copied to each broker.
  • MapReduce – See Transitioning from MapReduce 1 to MapReduce 2.
  • Navigator – See Transitioning Navigator content to Atlas.
  • Replication Schedules – See CDH cluster upgrade requirements for Replication Manager.
  • Sentry – The Sentry service has been replaced with Apache Ranger in Cloudera Runtime 7.1. You must perform several steps before upgrading your cluster. See Transitioning the Sentry service to Apache Ranger.
  • Virtual Private Clusters – If your deployment has defined a Compute cluster and an associated Data Context, delete the Compute cluster and Data Context before upgrading the base cluster, and then recreate them after the upgrade.

  • YARN – Decommission and recommission the YARN NodeManagers, but do not start the NodeManagers. A decommission is required so that the NodeManagers stop accepting new containers, kill any running containers, and then shut down.
    1. Ensure that new applications, such as MapReduce or Spark applications, will not be submitted to the cluster until the upgrade is complete.
    2. In the Cloudera Manager Admin Console, navigate to the YARN service for the cluster you are upgrading.
    3. On the Instances tab, select all the NodeManager roles. This can be done by filtering for the roles under Role Type.
    4. Click Actions for Selected (number) > Decommission.

      If the cluster runs CDH 5.9 or higher and is managed by Cloudera Manager 5.9 or higher, and you configured graceful decommission, the countdown for the timeout starts.

      A Graceful Decommission provides a timeout before starting the decommission process. The timeout creates a window of time to drain already running workloads from the system and allow them to run to completion. Search for the Node Manager Graceful Decommission Timeout field on the Configuration tab for the YARN service, and set the property to a value greater than 0 to create a timeout.

    5. Wait for the decommissioning to complete. The NodeManager State is Stopped and the Commission State is Decommissioned when decommissioning completes for each NodeManager.
    6. With all the NodeManagers still selected, click Actions for Selected (number) > Recommission.
  • HDFS – Review the current JVM heap size for the DataNodes on your cluster and ensure that the heap size is configured at a rate of 1 GB for every million blocks. Use the Java Heap Size of DataNode in Bytes property to configure the value.
    In addition, you can track the JVM heap usage through Cloudera Manager charts, as specified in the following steps:
    1. Open the Cloudera Manager Admin Console.
    2. Go to the HDFS service.
    3. Click the Charts Library tab.
    4. Select DataNodes from the list on the left.
    5. Click the Memory tab.
    6. Look at the chart titled DataNode JVM Heap Used Distribution. The maximum heap usage is the value in the last bucket of that histogram.
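
The HDFS sizing guideline above (1 GB of DataNode heap for every million blocks) is easy to turn into the byte value that the Java Heap Size of DataNode in Bytes property expects. The following is a minimal sketch of that arithmetic, not a Cloudera tool. The function name is hypothetical, and it makes three assumptions: 1 GB is treated as 1024^3 bytes, the result is rounded up to a whole gigabyte, and it is never less than 1 GB.

```python
BYTES_PER_GB = 1024 ** 3      # assumption: "1 GB" is treated as 1024^3 bytes
BLOCKS_PER_GB = 1_000_000     # guideline from the text: 1 GB per million blocks


def recommended_datanode_heap_bytes(block_count: int) -> int:
    """Return a DataNode heap size in bytes for the given block count.

    Hypothetical helper: rounds up to a whole GB and applies a 1 GB floor,
    both of which are assumptions layered on top of the guideline.
    """
    if block_count < 0:
        raise ValueError("block_count must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    whole_gb = -(-block_count // BLOCKS_PER_GB)
    return max(1, whole_gb) * BYTES_PER_GB


if __name__ == "__main__":
    for blocks in (250_000, 1_000_000, 3_400_000):
        print(f"{blocks} blocks -> {recommended_datanode_heap_bytes(blocks)} bytes")
```

Compare the result with the maximum heap usage shown in the DataNode JVM Heap Used Distribution chart before changing the property. The chart reflects actual usage, while this calculation only reflects the guideline.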
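
For the Kafka step earlier in this list, a two-part version such as 0.10 in the safety valve causes the broker to fail at startup with the IllegalArgumentException shown there. The sketch below checks the version string before you paste it into Cloudera Manager and prints the two property lines. The function name is hypothetical, and the three-numeric-parts pattern is an assumption based on the "three values, such as 0.10.0" guidance; it may be stricter than what Kafka itself accepts.

```python
import re

# Assumption: a "full" version is exactly three dot-separated numbers, e.g. 0.10.0.
_THREE_PART_VERSION = re.compile(r"\d+\.\d+\.\d+")


def kafka_protocol_snippet(version: str) -> str:
    """Return the two kafka.properties lines for the given Apache Kafka version.

    Hypothetical helper: raises ValueError for versions that do not have
    three numeric parts, such as "0.10".
    """
    if not _THREE_PART_VERSION.fullmatch(version):
        raise ValueError(
            f"{version!r} is not a three-part Kafka version such as 0.10.0"
        )
    return (
        f"inter.broker.protocol.version = {version}\n"
        f"log.message.format.version = {version}"
    )


if __name__ == "__main__":
    print(kafka_protocol_snippet("0.10.0"))
```

The version passed in must still be the upstream Apache Kafka version that your brokers and clients currently use, as listed in the Product Compatibility Matrix for CDK Powered By Apache Kafka. The check only catches formatting mistakes.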