Step 1: Getting Started Upgrading a Cluster

Tasks you should perform before starting the upgrade.

The version of CDH or Cloudera Runtime that you can upgrade to depends on the version of Cloudera Manager that is managing the cluster. You may need to upgrade Cloudera Manager before upgrading your clusters. Upgrades are not supported when using Cloudera Manager 7.0.3.

Before you upgrade a cluster:
  • Gather information, review the limitations and release notes, and run some checks on the cluster. See the Collect Information section below.
  • Clean all the HBase Master procedure stores. For more details, see Clean the HBase Master procedure store.

Minimum Required Role: Cluster Administrator (also provided by Full Administrator). This feature is not available when using Cloudera Manager to manage Data Hub clusters.

Collect Information

Collect the following information about your environment and record it; you will refer to this information throughout the upgrade procedures in this guide.

  1. Log in to the Cloudera Manager Server host.
    ssh my_cloudera_manager_server_host
  2. Run the following command to find the current version of the Operating System:
    lsb_release -a
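    If the lsb_release command is not available on your distribution, the same information can usually be read from /etc/os-release; for example:
    # Fallback OS version check when lsb_release is not installed
    cat /etc/os-release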
  3. Log in to the Cloudera Manager Admin console and find the following:
    1. The version of Cloudera Manager used in your cluster. Go to Support > About.
    2. The version of the JDK deployed in the cluster. Go to Support > About.
    3. Whether High Availability is enabled for HDFS. Go to Cloudera Manager > Clusters > HDFS.

      If you see a standby namenode instead of a secondary namenode listed under Cloudera Manager > HDFS > Instances, then High Availability is enabled.

    4. The current cluster version (parcels). The cluster version number, along with "(parcels)", is displayed on the Cloudera Manager Home page, to the right of the cluster name.

Preparing to Upgrade a Cluster

  1. You must have SSH access to the Cloudera Manager server hosts and be able to log in using the root account or an account that has password-less sudo permission on all hosts.
  2. Review the Requirements and Supported Versions for the new versions you are upgrading to. See CDP Private Cloud Base 7.1 Requirements and Supported Versions. If your hosts require an operating system upgrade, you must perform the upgrade before upgrading the cluster. See Upgrading the Operating System to a new Major Version.
  3. Ensure that a supported version of Java is installed on all hosts in the cluster. See the links above. For installation instructions and recommendations, see Upgrading the JDK.
  4. Review the following documents:
  5. If your deployment has defined a Compute cluster and an associated Data Context, you must delete the Compute cluster and Data Context before upgrading the base cluster, and then recreate the Compute cluster and Data Context after the upgrade.

    See Starting, Stopping, Refreshing, and Restarting a Cluster and Virtual Private Clusters and Cloudera SDX.

  6. Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all steps. For production clusters, Cloudera recommends allocating up to a full-day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
  7. If the cluster uses Impala, check your SQL against the newest reserved words listed in incompatible changes. If upgrading across multiple versions, or in case of any problems, check against the full list of Impala reserved words.
  8. If the cluster uses Hive, validate the Hive Metastore Schema:
    1. In the Cloudera Manager Admin Console, go to the Hive service.
    2. Select Actions > Validate Hive Metastore Schema.
    3. Fix any reported errors.
    4. Select Actions > Validate Hive Metastore Schema again to ensure that the schema is now valid.
  9. Run the Security Inspector and fix any reported errors.

    Go to Administration > Security > Security Inspector.

  10. Log in to any cluster node as the hdfs user, run the following commands, and correct any reported errors:
    hdfs fsck / -includeSnapshots -showprogress
    hdfs dfsadmin -report
    See HDFS Commands Guide in the Apache Hadoop documentation.
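    If the cluster is Kerberized, authenticate as an HDFS superuser principal before running these commands; a minimal sketch (the keytab path and realm below are placeholders for your environment):
    # Obtain a Kerberos ticket for the hdfs principal (placeholder keytab and realm)
    kinit -kt /path/to/hdfs.keytab hdfs@EXAMPLE.COM
    # Then run the health checks as usual
    hdfs fsck / -includeSnapshots -showprogress
    hdfs dfsadmin -report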
  11. Log in to any DataNode as the hbase user, run the following command, and correct any reported errors:
    hbase hbck 
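    For example, from a root or sudo-capable session on the DataNode, you can run the check directly as the hbase user (on a Kerberized cluster, kinit as the hbase principal first):
    # Run the HBase consistency check as the hbase service user
    sudo -u hbase hbase hbck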
  12. If the cluster uses Kudu, log in to any cluster host and run the ksck command as the kudu user (sudo -u kudu). If the cluster is Kerberized, first kinit as kudu then run the command:
    kudu cluster ksck <master_addresses>

    For the full syntax of this command, see Checking Cluster Health with ksck.
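    For example, on a cluster with three Kudu masters (the hostnames below are placeholders for your environment):
    # Run the Kudu consistency check as the kudu user against all masters
    sudo -u kudu kudu cluster ksck master-1.example.com,master-2.example.com,master-3.example.com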

  13. The Llama role has been deprecated as of CDH 5.9. If your cluster uses Impala with the Llama role, you must remove the role from the Impala service before starting the upgrade. If you do not remove this role, the upgrade wizard halts the upgrade.
    To determine if Impala uses Llama:
    1. Go to the Impala service.
    2. Select the Instances tab.
    3. Examine the list of roles in the Role Type column. If Llama appears, the Impala service is using Llama.
    To remove the Llama role:
    1. Go to the Impala service and select Actions > Disable YARN and Impala Integrated Resource Management.

      The Disable YARN and Impala Integrated Resource Management wizard displays.

    2. Click Continue.

      The Disable YARN and Impala Integrated Resource Management Command page displays the progress of the commands to disable the role.

    3. When the commands have completed, click Finish.
  14. If your cluster uses the Ozone technical preview, you must stop and delete this service before upgrading the cluster.
  15. If your cluster uses Kafka, you must explicitly set the Kafka protocol version to match the version currently used by the brokers and clients. Update kafka.properties on all brokers as follows:
    1. Log in to the Cloudera Manager Admin Console.
    2. Choose the Kafka service.
    3. Click Configuration.
    4. Use the Search field to find the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties configuration property.
    5. Add the following properties to the snippet:
      • inter.broker.protocol.version = [***CURRENT KAFKA VERSION***]
      • log.message.format.version = [***CURRENT KAFKA VERSION***]
      Replace [***CURRENT KAFKA VERSION***] with the version of Apache Kafka currently being used. See Cloudera Runtime component versions for the Apache Kafka versions shipped with Cloudera Runtime. Make sure you enter full Apache Kafka version numbers with three values, such as 0.10.0. Otherwise, you will see an error message similar to the following:
      2018-06-14 14:25:47,818 FATAL kafka.Kafka$:
      java.lang.IllegalArgumentException: Version `0.10` is not a valid version
              at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
              at kafka.api.ApiVersion$$anonfun$apply$1.apply(ApiVersion.scala:72)
              at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
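      For example, if your brokers currently run Apache Kafka 2.4.1 (a placeholder; substitute the version actually in use in your cluster), the snippet would contain:
      inter.broker.protocol.version=2.4.1
      log.message.format.version=2.4.1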
  16. If your cluster uses Streams Replication Manager and you configured the Log Format property, you must take note of your configuration. The value set for Log Format is cleared during the upgrade and must be manually reconfigured following the upgrade.
  17. If your cluster uses Streams Replication Manager and you are affected by OPSAPS-62546, ensure that you apply the workaround detailed in the Known issue. If the workaround is not applied, you might run into issues after the upgrade. For more information, see the SRM Known Issues.
  18. If your cluster uses Streams Replication Manager, export or migrate aggregated metrics.

    In Cloudera Runtime 7.1.9, major changes are made to the internal Kafka Streams application of SRM. As a result, SRM by default loses all aggregated metrics that were collected before the upgrade. This means that you cannot use the SRM Service REST API to query metrics that describe the pre-upgrade state of replications. If you want to retain the metrics, you can either export them for archival purposes or migrate them to the new format used by SRM. If you do not need to retain metrics, you can skip this step and continue with the upgrade.

    Exporting metrics creates a backup of the metric data; however, exported metrics cannot be imported back into the SRM Service for consumption. As a result, exporting metrics is useful only for data archival purposes.

    Migrating metrics can be done in two different ways depending on whether you are doing a rolling upgrade or a non-rolling upgrade.

    • In case of a non-rolling upgrade, migration happens following the upgrade. In this case, the new version of the internal Kafka Streams application running in the upgraded cluster starts to process historical metrics as soon as it is online. However, until the metrics are processed, the SRM Service cannot serve requests regarding the latest metrics and returns empty or missing responses on its REST API. The duration of this downtime depends on the number of SRM Service instances and the amount of metrics in the cluster.

    • In case of a rolling upgrade, a migration process called the SRM Service Migrator is initiated during the upgrade. The Migrator processes existing metrics so that they become compatible with your upgraded cluster. Depending on the size of your cluster and the amount of metrics you have, this process may take several hours to finish.

    Use the following endpoints of the SRM Service REST API to export metrics.
    If upgrading from Cloudera Runtime 7.1.8:
    • /v2/topic-metrics/{source}/{target}/{upstreamTopic}/{metric}
    • /v2/cluster-metrics/{source}/{target}/{metric}
    If upgrading from Cloudera Runtime 7.1.7 or lower:
    • /topic-metrics/{topic}/{metric}
    • /cluster-metrics/{cluster}/{metric}

    For more information regarding the SRM Service REST API, see Streams Replication Manager Service REST API or Streams Replication Manager REST API Reference.
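
    A minimal export sketch using curl, assuming an upgrade from 7.1.8 and an SRM Service reachable at srm-host.example.com on port 6670 (the hostname, port, cluster aliases, topic, and metric name are all placeholders; add TLS and authentication options as your environment requires):
    # Export one topic metric to a local file for archival (placeholder values throughout)
    curl "http://srm-host.example.com:6670/v2/topic-metrics/primary/secondary/topic1/replication-latency-ms" -o topic-metrics-backup.json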

    1. Ensure that the target clusters of the SRM Service are available and healthy.

      If a target cluster is unavailable, the upgrade will fail. As a result, if a target cluster is unavailable, or you expect a target cluster to become unavailable during the upgrade, remove it from SRM’s configuration for the duration of the upgrade. Metrics in the target clusters that you remove are not migrated. Target clusters are specified in Streams Replication Manager Service Target Cluster.

    2. In Cloudera Manager, select the SRM service and go to Configuration.
    3. Add the following to the SRM Service Environment Advanced Configuration Snippet (Safety Valve) property:
      Key: SRM_SERVICE_SKIP_MIGRATION
      Value: false
    4. Fine-tune the behavior of the migration process.

      The SRM Service Metrics Migrator (the migration process) has a number of user-configurable properties. Fine-tuning the configuration can help reduce the time it takes to migrate the metrics.

      These properties do not have dedicated entries in Cloudera Manager; instead, you must use the SRM Service Advanced Configuration Snippet (Safety Valve) for srm-service.yaml to configure them. If you are unsure about the configuration, skip it and continue with the next step; a hedged example snippet is shown after this list.
      Table 1. SRM Service Migrator properties and recommendations

      streams.replication.manager.migrator.monitor.timeout.ms
        Default value: 3,600,000
        Description: The time in milliseconds after which the Streams Replication Manager (SRM) Service Metrics Migrator times out.
        Cloudera recommendation: Set this timeout to a value that is higher than the expected migration time; Cloudera recommends a value that is at least three times the expected migration time. The migration time depends on the amount of metrics in your deployment: the higher the number of metrics, the longer the migration process, and the higher this property must be set. For example, a deployment with 10,000 partitions (100 topics with 100 partitions each) and a 2-hour retention period produces, at minimum, 30,000 metrics per metric emission cycle. In a case like this, migration takes around 10 minutes to finish.

      streams.replication.manager.migrator.monitor.backoff.ms
        Default value: 120,000
        Description: The frequency at which the progress of the SRM Service Metrics Migrator is checked.
        Cloudera recommendation: The recommended value differs depending on the version you are upgrading from. If upgrading from 7.1.8, set this property to a value identical or similar to the interval set in SRM Service Streams Commit Interval; the default value of this property is identical to the default value of SRM Service Streams Commit Interval. If upgrading from 7.1.7 or lower, set this property to 30,000 (30 seconds).

      streams.replication.manager.migrator.monitor.stop.delay.ms
        Default value: 60,000
        Description: The amount of time in milliseconds that the SRM Service Metrics Migrator continues to process metrics after the streams application is considered caught up.

      streams.replication.manager.migrator.monitor.min.consecutive.successful.checks
        Default value: 3
        Description: The number of consecutive checks where the lag must be within the configured threshold for the SRM Service Metrics Migrator to be considered successful. All target clusters of the SRM Service must be caught up for a check to be successful.

      streams.replication.manager.migrator.monitor.max.offset.lag
        Default value: Calculated automatically if left empty.
        Description: The number of offsets that the streams application in the SRM Service Metrics Migrator is allowed to lag behind and still be considered up to date. When left empty, the migration logic automatically calculates the maximum offset lag based on the Kafka Streams application configuration and the amount of metric messages.
        Cloudera recommendation: Low offset lag values result in more up-to-date metrics processing following the upgrade, but also increase the time required for the upgrade to finish. Because an appropriate value is calculated automatically when the property is left empty, Cloudera recommends that you leave this property empty and use the automatically calculated value.
    5. Click Save Changes.
    6. Restart the SRM service.
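    For step 4 above, a minimal sketch of what the srm-service.yaml safety valve content might look like, assuming the snippet accepts flat key: value entries (the values shown are illustrative placeholders, not recommendations; derive real values from Table 1):
    # Placeholder migrator tuning values (adjust per Table 1 guidance)
    streams.replication.manager.migrator.monitor.timeout.ms: 7200000
    streams.replication.manager.migrator.monitor.backoff.ms: 30000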
  19. If your cluster uses Cruise Control and you have customized the goals in Cruise Control, ensure that you have created a copy of the values of the following goals before upgrading your cluster:
    • Default goals
    • Supported goals
    • Hard goals
    • Self-healing goals
    • Anomaly detection goals
    For more information, see the Cruise Control Known Issues.
  20. The following services are no longer supported as of CDP Private Cloud Base:
    • Accumulo
    • Sqoop 2
    • MapReduce 1
    • RecordService

    You must stop and delete these services before upgrading a cluster.

  21. Open the Cloudera Manager Admin console and collect the following information about your environment:
    1. The version of Cloudera Manager. Go to Support > About.
    2. The version of the JDK deployed. Go to Support > About.
    3. The version of CDH or Cloudera Runtime and whether the cluster was installed using parcels or packages. It is displayed next to the cluster name on the Home page.
    4. The services enabled in your cluster.

      Go to Clusters > Cluster name.

  22. Back up Cloudera Manager before beginning the upgrade. Immediately before upgrading a cluster, you should back up Cloudera Manager again. See Step 4: Back Up Cloudera Manager.