Cloudera Public Cloud upgrades FAQ

Before you upgrade, carefully review the questions and answers below.

General questions related to upgrading Cloudera

What is the length of the available maintenance window?

Currently, Data Lake backup and restore requires a maintenance window during which no metadata changes occur. Furthermore, Cloudera Data Hub clusters need to be stopped during an upgrade.

The Cloudera Public Cloud environment does not need to be upgraded in one go: you may opt to upgrade the Data Lake and all attached Data Hubs together, or start with the Data Lake upgrade only and perform individual Data Hub upgrades consecutively. Whether or not a Cloudera Data Hub upgrade is required depends on the Cloudera Runtime version of the Cloudera Data Hub cluster:

  • If your Cloudera Data Hub clusters are on Cloudera Runtime version 7.2.15 or earlier, they must run the same major/minor version of Cloudera Runtime as the Data Lake. In this scenario, after a Data Lake upgrade you are required to upgrade any Cloudera Data Hub clusters that are 7.2.15 or earlier to the same version as the Data Lake.
  • If your Cloudera Data Hub clusters are on Cloudera Runtime version 7.2.16 or later, they are compatible with newer versions of the Data Lake (7.2.17 or later). You can independently upgrade your Cloudera Data Hub clusters at a later time if you choose to, though it is not required.
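
The two compatibility rules above can be sketched as a small check. This is a minimal illustration, not a Cloudera API: the function names and the `LAST_COUPLED_RUNTIME` constant are hypothetical, version strings are assumed to be plain dotted numbers such as `7.2.15`, "same major/minor version" is interpreted as equality of the first three components, and the Data Lake is assumed to be at least as new as the Data Hub.

```python
# Hypothetical helper names; only the version rules come from the text above.
LAST_COUPLED_RUNTIME = (7, 2, 15)  # 7.2.15 or earlier must match the Data Lake


def parse_runtime_version(version: str) -> tuple[int, int, int]:
    """Turn a dotted version string such as '7.2.15' into (7, 2, 15)."""
    major, minor, maintenance = (int(part) for part in version.split(".")[:3])
    return major, minor, maintenance


def data_hub_upgrade_required(data_hub_version: str, data_lake_version: str) -> bool:
    """Return True if the Data Hub must be upgraded after a Data Lake upgrade."""
    data_hub = parse_runtime_version(data_hub_version)
    data_lake = parse_runtime_version(data_lake_version)
    if data_hub <= LAST_COUPLED_RUNTIME:
        # Assumption: "same version" means the first three components match.
        return data_hub != data_lake
    # 7.2.16 or later: compatible with newer Data Lakes, upgrade is optional.
    return False
```

For example, under these assumptions a 7.2.15 Data Hub attached to a Data Lake upgraded to 7.2.17 would be flagged for upgrade, while a 7.2.16 Data Hub would not.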
What type of upgrade is required?
Currently, there are three types of upgrades available to Data Lake and Cloudera Data Hub clusters: service pack upgrades; minor/major version upgrades; and OS upgrades. Service pack and minor/major version upgrades install a newer version of Cloudera Manager and/or Cloudera Runtime. OS upgrades for Data Lakes and Cloudera Data Hub clusters are complementary and will bring the image of the cluster hosts to a newer version. If you plan to also perform an OS upgrade, plan the maintenance window accordingly.
Are ephemeral disks used for user or workload-related persistent data?

Major/minor version upgrades as well as service pack upgrades will bring Cloudera Manager and Cloudera Runtime to the selected version without impacting the underlying VM. However, OS upgrades will recreate the underlying VM with a fresh image, which results in the loss of any data stored on ephemeral disks.

If you are currently storing user or workload-related data on volumes using ephemeral disks, please reach out to Cloudera support while planning for the upgrade.

What Data Hub cluster templates are in use? Are you using custom templates?
Check whether in-place upgrade is supported for your built-in or custom Cloudera Data Hub template. Depending on the type and version of the Cloudera Data Hub cluster, additional backup steps, manual configuration changes or post-upgrade steps may be required. Check specific steps for upgrading the OS if you use Cloudera Flow Management. Cloudera Operational Database clusters have a different upgrade process.
What is the size of the SDX / Data Lake metadata?
SDX metadata includes the Hive Metastore database, Ranger audit log index, as well as Atlas metadata. If you are planning to perform a Data Lake backup before an upgrade (which is recommended), prepare your maintenance window accordingly. Cloudera supports skipping the backup of certain metadata to reduce the time required for backup and restore operations.
Are you using Data Services?
If you have deployed Cloudera Data Engineering, Cloudera Data Warehouse, Cloudera DataFlow, or Cloudera AI in your environment, be sure to check the Preparing for an upgrade topic to verify compatibility between the data service version and the Data Lake version or desired features/Cloudera Runtime services.

Questions related to upgrading to Cloudera Runtime 7.2.18

Will upgrading to Cloudera Runtime 7.2.18 and changing from CentOS to RHEL 8 cost Cloudera customers money?
Upgrading to Cloudera Public Cloud Runtime version 7.2.18 and changing the operating system from CentOS 7 to RHEL 8 will not incur additional costs, either to your organization or to the Cloud Service Provider. This update is designed for a frictionless transition and continued support, without financial impact.
What are the key features of Cloudera Runtime 7.2.18?
  • RHEL 8 is set as the default operating system, in anticipation of the CentOS 7 sunset on June 30, 2024, to ensure a modern and fully supported infrastructure.
  • Rolling upgrades allow for seamless updates of services without operational interruptions.
  • Iceberg support is now fully integrated with Atlas, enriching data management capabilities with comprehensive data lineage support.
  • The transition from Medium Duty to Enterprise Data Lakes enhances performance and scalability, aligning with advanced workload requirements.
  • Amazon S3 Express One Zone support provides a fast and cost-effective data storage option.
What are Rolling Upgrades, and how do they affect my operations?
Rolling Upgrades allow you to upgrade without causing any interruption to ongoing operations. This means customers can continue using their services while the upgrade process is underway. While Rolling Upgrades are not available for all services, key services including Data Lakes, Cloudera Operational Database, and Cloudera Streams Messaging Data Hub clusters now support this feature, significantly enhancing operational efficiency and minimizing disruption during upgrades.
Why is the upgrade to Cloudera Runtime 7.2.18 recommended?
Upgrading to Cloudera Runtime 7.2.18 is recommended for several reasons:
  • Cloudera Runtime 7.2.18 transitions to RHEL for modern, fully supported Linux infrastructure.
  • Cloudera Runtime 7.2.18 introduces rolling upgrades for key services.
  • Cloudera Runtime 7.2.18 integrates Iceberg with Atlas for enhanced data management and data lineage.
  • Cloudera Runtime 7.2.18 supports Amazon S3 Express One Zone for cost-effective, high-speed storage.
How does the transition to RHEL 8 benefit Cloudera customers?
The transition to RHEL 8 benefits Cloudera customers by ensuring a modern, fully supported Linux infrastructure that meets the latest industry standards. It enhances security, streamlines InfoSec approval processes, and offers greater automation capabilities. This transition supports advanced functionality such as Generative AI and real-time streaming, providing faster time-to-value with no additional costs.
Can you explain the Iceberg support with Atlas integration?
The integration of Iceberg with Atlas in the 7.2.18 update completes Cloudera's Iceberg integration story by providing comprehensive data lineage support. With Atlas, users gain visibility into the lineage of their Iceberg data. This enhancement enriches the platform's ability to manage and understand data across its lifecycle, facilitating better data governance and compliance.
What happens if a customer does NOT upgrade to RHEL 8 by June 30, 2024?
If you do not upgrade to RHEL 8 by the CentOS 7 sunset date, Cloudera will still accept support cases. We do ask customer account teams to file for a CentOS extension. However, Cloudera will not publish OS patches or CVE fixes for CentOS-based images after June 2024.
How do I prepare my environment for upgrading to Cloudera Runtime 7.2.18?
To prepare your environment for upgrading to 7.2.18, see Upgrading to Cloudera Runtime 7.2.18. There you will find instructions on how to identify cluster versions, identify your upgrade path, and more.
How do I prepare my environment for upgrading to RHEL 8?
To prepare your environment for upgrading to RHEL 8, follow the guidance provided in Upgrading from CentOS to RHEL.
Is there a recommended upgrade path for users on various Cloudera Runtime versions?
Yes, and Cloudera offers documentation that guides you through the upgrade process. Check out Upgrading to Cloudera Runtime 7.2.18 for more information.
Can I roll back to a previous Cloudera Runtime version after upgrading to 7.2.18?
No, rolling back to a previous version after upgrading to 7.2.18 is not supported due to compatibility risks. Should you run into any errors or issues, Cloudera support is here to help.

Questions related to upgrading to Cloudera Runtime 7.3.1

What are the key features of Cloudera Runtime 7.3.1?
  • Unified Runtime for Cloudera Public Cloud and Cloudera Private Cloud.
  • RHEL 8.10 is set as the default operating system, while RHEL 8.8 continues to be supported.
  • Python 3.9 is set as the new default version.
  • OpenJDK 17 becomes the new default Java runtime version; new clusters are launched with this version.

    OpenJDK 11 is no longer supported, and OpenJDK 11 clusters cannot be upgraded to Cloudera Runtime 7.3.1.

    After upgrading clusters from Runtime 7.2.x to 7.3.1, the JDK version remains the same, that is, JDK 8.

What features will no longer be available after the upgrade to Cloudera Runtime 7.3.1?
  • Customers need to rebase their applications to Spark 3, because Spark 2 is no longer supported and clusters with Spark 2 are not allowed to upgrade to Cloudera Runtime 7.3.1.
  • Zeppelin service is turned off for Cloudera Runtime 7.3.1 clusters and gets removed automatically during the upgrade.
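
The restrictions in this section, together with the OpenJDK 11 restriction listed under the key features, can be sketched as a simple pre-upgrade check. This is an illustrative sketch only: `ClusterFacts` and `check_upgrade_to_7_3_1` are hypothetical names, and how you collect the facts about your cluster (Java version, installed services) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClusterFacts:
    """Hypothetical summary of a cluster, filled in by the operator."""
    java_major_version: int  # for example 8, 11, or 17
    has_spark2: bool
    has_zeppelin: bool


def check_upgrade_to_7_3_1(cluster: ClusterFacts) -> tuple[list[str], list[str]]:
    """Return (blockers, warnings) for an upgrade to Cloudera Runtime 7.3.1."""
    blockers: list[str] = []
    warnings: list[str] = []
    if cluster.java_major_version == 11:
        blockers.append("OpenJDK 11 clusters cannot be upgraded to 7.3.1.")
    if cluster.has_spark2:
        blockers.append("Spark 2 is no longer supported; rebase to Spark 3 first.")
    if cluster.has_zeppelin:
        # Not a blocker: the service is removed automatically during the upgrade.
        warnings.append("Zeppelin is removed during the upgrade; back up its data.")
    return blockers, warnings
```

An empty blockers list does not guarantee that an upgrade will succeed; it only reflects the three restrictions named in this FAQ.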
Why is the upgrade to Cloudera Runtime 7.3.1 recommended?
Upgrading to Cloudera Runtime 7.3.1 is recommended because the unified Runtime provides seamless portability of workloads across cloud and on-prem environments without rewriting them.
How do I prepare my environment for upgrading to Cloudera Runtime 7.3.1?
To prepare your environment for upgrading to version 7.3.1, see Upgrading to Cloudera Runtime 7.3.1. There you will find instructions on how to identify cluster versions, identify your upgrade path, and more.
Is there a recommended upgrade path for users on various Cloudera Runtime versions?
Yes, and Cloudera offers documentation that guides you through the upgrade process. Check out Upgrading to Cloudera Runtime 7.3.1 for more information.
Can I roll back to a previous Cloudera Runtime version after upgrading to 7.3.1?
No, rolling back to a previous version after upgrading to 7.3.1 is not supported due to compatibility risks. Should you run into any errors or issues, Cloudera support is here to help.
How do I prepare my Cloudera Data Hub cluster for Spark 2 to Spark 3 migration?
If you have a Cloudera Data Engineering Data Hub cluster, the Cloudera Runtime upgrade to version 7.3.1 requires you to remove Spark 2, create a new cluster with Spark 3, and migrate your data. Depending on your environment and Spark applications, besides the cluster upgrade tasks, you need to perform Spark application migration tasks and sidecar migration tasks for the Cloudera Data Hub cluster in multiple steps. For details, see Upgrading Apache Spark.
What happens to the Zeppelin service?
Zeppelin service is turned off for Cloudera Runtime 7.3.1 clusters and gets removed automatically during the upgrade. Therefore, you need to back up your data from the Zeppelin service before starting the Cloudera Data Hub upgrade to Runtime 7.3.1.
How do I back up data from the Zeppelin service?
Back up the following Zeppelin data to ensure all necessary components are preserved:
  1. Interpreter Configuration: Copy the interpreter.json file located in the zeppelin_home/conf folder. This file contains the interpreter settings crucial for Zeppelin operations.
  2. Notebooks: Back up the entire notebooks folder from the configured repository. This folder includes all Zeppelin notebooks.
  3. Cloudera Manager Configuration: Access the Cloudera Manager UI, locate the Zeppelin configurations and copy any custom configurations set for the site.xml and env.sh files. Ensure any safety valve variables added in Cloudera Manager are also documented as part of the backup.
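
Steps 1 and 2 above can be sketched as a short script. This is a minimal sketch under stated assumptions: the function name `back_up_zeppelin` and all paths passed to it are hypothetical, and the notebook repository is assumed to be a directory on the local filesystem (a repository in cloud storage would need a different copy mechanism). Step 3 is a manual task in the Cloudera Manager UI and is not covered.

```python
import shutil
from pathlib import Path


def back_up_zeppelin(zeppelin_home: str, notebook_dir: str, backup_dir: str) -> Path:
    """Copy interpreter.json and the notebooks folder into backup_dir."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    # Step 1: interpreter settings stored in <zeppelin_home>/conf/interpreter.json
    interpreter_file = Path(zeppelin_home) / "conf" / "interpreter.json"
    shutil.copy2(interpreter_file, backup / "interpreter.json")

    # Step 2: the entire notebooks folder (assumed to be a local directory)
    shutil.copytree(Path(notebook_dir), backup / "notebooks", dirs_exist_ok=True)

    return backup
```

Run the script before starting the Cloudera Data Hub upgrade, and store the backup somewhere that survives the upgrade, not on a disk of the cluster being upgraded.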