Data Hub rolling upgrades

A Data Hub rolling upgrade lets you upgrade a Data Hub cluster without stopping the cluster and without incurring workload downtime.

Data Hub rolling upgrades are only available for the following cluster types:

Cloudera Operational Database
Cloudera Streams Messaging

Cloudera Operational Database clusters

For an Operational Database cluster to be eligible for rolling upgrade, the following requirements must be met:

Rolling OS upgrade is currently supported only on COD clusters whose storage type is HDFS, or cloud storage without ephemeral storage.
Rolling OS upgrade is not supported on Micro COD clusters.

Table 1.
Current Runtime version & template	Rolling upgrade support for Runtime?	Rolling OS upgrade support?
7.2.18	Yes, to 7.2.18+ (including service pack upgrades to 7.2.18.100 and higher versions), for up to 20 nodes	Yes, for up to 20 nodes
7.2.17.x	Yes, to 7.2.18+	Yes

For instructions on performing COD rolling upgrades, see Performing a Cloudera Runtime upgrade (COD) and Performing a Cloudera operating system upgrade (COD).

Cloudera Streams Messaging clusters

For a Streams Messaging cluster to be eligible for rolling upgrade, the following requirements must be met:

The cluster OS must be RHEL 8. Streams Messaging clusters on CentOS are not supported.
The cluster must run Runtime version 7.2.18 or higher.
The cluster template must be Streams Messaging High Availability.


Current Runtime version & template	Rolling upgrade support for Runtime?	Rolling OS upgrade support?
7.2.18 Streams Messaging High Availability template (Clusters created on 7.2.18 only)	Yes, to 7.2.18+ (including service pack upgrades to 7.2.18.100 and higher versions)	Yes, for up to 20 nodes. Important: Rolling OS upgrades replace nodes one by one and can take several hours or more, depending on the number of nodes.

In some circumstances, a rolling upgrade may not be supported for a Streams Messaging cluster, but can be enabled through entitlement. The Data Hub upgrade UI displays information about whether a rolling upgrade is available, unavailable, or may be available under entitlement. For instructions on performing a Data Hub upgrade, including rolling upgrades, see Upgrading Data Hubs. For information about obtaining an entitlement for rolling upgrade, contact Cloudera Customer Support.
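The requirements listed for Streams Messaging clusters can be read as a simple pre-check. The following Python sketch is only an illustration under stated assumptions: `StreamsCluster` and both functions are hypothetical names, not part of any Cloudera API, and the sketch ignores entitlements and the "created on 7.2.18" condition from the table. It also assumes that a rolling OS upgrade needs the same prerequisites as a rolling Runtime upgrade plus the 20-node limit. The Data Hub upgrade UI remains the authoritative source.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple:
    """Turn '7.2.18.100' into (7, 2, 18, 100) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class StreamsCluster:  # hypothetical record, not a Cloudera API type
    os_name: str
    runtime_version: str
    template: str
    node_count: int


def rolling_runtime_upgrade_eligible(cluster: StreamsCluster) -> bool:
    """Check the three documented requirements for a rolling Runtime upgrade."""
    return (
        cluster.os_name == "RHEL 8"
        and parse_version(cluster.runtime_version) >= (7, 2, 18)
        and cluster.template == "Streams Messaging High Availability"
    )


def rolling_os_upgrade_eligible(cluster: StreamsCluster) -> bool:
    """Assumption: same prerequisites, plus the 20-node limit from the table."""
    return rolling_runtime_upgrade_eligible(cluster) and cluster.node_count <= 20
```

A cluster on 7.2.17.x, on CentOS, or built from a different template fails the first check, which matches the requirements above.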

Data Hub rolling upgrade limitations and issues

Data Hub rolling upgrades have the following limitations:

Cloudera Operational Database clusters

HBase commands may fail during the rolling upgrade with the error "ServerNotRunningYetException: Server is not running yet." HBase retries DDL operations submitted while the master is initializing until the master is fully initialized and can serve the request. However, the default number of retries or the default pause interval may be insufficient for a client operation to complete.

The following client-side configuration lets your client application keep retrying for up to about 10 minutes while the master initializes:
```
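<!-- Base pause between client retries, in milliseconds -->
<property>
    <name>hbase.client.pause</name>
    <value>300</value>
</property>
<!-- Maximum number of retries per client operation -->
<property>
    <name>hbase.client.retries.number</name>
    <value>20</value>
</property>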
```
If the master takes more or less time to initialize in your environment, adjust these values accordingly. These retry settings apply to all types of calls to the HBase service, including GET, SCAN, MUTATE, and DDL operations.
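As a rough check on the 10-minute figure, the following Python sketch estimates how long a client keeps retrying with a given pause and retry count. It assumes the HBase client multiplies `hbase.client.pause` by a fixed table of backoff multipliers. The values below are taken to match `HConstants.RETRY_BACKOFF` in the HBase source and may differ between versions. The sketch ignores random jitter and the time spent in each RPC, and `estimate_retry_wait_seconds` is a hypothetical helper, not an HBase API.

```python
# Assumed backoff multipliers; verify against HConstants.RETRY_BACKOFF
# in the HBase version you run.
RETRY_BACKOFF = (1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200)


def estimate_retry_wait_seconds(pause_ms: int, retries: int) -> float:
    """Sum the sleep time across `retries` attempts, ignoring jitter and RPC time."""
    total_ms = 0
    for attempt in range(retries):
        # Attempts beyond the end of the table reuse the last multiplier.
        index = min(attempt, len(RETRY_BACKOFF) - 1)
        total_ms += pause_ms * RETRY_BACKOFF[index]
    return total_ms / 1000.0


if __name__ == "__main__":
    wait = estimate_retry_wait_seconds(pause_ms=300, retries=20)
    print(f"approximate retry budget: {wait:.1f} s ({wait / 60:.1f} min)")
```

Under these assumptions, a 300 ms pause with 20 retries gives a budget of roughly 11 minutes, in line with the figure above. Re-running the estimate with other values shows how far a change to either setting moves the budget.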
During a rolling restart, if a COD cluster has fewer than 10 DataNodes, existing writes can fail with an error indicating that a new block cannot be allocated and all nodes are excluded. This happens because the client has attempted to use every DataNode in the cluster and failed to write to each of them as they were restarted. It only affects small clusters with fewer than 10 DataNodes, because larger clusters have enough spare nodes for the write to continue.
When performing a maintenance upgrade or other cluster upgrade, an error can occasionally occur when the upgrade is nearly complete and services or roles are being restarted. The error is similar to: "Failed to start role hue-HUE_SERVER-8cc9321b2213cc5c6846c64e1fc6b1cb of service hue in cluster cod--xoaitnb0wnl1. This role requires the following additional parcels to be activated before it can start: [cdh]."
This is caused by an agent operation that is sometimes delayed and can interfere with the role start. When this happens, resume the failed upgrade from Cloudera Manager as a 'Full Admin' user.
During an OS upgrade, each replaced VM comes up as a new node with a new IP address. If the old IP address is cached anywhere, HDFS requests fail with UnknownHostException. The requests recover within 10 minutes at most.
During OS upgrades, attempts to access Knox on the host being upgraded may produce occasional 403 HTTP responses. Wait and retry the failed requests.
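The last two items both come down to waiting and retrying. The following Python sketch shows one way a client could do that with a bounded deadline. All names are hypothetical: `TransientError` stands in for whatever your client raises on a Knox 403 response or an UnknownHostException, and the 15-second pause is an assumed value, not a Cloudera recommendation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. an HTTP 403 from Knox)."""


def retry_until_deadline(
    call: Callable[[], T],
    max_wait_s: float = 600.0,  # 10 minutes, matching the recovery window above
    delay_s: float = 15.0,      # assumed fixed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `call` until it succeeds or `max_wait_s` has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return call()
        except TransientError:
            # Give up once another pause would run past the deadline.
            if clock() + delay_s > deadline:
                raise
            sleep(delay_s)
```

To use it, wrap the request in a function that raises `TransientError` only for the failures described above and lets every other exception propagate, so that real errors are not hidden by the retry loop.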

For more limitations of Cloudera Operational Database, see Rolling upgrade limitations (COD).

Cloudera Streams Messaging clusters

Rolling upgrades are not supported for Cruise Control or Streams Messaging Manager (SMM). When upgrading a Streams Messaging cluster, expect that both of these services will be temporarily unavailable during the upgrade. This, however, does not impact Kafka's ability to perform a rolling upgrade.
Rolling upgrades for Schema Registry are supported only when upgrading from Cloudera Runtime 7.2.18 or higher. When upgrading a Streams Messaging cluster from a lower version, expect that clients connecting to the Schema Registry service might experience downtime. This, however, does not impact Kafka's ability to perform a rolling upgrade.