Data Hub rolling upgrades

A Data Hub rolling upgrade lets you upgrade a Data Hub cluster without stopping the cluster or incurring workload downtime.

Data Hub rolling upgrades are only available for the following cluster types:

  • Cloudera Operational Database

Cloudera Operational Database clusters

For an Operational Database cluster to be eligible for rolling upgrade, the following requirements must be met:

  • Rolling OS upgrade is currently supported only on COD clusters whose selected storage type is HDFS or cloud without ephemeral storage.
  • Rolling OS upgrade is not supported on Micro COD clusters.

Table 1.

Current Runtime version & template | Rolling upgrade support for Runtime?                                             | Rolling OS upgrade support?
7.2.18                             | Yes, to 7.2.18+ (including service pack upgrades to 7.2.18.100+); up to 20 nodes | Yes, for up to 20 nodes
7.2.17.x                           | Yes, to 7.2.18+                                                                  | Yes

For instructions on performing COD rolling upgrades, see Performing a Cloudera Runtime upgrade (COD) and Performing a Cloudera operating system upgrade (COD).
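
The eligibility rules above can be expressed as a small pre-flight check. The following Python snippet is only a sketch that restates the requirements and Table 1. The `CodCluster` type, the storage labels, and the function names are hypothetical and are not part of any Cloudera API. It also assumes that a row with no node count in Table 1 has no node limit.

```python
from dataclasses import dataclass

# Node limits read from Table 1, keyed by the first three components of the
# current Runtime version. None means the table states no node limit.
TABLE_1 = {
    "7.2.18": {"runtime_max_nodes": 20, "os_max_nodes": 20},
    "7.2.17": {"runtime_max_nodes": None, "os_max_nodes": None},
}

# Hypothetical labels for the storage options that allow rolling OS upgrade.
OS_UPGRADE_STORAGE = {"hdfs", "cloud_without_ephemeral"}


@dataclass
class CodCluster:
    """Hypothetical description of a COD cluster; not a Cloudera API type."""
    runtime_version: str   # for example "7.2.17.300"
    node_count: int
    storage: str           # "hdfs", "cloud_without_ephemeral", or "cloud_with_ephemeral"
    is_micro: bool = False


def _table_row(version: str):
    """Return the Table 1 row for a version, or None if the version is not listed."""
    return TABLE_1.get(".".join(version.split(".")[:3]))


def _within(limit, node_count: int) -> bool:
    """True when there is no stated limit or the node count does not exceed it."""
    return limit is None or node_count <= limit


def runtime_rolling_upgrade_supported(cluster: CodCluster) -> bool:
    """Check the 'Rolling upgrade support for Runtime?' column."""
    row = _table_row(cluster.runtime_version)
    return row is not None and _within(row["runtime_max_nodes"], cluster.node_count)


def os_rolling_upgrade_supported(cluster: CodCluster) -> bool:
    """Check the 'Rolling OS upgrade support?' column plus the two requirements."""
    row = _table_row(cluster.runtime_version)
    return (
        row is not None
        and _within(row["os_max_nodes"], cluster.node_count)
        and cluster.storage in OS_UPGRADE_STORAGE
        and not cluster.is_micro
    )
```

For example, `os_rolling_upgrade_supported(CodCluster("7.2.18", 21, "hdfs"))` evaluates to `False` because the cluster exceeds the 20-node limit listed for 7.2.18.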

Data Hub rolling upgrade limitations and issues

Data Hub rolling upgrades have the following limitations:

Cloudera Operational Database clusters

  • HBase commands may fail during the rolling upgrade with the error "ServerNotRunningYetException: Server is not running yet." HBase retries DDL operations submitted while the master is initializing until the master is fully initialized and can serve the request. However, the default number of retries or the default retry interval may be insufficient for an operation submitted by the client to complete.

    The following configuration in your client application lets operations keep retrying for up to about 10 minutes while the master initializes:

    <property>
        <name>hbase.client.pause</name>
        <value>300</value>
    </property>
    <property>
        <name>hbase.client.retries.number</name>
        <value>20</value>
    </property>

    If master initialization takes longer or shorter in your environment, adjust these values accordingly. These retry settings apply to all types of calls to the HBase service, including GET, SCAN, MUTATE, and DDL operations.

  • During a rolling restart, if a COD cluster has fewer than 10 datanodes, existing writes can fail with an error indicating that a new block cannot be allocated and all nodes are excluded. This happens because the client has attempted to use every datanode in the cluster and failed to write to each of them as they were restarted. It only occurs on small clusters with fewer than 10 datanodes, because larger clusters have enough spare nodes for the write to continue.

  • When performing a maintenance upgrade or other cluster upgrade, an error can occasionally occur when the upgrade is nearly complete and services or roles are being restarted. The error is similar to: "Failed to start role hue-HUE_SERVER-8cc9321b2213cc5c6846c64e1fc6b1cb of service hue in cluster cod--xoaitnb0wnl1. This role requires the following additional parcels to be activated before it can start: [cdh]."

    This is caused by an agent operation that is sometimes delayed and can interfere with the role start. If this happens, resume the failed upgrade from Cloudera Manager as a 'Full Admin' user.

  • During VM replacement as part of an OS upgrade, every new node gets a new IP address. If the old IP address is cached somewhere, HDFS requests fail with UnknownHostException. The failures clear on their own after some time (10 minutes at most).
  • During OS upgrades, attempts to access Knox on the host being upgraded may produce occasional 403 HTTP responses. Wait and retry the failed requests.
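
To see why the HBase client settings in the first limitation above (hbase.client.pause of 300 and hbase.client.retries.number of 20) correspond to roughly 10 minutes, the retry budget can be estimated by hand. The Python sketch below is an approximation: it assumes the client multiplies the pause by the standard HBase backoff table (HConstants.RETRY_BACKOFF) on each attempt, sleeps once per configured retry, and it ignores the small random jitter and the time spent in the calls themselves. The function name is hypothetical.

```python
# Backoff multipliers as defined in HBase's HConstants.RETRY_BACKOFF.
# Attempts beyond the end of the table reuse the last multiplier.
RETRY_BACKOFF = (1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200)


def total_retry_wait_seconds(pause_ms: int, retries: int) -> float:
    """Approximate total sleep time across all retries, in seconds.

    Assumption: one sleep of pause_ms * multiplier per retry, no jitter.
    """
    total_ms = 0
    for attempt in range(retries):
        index = min(attempt, len(RETRY_BACKOFF) - 1)
        total_ms += pause_ms * RETRY_BACKOFF[index]
    return total_ms / 1000.0


budget = total_retry_wait_seconds(pause_ms=300, retries=20)
print(f"{budget:.1f} s (~{budget / 60:.1f} min)")  # prints "684.3 s (~11.4 min)"
```

Under these assumptions the settings give a budget slightly above 10 minutes. The same function can be used to pick values for a longer or shorter master initialization period.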

For more limitations of Cloudera Operational Database, see Rolling upgrade limitations (COD).