Rolling Restart

Minimum Required Role: Operator (also provided by Configurator, Cluster Administrator, Full Administrator)

Rolling restart allows you to conditionally restart the role instances of the following services to update software or use a new configuration:
  • Flume
  • HBase
  • HDFS
  • Kafka
  • Key Trustee KMS
  • Key Trustee Server
  • MapReduce
  • Oozie
  • YARN
  • ZooKeeper

Rolling restart is available only for a service that is running. You specify a rolling restart for each service individually.

Performing a Service or Role Rolling Restart

You can initiate a rolling restart from either the Status page for one of the eligible services, or from the service's Instances page, where you can select individual roles to be restarted.

  1. Go to the service you want to restart.
  2. Do one of the following:
    • service - Select Actions > Rolling Restart.
    • role -
      1. Click the Instances tab.
      2. Select the roles to restart.
      3. Select Actions for Selected > Rolling Restart.
  3. In the pop-up dialog box, select the options you want:
    • Restart only roles whose configurations are stale
    • Restart only roles that are running outdated software versions
    • Which role types to restart
  4. If you select an HDFS, HBase, MapReduce, or YARN service, you can have its worker roles restarted in batches. You can configure:
    • How many roles to include in a batch - Cloudera Manager restarts the worker roles rack by rack in alphabetical order, and within each rack it restarts the hosts in alphabetical order. With the default replication factor of 3, Hadoop tries to keep the replicas on at least 2 different racks, so if you have multiple racks you can use a batch size higher than the default of 1. Be aware that a larger batch size also means that fewer worker roles are active at any time during the rolling restart, which can cause temporary performance degradation. If you use a single rack, restart only one worker node at a time to ensure data availability during the rolling restart.
    • How long Cloudera Manager should wait before starting the next batch.
    • The number of batch failures that will cause the entire rolling restart to fail (this is an advanced feature). For example, on a very large cluster you can use this option to tolerate some failures if you know that the cluster will remain functional even when some worker roles are down.
  5. Click Confirm to start the rolling restart.
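
The batch options in step 4 can be illustrated with a short Python sketch. It is not Cloudera Manager code: the names plan_batches, rolling_restart, restart_batch, wait_seconds, and max_failed_batches are hypothetical and chosen only for illustration. The sketch assumes that a batch may span a rack boundary and that the restart fails once the number of failed batches exceeds the threshold; neither detail is stated above, so treat both as assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple

Worker = Tuple[str, str]  # (rack, host)


def plan_batches(hosts_by_rack: Dict[str, List[str]], batch_size: int) -> List[List[Worker]]:
    """Order workers rack by rack (alphabetically), hosts alphabetically
    within each rack, then split the ordered list into batches.

    Assumption: a batch may span two racks when a rack's host count is
    not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = [
        (rack, host)
        for rack in sorted(hosts_by_rack)
        for host in sorted(hosts_by_rack[rack])
    ]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def rolling_restart(
    batches: List[List[Worker]],
    restart_batch: Callable[[List[Worker]], bool],
    wait_seconds: float = 0.0,
    max_failed_batches: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart batches one after another, waiting between batches.

    Assumption: the whole restart fails as soon as the number of failed
    batches exceeds max_failed_batches (0 means the first failure aborts).
    restart_batch is a caller-supplied stand-in for the real restart.
    """
    failed = 0
    for index, batch in enumerate(batches):
        if not restart_batch(batch):
            failed += 1
            if failed > max_failed_batches:
                return False
        if index < len(batches) - 1:
            sleep(wait_seconds)  # pause before starting the next batch
    return True


# Hypothetical two-rack cluster, restarted two workers at a time.
racks = {"/rack2": ["node-d"], "/rack1": ["node-b", "node-a", "node-c"]}
batches = plan_batches(racks, batch_size=2)
succeeded = rolling_restart(batches, restart_batch=lambda batch: True)
```

With a batch size of 2, the second batch in this example contains one host from each rack. This shows why the batch size determines how many worker roles are unavailable at the same time.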