ECS Server High Availability

ECS Server High Availability (HA) is not enabled by default; you must enable it after installing ECS. If you do not plan to enable ECS HA, you can skip this section. If you are enabling ECS HA, review the following notes and supported ECS Server scenarios before proceeding.

ECS Server scenarios

Clusters with only two servers are not supported. A two-server cluster is allowed only as a temporary state during the transition from a single-server cluster to a three-server cluster.

  1. Three or more servers
    • Redundancy requirements:
      • Tolerating one failure requires three or more servers
      • Tolerating two failures requires five or more servers
      • For more information, see Fault Tolerance
    • To recover, you must scale up the ECS Server roles. For more information on adding an ECS node to a cluster, see the following section.
  2. Two servers to one server
    • Applies only after a double failure in a three-server cluster
    • To recover:
      • Stop the ECS service
      • Remove both failed ECS Server roles and their hosts from the cluster
      • On the surviving server, run the following command: /opt/cloudera/parcels/ECS/bin/rke2 server --cluster-reset
      • Start the ECS service
  3. Single server
    • No failures are tolerated
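
The redundancy requirements in the scenarios above are consistent with a majority-quorum rule, where a cluster stays available only while more than half of its servers are healthy. The following is a minimal sketch of that arithmetic, assuming ECS Server HA follows this rule; the function names are hypothetical and are not part of ECS or RKE2.

```python
def tolerated_failures(servers: int) -> int:
    """Return how many server failures a cluster can absorb.

    Assumption: availability requires a strict majority of servers,
    so a cluster of n servers tolerates (n - 1) // 2 failures.
    """
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return (servers - 1) // 2


def servers_required(failures: int) -> int:
    """Return the smallest cluster size that tolerates the given failures."""
    if failures < 0:
        raise ValueError("failures cannot be negative")
    return 2 * failures + 1


if __name__ == "__main__":
    # Print the tolerance for the cluster sizes discussed in the scenarios.
    for size in (1, 2, 3, 5):
        print(f"{size} server(s): tolerates {tolerated_failures(size)} failure(s)")
```

Under this assumption, a two-server cluster tolerates no more failures than a single server, which matches the note that two-server clusters are only a transitional state.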