Cloudera Manager Failover Protection

A CDH cluster managed by Cloudera Manager can have only one instance of Cloudera Manager active at a time. A Cloudera Manager instance is backed by a single database instance that stores configurations and other operational data.

CDH deployments that use highly available configurations for Cloudera Manager can configure a “standby” instance of Cloudera Manager that takes over automatically if the primary instance fails. In some situations, a second instance of Cloudera Manager may become active during maintenance or upgrade activities or due to operator error. If two instances of Cloudera Manager are active at the same time and attempt to access the same database, data corruption can result, making Cloudera Manager unable to manage the cluster.

In Cloudera Manager 5.7 and higher, Cloudera Manager automatically detects when more than one instance of Cloudera Manager is running and logs a message in the /var/log/cloudera-scm-server/cloudera-scm-server.log file. For example:
2016-02-17 09:47:27,915 WARN main:com.cloudera.server.cmf.components.ScmActive: ScmActive detected spurious CM : hostname=sysadmin-scm-2.mycompany.com/172.28.197.136,bootup true
2016-02-17 09:47:27,916 WARN main:com.cloudera.server.cmf.components.ScmActive: ScmActive: The database is owned by sysadmin-scm-1.mycompany.com/172.28.197.242
2016-02-17 09:47:27,917 ERROR main:com.cloudera.server.cmf.bootstrap.EntityManagerFactoryBean: ScmActive at bootup: The configured database is being used by another instance of Cloudera Manager.

In addition, the second instance of Cloudera Manager is automatically shut down, resulting in messages similar to the following in the log file:
2016-02-17 09:47:27,919 ERROR main:com.cloudera.server.cmf.Main: Server failed.
2016-02-17 09:47:27,919 ERROR main:com.cloudera.server.cmf.Main: Server failed.
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.cloudera.server.cmf.TrialState': Cannot resolve reference to bean 'entityManagerFactoryBean' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactoryBean': FactoryBean threw exception on object creation; nested exception is java.lang.RuntimeException: ScmActive at bootup: Failed to validate the identity of Cloudera Manager.
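
To check whether failover protection has fired on a host, you can search the server log for the ScmActive component named in the messages above. The following is a minimal sketch, not a Cloudera-supplied tool: the function name find_scm_conflicts is hypothetical, and the search may also return other messages from the same component that are unrelated to a conflict.

```shell
#!/bin/sh
# Default log path, as named in the text above.
CM_LOG=/var/log/cloudera-scm-server/cloudera-scm-server.log

# find_scm_conflicts [FILE]
# Hypothetical helper: prints every log line that mentions ScmActive,
# prefixed with its line number. Exits non-zero when nothing matches.
find_scm_conflicts() {
    grep -n 'ScmActive' "${1:-$CM_LOG}"
}

# Example (run on the Cloudera Manager server host):
#   find_scm_conflicts
```

Because grep returns a non-zero status when no line matches, the function can be used directly in a shell conditional to test for the presence of these messages.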

When a Cloudera Manager instance fails or becomes unavailable and remains offline for more than 30 seconds, any new instance that is deployed claims ownership of the database and continues to manage the cluster normally.
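
The 30-second rule can be illustrated with a small sketch. This is only a model of the behavior described above: the text does not say how Cloudera Manager tracks whether the owning instance is still alive, so the function name may_claim_database and its two timestamp arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Threshold taken from the text above; everything else is hypothetical.
OWNER_TIMEOUT_SECONDS=30

# may_claim_database LAST_SEEN NOW
# Both arguments are epoch seconds. Succeeds (exit status 0) only when the
# current owner has been silent for MORE than the timeout.
may_claim_database() {
    [ $(( $2 - $1 )) -gt "$OWNER_TIMEOUT_SECONDS" ]
}

# Example:
#   may_claim_database 1455702447 "$(date +%s)" && echo "new instance may take over"
```

Under this model, an owner last seen exactly 30 seconds ago still holds the database; only a longer silence lets a new instance claim it.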

Disabling Automatic Failover Protection

You can disable automatic shutdown by setting a Java option and restarting Cloudera Manager:
  1. On the host where Cloudera Manager server is running, open the following file in a text editor:
    /etc/default/cloudera-scm-server
  2. Add the following property (separate each property with a space) to the line that begins with export CMF_JAVA_OPTS:
    -Dcom.cloudera.server.cmf.components.scmActive.killOnError=false
    For example:
    export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dcom.cloudera.server.cmf.components.scmActive.killOnError=false"
    
  3. Restart the Cloudera Manager server by running the following command on the Cloudera Manager server host:
    sudo service cloudera-scm-server restart
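
Steps 1 and 2 above can also be scripted. The sketch below assumes the file contains a single-line, double-quoted export CMF_JAVA_OPTS="..." entry like the one in the example; the function name add_failover_opt is hypothetical and not part of Cloudera Manager. It does nothing if the option is already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Option from step 2 above.
OPT='-Dcom.cloudera.server.cmf.components.scmActive.killOnError=false'

# add_failover_opt FILE
# Hypothetical helper: appends $OPT inside the quoted CMF_JAVA_OPTS value.
add_failover_opt() {
    file=$1
    # Nothing to do if the option is already set.
    if grep -q -F -- "$OPT" "$file"; then
        return 0
    fi
    # Refuse to edit a file without a double-quoted CMF_JAVA_OPTS line.
    if ! grep -q '^export CMF_JAVA_OPTS=".*"' "$file"; then
        echo "no export CMF_JAVA_OPTS line found in $file" >&2
        return 1
    fi
    # Insert the option, preceded by a space, before the closing quote.
    tmp=$(mktemp) || return 1
    sed "/^export CMF_JAVA_OPTS=\"/ s|\"[[:space:]]*\$| $OPT\"|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # overwrite in place to keep owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (run as root on the Cloudera Manager server host):
#   add_failover_opt /etc/default/cloudera-scm-server &&
#       service cloudera-scm-server restart
```

Back up /etc/default/cloudera-scm-server before running a script like this against it, and review the resulting line before restarting the server.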