HBase MCC uses the zookeeper.recovery.retry parameter to monitor the number of retries when Zookeeper is down.
When a cluster is online and an application is started, the HBase client communicates to the Zookeeper quorum. When you perform a read or write operation on region servers and a failover happens, MCC runs a background thread to check when HBase becomes available again. If an application is started and the cluster is down, MCC uses the zookeeper.recovery.retry parameter to check Zookeeper multiple times. This might not be desired, especially for Spark streaming which operates with a microbatch. Each batch needs to iterate over the default before it finally fails over and hit the second cluster.
You can set this parameter to either 1 or 0. You can set the parameters differently for each cluster; set it as default for the second cluster and 0 for the primary so that it may fail over quickly. In addition, when you set zookeeper.recovery.retry to 0 with HBase, the number of client connections to Zookeeper increases and you might want to monitor maxclientcnxns. This is applicable for a failure scenario, when an application is starting and Zookeeper is down.