Operations and management
You can configure Hue in a high availability mode. Cloudera Manager does not currently support high availability mode within CDP. Therefore, create a Cloudera Manager instance for each side of a disaster recovery cluster pair and periodically back up the configuration details, the Cloudera Manager database, and associated metric data stored on the Cloudera Manager server.
Hue
Hue can be configured in a high availability (HA) mode by using multiple Hue servers and a load balancer in front of the Hue service. Hue configuration and state can be maintained in an external RDBMS which provides its own backup and high availability.
For information about enabling high availability in Hue, see Configuring High Availability for Hue.
Hue utilizes a database for storing internal metadata about users interacting with various services within the Cloudera Data Platform. This database can be backed up using Hue’s built-in export functions. This creates a point-in-time backup that can be stored offline.
For information about backing up and restoring Hue data, see Backing up the Hue database.
Hue has no built-in replication capabilities. In many cases, periodically restoring a point-in-time backup on the other side of the disaster recovery pair is sufficient.
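As a rough illustration of a periodic point-in-time backup, the following Python sketch writes a timestamped dump and works out which older dumps can be pruned. It assumes that the export is taken with the `hue dumpdata` command, that the Hue executable lives at the parcel path shown in `HUE_BIN`, and that keeping a fixed number of recent dumps is acceptable. The path, the file naming scheme, and the retention rule are illustrative assumptions, not documented defaults. Follow Backing up the Hue database for the supported procedure.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Assumption: location of the Hue executable on a parcel-based install.
# Adjust this path for your environment.
HUE_BIN = "/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue"


def backup_path(backup_dir: str, now: datetime) -> Path:
    """Return a timestamped dump path such as hue-20240131T020000Z.json."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(backup_dir) / f"hue-{stamp}.json"


def dump_hue(backup_dir: str) -> Path:
    """Write a point-in-time dump of the Hue database to backup_dir."""
    target = backup_path(backup_dir, datetime.now(timezone.utc))
    with open(target, "w") as out:
        # check=True raises CalledProcessError if the export fails.
        subprocess.run([HUE_BIN, "dumpdata"], stdout=out, check=True)
    return target


def backups_to_prune(paths, keep: int):
    """Return the dumps that fall outside the `keep` most recent ones.

    The timestamp format sorts lexicographically, so sorting by file
    name is equivalent to sorting by creation time.
    """
    ordered = sorted(paths, key=lambda p: p.name, reverse=True)
    return ordered[keep:]
```

A scheduler such as cron could call `dump_hue` and then delete whatever `backups_to_prune` returns. Copy the resulting files to storage that survives the loss of the primary site.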
Cloudera Manager
Cloudera recommends creating a Cloudera Manager instance for each side of a disaster recovery cluster pair. While a single Cloudera Manager instance can manage multiple clusters, in the event of a disaster, your Cloudera Manager might be on the wrong side of the incident, leaving you without the ability to manage anything post-failover.
Because each side has its own Cloudera Manager instance, keeping cluster configurations in sync between the two instances can be a challenge. You might need to adopt a change management and update procedure that covers both clusters. The cloudera-playbook Ansible framework can help by letting you apply a common set of configurations to individual clusters.
For more information about the Cloudera Playbook Ansible Framework, see Cloudera-playbook on Cloudera Labs and Automating CDP Private Cloud Installations with Ansible.
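One way to detect configuration drift between the two sides is to compare the settings that each Cloudera Manager instance reports. The sketch below is a minimal illustration in Python and assumes that the settings of each side have already been flattened into a dictionary of name and value pairs. The key names in the usage example are hypothetical, and the exports produced by Cloudera Manager are nested and would need flattening first.

```python
def config_drift(primary: dict, secondary: dict) -> dict:
    """Return settings whose values differ between the two sides.

    The result maps each differing key to a (primary, secondary) tuple.
    A key that is missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(set(primary) | set(secondary)):
        left, right = primary.get(key), secondary.get(key)
        if left != right:
            drift[key] = (left, right)
    return drift


# Hypothetical, simplified settings for the two sides of the pair.
site_a = {"hdfs.dfs_replication": "3", "yarn.scheduler": "capacity"}
site_b = {"hdfs.dfs_replication": "2", "yarn.scheduler": "capacity"}

for name, (a, b) in config_drift(site_a, site_b).items():
    print(f"{name}: primary={a} secondary={b}")
```

Some differences are expected, such as host names and paths that are specific to one site, so a real comparison would need an allow list for them.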
Cloudera Manager does not currently support high availability mode within CDP. Because of this, it is important to periodically back up configuration details, the Cloudera Manager database, and associated metric data stored on the Cloudera Manager server.
For information about using the Cloudera Manager API to back up and restore the configuration, see Backing up the Cloudera Manager Configuration and Restoring the Cloudera Manager Configuration.
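The following Python sketch shows one possible way to script a periodic configuration export. It assumes that the deployment is exported from the `/cm/deployment` endpoint of the Cloudera Manager API over HTTP on port 7180 with basic authentication. The host name, API version, and credentials are placeholders, so check the linked topics for the values and options that apply to your release.

```python
import base64
import json
import urllib.request


def deployment_url(host: str, api_version: int, port: int = 7180,
                   scheme: str = "http") -> str:
    """Build the URL of the deployment export endpoint."""
    return f"{scheme}://{host}:{port}/api/v{api_version}/cm/deployment"


def export_deployment(url: str, user: str, password: str, out_path: str) -> None:
    """Download the deployment description and save it as JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        deployment = json.load(response)
    with open(out_path, "w") as handle:
        json.dump(deployment, handle, indent=2)


# Placeholder host and API version, for illustration only.
url = deployment_url("cm-primary.example.com", 41)
# export_deployment(url, "admin_user", "admin_password", "cm-deployment.json")
```

The export can contain sensitive values, so store it with the same care as the Cloudera Manager database backup. Use `scheme="https"` and the TLS port if TLS is enabled for Cloudera Manager.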
- RDBMS for Services: In most cases, treat the RDBMS that backs a particular service as a tightly coupled component of the cluster it is attached to. Individual services might maintain cluster-specific information in their databases, which makes it difficult to safely replicate those databases directly. For example, the Hive Metastore database might refer to fully qualified HDFS paths that are valid on only one side of the disaster recovery pair. In that case, you must use Hive-level replication and export mechanisms to transfer and transform Hive schema details from one side of the disaster recovery pair to the other.
- System Configuration: Manage operating system and infrastructure-level configuration that sits outside the envelope of the Cloudera Data Platform products through a common configuration management infrastructure such as Ansible. This ensures that configuration is consistent and enforced across your disaster recovery pair. Keep your disaster recovery clusters configured as similarly as possible from an architecture and design perspective.