Backing Up the Cluster
This topic describes how to back up a cluster managed by Cloudera Manager prior to upgrading the cluster. These procedures do not back up the data stored in the cluster. Cloudera recommends that you maintain regular backups of your data using the Backup and Disaster Recovery features of Cloudera Manager.
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
This feature is not available when using Cloudera Manager to manage Data Hub clusters.
Complete the following backup steps before upgrading your cluster:
Back Up Databases
Gather the following information:
- Type of database (PostgreSQL, Embedded PostgreSQL, MySQL, MariaDB, or Oracle)
- Hostnames of the databases
- Database names
- Port number used by the databases
- Credentials for the databases
You can find the database information in the Cloudera Manager Admin Console:
- Sqoop, Oozie, and Hue – Go to Cluster Name > Configuration > Database Settings.
- Hive Metastore – Go to the Hive service, select Configuration, and select the Hive Metastore Database category.
- Sentry – Go to the Sentry service, select Configuration, and select the Sentry Server Database category.
- Ranger – Go to the Ranger service, select Configuration, and search on "database."
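After gathering this information, you can optionally verify connectivity before taking the backup. A minimal sketch, assuming a MySQL or MariaDB database (substitute the hostname, port, and user name you gathered above):

# Confirm the gathered connection details work before backing up
mysql --host=database_hostname --port=database_port -u database_username -p -e "SHOW DATABASES;"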
To back up the databases
- If not already stopped, stop the service. If Cloudera Manager indicates that there are dependent services, also stop the dependent services.
- On the Home > Status tab, click the drop-down menu to the right of the service name and select Stop.
- Click Stop in the next screen to confirm. When you see a Finished status, the service has stopped.
- Back up the database. Substitute the database name, hostname, port, user name, and backup directory path, and run the following command. This example uses mysqldump and applies to MySQL and MariaDB; for PostgreSQL, a hedged pg_dump sketch appears after these steps.
mysqldump --databases database_name --host=database_hostname --port=database_port -u database_username -p > backup_directory_path/database_name-backup-`date +%F`-CDH.sql
- Work with your database administrator to ensure databases are properly backed up.
For additional information about backing up databases, see these vendor-specific links:
- MariaDB 10.2: https://mariadb.com/kb/en/backup-and-restore-overview/
- MySQL 5.7: https://dev.mysql.com/doc/refman/5.7/en/backup-and-recovery.html
- PostgreSQL 10: https://www.postgresql.org/docs/10/static/backup.html
- Oracle 12c: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/bradv/index.html
- Start the service.
- On the Home > Status tab, click the drop-down menu to the right of the service name and select Start.
- Click Start in the next screen to confirm. When you see a Finished status, the service has started.
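The mysqldump example above applies to MySQL and MariaDB. For a PostgreSQL database, a minimal sketch using pg_dump (substitute your own hostname, port, user name, database name, and backup path; the -W flag forces a password prompt):

# Dump a PostgreSQL database to a SQL file for backup
pg_dump -h database_hostname -p database_port -U database_username -W database_name > backup_directory_path/database_name-backup-`date +%F`-CDH.sql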
Back Up ZooKeeper
On all ZooKeeper hosts, back up the ZooKeeper data directory specified with the dataDir property in the ZooKeeper configuration. The default location is /var/lib/zookeeper. For example:
cp -rp /var/lib/zookeeper/ /var/lib/zookeeper-backup-`date +%F`CM-CDH
To identify the ZooKeeper hosts, open the Cloudera Manager Admin Console, go to the ZooKeeper service, and click the Instances tab.
Record the permissions of the files and directories; you will need these to roll back ZooKeeper.
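For example, one way to record the permissions for later reference (a sketch, assuming the default dataDir of /var/lib/zookeeper):

# Save ownership and permissions of the ZooKeeper data directory for rollback reference
ls -lR /var/lib/zookeeper > /var/lib/zookeeper-backup-perms-`date +%F`.txt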
Back Up Solr used by Atlas
- Create an HDFS directory to store the Solr backup.
hdfs dfs -mkdir /tmp/atlas-solr-backups
hdfs dfs -chown solr:hadoop /tmp/atlas-solr-backups
- With Solr running, make the following REST calls to back up the Atlas Solr collections. You can make these calls from any host that can reach the Solr server on the cluster. Note that you may have a different port number configured for Atlas.
curl --negotiate -ik -u : "http://<solr-server-host>:8983/solr/admin/collections?action=BACKUP&name=vertex_index_bkp&collection=vertex_index&location=/tmp/atlas-solr-backups"
curl --negotiate -ik -u : "http://<solr-server-host>:8983/solr/admin/collections?action=BACKUP&name=edge_index_bkp&collection=edge_index&location=/tmp/atlas-solr-backups"
curl --negotiate -ik -u : "http://<solr-server-host>:8983/solr/admin/collections?action=BACKUP&name=fulltext_index_bkp&collection=fulltext_index&location=/tmp/atlas-solr-backups"
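A successful call returns a JSON response with "status":0 in the responseHeader. You can also confirm that the backups landed in the HDFS directory created above (a sketch):

# Each backed-up collection should appear under the backup directory
hdfs dfs -ls /tmp/atlas-solr-backups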
Back Up Search
Back up your Solr metadata using the following procedure. This procedure allows you to roll back to the pre-upgrade state if any problems occur during the upgrade process.
- Make sure that the HDFS and ZooKeeper services are running.
- Stop the Solr service. If you see a message about stopping dependent services, click Cancel, stop the dependent services first, and then stop the Solr service.
- Make sure that the directory you specified for the Upgrade Backup Directory configuration property exists in HDFS and is writable by the Search superuser (solr by default); see the sketch after these steps.
- Back up the Solr configuration metadata.
- Start the Solr service.
- Start any dependent services that you stopped.
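As a sketch of the Upgrade Backup Directory check above, assuming the property is set to /user/solr/upgrade_backup (an assumed path) and the default solr superuser:

# Create the backup directory in HDFS if it does not exist, and make solr its owner
hdfs dfs -mkdir -p /user/solr/upgrade_backup
hdfs dfs -chown solr /user/solr/upgrade_backup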
Back Up HDFS
Follow this procedure to back up an HDFS deployment.
- If high availability is enabled for HDFS, run the following command on all hosts running the JournalNode role:
cp -rp /dfs/jn /dfs/jn-CM-CDH
- On all NameNode hosts, back up the NameNode runtime directory. Run the following commands:
mkdir -p /etc/hadoop/conf.rollback.namenode
cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-NAMENODE\$" | head -1`
cp -rp * /etc/hadoop/conf.rollback.namenode/
rm -rf /etc/hadoop/conf.rollback.namenode/log4j.properties
cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.namenode/
These commands create a temporary rollback directory. If a rollback to CDH 5.x is required later, the rollback procedure requires you to modify files in this directory.
- Back up the runtime directory for all DataNodes. Run the following commands on all DataNode hosts:
mkdir -p /etc/hadoop/conf.rollback.datanode/
cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-DATANODE\$" | head -1`
cp -rp * /etc/hadoop/conf.rollback.datanode/
rm -rf /etc/hadoop/conf.rollback.datanode/log4j.properties
cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.datanode/
- If high availability is not enabled for HDFS, back up the runtime directory of the Secondary NameNode. Run the following commands on all Secondary NameNode hosts:
mkdir -p /etc/hadoop/conf.rollback.secondarynamenode/
cd /var/run/cloudera-scm-agent/process/ && cd `ls -t1 | grep -e "-SECONDARYNAMENODE\$" | head -1`
cp -rp * /etc/hadoop/conf.rollback.secondarynamenode/
rm -rf /etc/hadoop/conf.rollback.secondarynamenode/log4j.properties
cp -rp /etc/hadoop/conf.cloudera.HDFS_service_name/log4j.properties /etc/hadoop/conf.rollback.secondarynamenode/
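To spot-check that the rollback directories were populated, you can list them on the relevant hosts (a sketch; the secondarynamenode directory exists only when high availability is not enabled):

ls -l /etc/hadoop/conf.rollback.namenode
ls -l /etc/hadoop/conf.rollback.datanode
ls -l /etc/hadoop/conf.rollback.secondarynamenode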
Back Up Key Trustee Server and Clients
For the detailed procedure, see Backing Up and Restoring Key Trustee Server and Clients.
Back Up HSM KMS
When running the HSM KMS in high availability mode, if either of the two nodes fails, a role instance can be assigned to another node and federated into the service by the single remaining active node. In other words, you can bring a node that is part of the cluster, but that is not running HSM KMS role instances, into the service by making it an HSM KMS role instance: specifically, an HSM KMS proxy role instance and an HSM KMS metastore role instance. Each node therefore acts as an online ("hot") backup of the other. In many cases, this is sufficient. However, if you want a manual ("cold") backup of the files necessary to restore the service from scratch, you can create one as well.
To create a backup, copy the /var/lib/hsmkp and /var/lib/hsmkp-meta directories on one or more of the nodes running HSM KMS role instances.
To restore from a backup, bring up a completely new instance of the HSM KMS service, then copy the /var/lib/hsmkp and /var/lib/hsmkp-meta directories from the backup onto the file system of the restored nodes before starting HSM KMS for the first time.
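A minimal cold-backup sketch, assuming the default directory locations and an assumed destination of /root/hsmkms-backup:

# On a node running HSM KMS role instances, copy both directories, preserving permissions
mkdir -p /root/hsmkms-backup
cp -rp /var/lib/hsmkp /root/hsmkms-backup/hsmkp-`date +%F`
cp -rp /var/lib/hsmkp-meta /root/hsmkms-backup/hsmkp-meta-`date +%F`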
Back Up Navigator Encrypt
- To manually back up the Navigator Encrypt configuration directory (/etc/navencrypt), run:
$ zip -r --encrypt nav-encrypt-conf.zip /etc/navencrypt
The --encrypt option prompts you to create a password used to encrypt the zip file. This password is also required to decrypt the file. Ensure that you protect the password by storing it in a secure location.
- Move the backup file (nav-encrypt-conf.zip) to a secure location.
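For example, to copy the archive to another machine (the host and destination path here are hypothetical):

scp nav-encrypt-conf.zip user@backup-host:/secure/backups/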
Back Up HBase
Because the rollback procedure also rolls back HDFS, the data in HBase is also rolled back. In addition, HBase metadata stored in ZooKeeper is recovered as part of the ZooKeeper rollback procedure.
If your cluster is configured to use HBase replication, Cloudera recommends that you document all replication peers. If necessary (for example, because the HBase znode has been deleted), you can roll back HBase as part of the HDFS rollback without the ZooKeeper metadata. This metadata can be reconstructed in a fresh ZooKeeper installation, with the exception of the replication peers, which you must add back. For information on enabling HBase replication, listing peers, and adding a peer, see HBase Replication in the CDH 5 documentation.
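One way to document the replication peers is to capture the output of the list_peers command from the HBase shell and save it with your upgrade records (a sketch; run on a host with an HBase client configuration):

echo "list_peers" | hbase shell > hbase-replication-peers-`date +%F`.txt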
Back Up Sqoop 2
If you are not using the default embedded Derby database for Sqoop 2, back up the database you have configured for Sqoop 2. Otherwise, back up the repository subdirectory of the Sqoop 2 metastore directory. This location is specified with the Sqoop 2 Server Metastore Directory property. The default location is /var/lib/sqoop2. For this default location, Derby database files are located in /var/lib/sqoop2/repository.
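A minimal sketch for the default location, assuming the Sqoop 2 service is stopped so the Derby files are not in use:

# Copy the Derby repository, preserving permissions
cp -rp /var/lib/sqoop2/repository /var/lib/sqoop2-repository-backup-`date +%F`-CM-CDH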
Back Up Hue
- On all hosts running the Hue Server role, back up the app registry file (app.reg). For parcel installations:
mkdir -p /opt/cloudera/parcels_backup
cp -rp /opt/cloudera/parcels/CDH/lib/hue/app.reg /opt/cloudera/parcels_backup/app.reg-CM-CDH
For package installations:
mkdir -p /usr/lib/hue_backup
cp -rp /usr/lib/hue/app.reg /usr/lib/hue_backup/app.reg-CM-CDH