Managing Cloudera Search

Back up a Solr collection

Create backups of Solr collections using the solrctl utility to minimize data loss in case of a malfunction.

If you are using a secure (Kerberos-enabled) cluster, specify your jaas.conf file by adding the following parameter to each solrctl command:

--jaas /path/to/jaas.conf

If TLS is enabled for the Solr service, specify the truststore and password using the ZKCLI_JVM_FLAGS environment variable before you begin the procedure:

export ZKCLI_JVM_FLAGS="-Djavax.net.ssl.trustStore=/path/to/truststore \
-Djavax.net.ssl.trustStorePassword=trustStorePassword"
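
The two security settings above can be prepared once and reused for every command in the procedure. The following is a minimal sketch rather than part of the product: `zkcli_tls_flags`, `run_solrctl`, and the `SOLRCTL` and `JAAS_CONF` variables are hypothetical names, and it assumes that solrctl accepts `--jaas` ahead of the subcommand, in the form `solrctl --jaas /path/to/jaas.conf collection ...`.

```shell
# Hypothetical helpers; adjust names and paths to your environment.

# Print the JVM flags for a truststore path ($1) and its password ($2).
zkcli_tls_flags() {
  printf '%s %s\n' \
    "-Djavax.net.ssl.trustStore=$1" \
    "-Djavax.net.ssl.trustStorePassword=$2"
}

# Run solrctl, adding --jaas only when JAAS_CONF is set.
# SOLRCTL can be overridden (for example SOLRCTL=echo) to preview the command.
run_solrctl() {
  if [ -n "${JAAS_CONF:-}" ]; then
    "${SOLRCTL:-solrctl}" --jaas "$JAAS_CONF" "$@"
  else
    "${SOLRCTL:-solrctl}" "$@"
  fi
}

# Usage on a Kerberos- and TLS-enabled cluster:
#   JAAS_CONF=/path/to/jaas.conf
#   export ZKCLI_JVM_FLAGS="$(zkcli_tls_flags /path/to/truststore trustStorePassword)"
#   run_solrctl collection --create-snapshot <snapshotName> -c <collectionName>
```

Keeping the `--jaas` handling in one wrapper means that the commands in the steps below do not have to be edited individually when the cluster is secured.
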
  1. Create a snapshot. On a host running Solr Server, run the following command:
    solrctl collection --create-snapshot <snapshotName> -c <collectionName>

    For example, to create a snapshot for a collection named tweets:

    solrctl collection --create-snapshot tweets-$(date +%Y%m%d%H%M) -c tweets
    Successfully created snapshot with name tweets-201803281043 for collection tweets
  2. If you are backing up the Solr collection to a remote cluster, prepare the snapshot for export. If you are backing up the Solr collection to the local cluster, skip this step.
    solrctl collection --prepare-snapshot-export <snapshotName> -c <collectionName> -d <destDir>

    The destination HDFS directory path (specified by the -d option) must exist on the local cluster before you run this command. Make sure that the Solr superuser (solr by default) has permission to write to this directory.

    For example:

    hdfs dfs -mkdir -p /path/to/backup-staging/tweets-201803281043
    hdfs dfs -chown :solr /path/to/backup-staging/tweets-201803281043
    solrctl collection --prepare-snapshot-export tweets-201803281043 -c tweets \
    -d /path/to/backup-staging/tweets-201803281043
  3. Export the snapshot. This step uses the DistCp utility to back up the collection metadata as well as the corresponding index files. The destination directory must exist and be writable by the Solr superuser (solr by default).
    To export the snapshot to a remote cluster, run the following command:
    solrctl collection --export-snapshot <snapshotName> -s <sourceDir> -d <protocol>://<namenode>:<port>/<destDir>
    For example:
    • HDFS protocol:
      solrctl collection --export-snapshot tweets-201803281043 -s /path/to/backup-staging/tweets-201803281043 \
      -d hdfs://nn01.example.com:8020/path/to/backups
    • WebHDFS protocol:
      solrctl collection --export-snapshot tweets-201803281043 -s /path/to/backup-staging/tweets-201803281043 \
      -d webhdfs://nn01.example.com:20101/path/to/backups

    To export the snapshot to the local cluster, run the following command:

    solrctl collection --export-snapshot <snapshotName> -c <collectionName> -d <destDir>

    For example:

    solrctl collection --export-snapshot tweets-201803281043 -c tweets -d /path/to/backups/
  4. Delete the snapshot:
    solrctl collection --delete-snapshot <snapshotName> -c <collectionName>

    For example:

    solrctl collection --delete-snapshot tweets-201803281043 -c tweets
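
The steps above can be chained so that the snapshot is deleted only after the export succeeds. The following is a hedged sketch under stated assumptions: the function names and the `SOLRCTL` and `HDFS` override variables are hypothetical, and the commands themselves are the ones shown in the procedure. On a secured cluster, the Kerberos and TLS settings described at the start of this topic still apply.

```shell
# Hypothetical wrappers around the documented commands.
# Set SOLRCTL=echo and HDFS=echo to preview the commands without running them.

# Local cluster: create, export, then delete the snapshot.
# $1 = collection, $2 = destination directory, $3 = snapshot name (optional)
backup_collection_local() {
  bk_snapshot="${3:-$1-$(date +%Y%m%d%H%M)}"
  "${SOLRCTL:-solrctl}" collection --create-snapshot "$bk_snapshot" -c "$1" &&
  "${SOLRCTL:-solrctl}" collection --export-snapshot "$bk_snapshot" -c "$1" -d "$2" &&
  "${SOLRCTL:-solrctl}" collection --delete-snapshot "$bk_snapshot" -c "$1"
}

# Remote cluster: create, stage, export over hdfs:// or webhdfs://, then delete.
# $1 = collection, $2 = local staging directory, $3 = destination URI,
# $4 = snapshot name (optional)
backup_collection_remote() {
  bk_snapshot="${4:-$1-$(date +%Y%m%d%H%M)}"
  "${SOLRCTL:-solrctl}" collection --create-snapshot "$bk_snapshot" -c "$1" &&
  "${HDFS:-hdfs}" dfs -mkdir -p "$2" &&
  "${HDFS:-hdfs}" dfs -chown :solr "$2" &&
  "${SOLRCTL:-solrctl}" collection --prepare-snapshot-export "$bk_snapshot" -c "$1" -d "$2" &&
  "${SOLRCTL:-solrctl}" collection --export-snapshot "$bk_snapshot" -s "$2" -d "$3" &&
  "${SOLRCTL:-solrctl}" collection --delete-snapshot "$bk_snapshot" -c "$1"
}

# Usage:
#   backup_collection_local tweets /path/to/backups/
#   backup_collection_remote tweets /path/to/backup-staging/tweets \
#     hdfs://nn01.example.com:8020/path/to/backups
```

Because the commands are joined with `&&`, a failure at any step stops the sequence and leaves the snapshot in place, so the export can be retried with the same snapshot name.
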