Managing Cloudera Search

Cloudera Search Backup and Restore Command Reference

Use the following commands to create snapshots of Solr collections and to back up and restore them.

Command: solrctl collection --create-snapshot <snapshotName> -c <collectionName>

Description: Creates a named snapshot for the specified collection.

Command: solrctl collection --delete-snapshot <snapshotName> -c <collectionName>

Description: Deletes the specified snapshot for the specified collection.

Command: solrctl collection --describe-snapshot <snapshotName> -c <collectionName>

Description: Provides detailed information about a snapshot for the specified collection.

Command: solrctl collection --list-snapshots <collectionName>

Description: Lists all snapshots for the specified collection.
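As an illustration, the four snapshot commands above can be combined into small shell helpers. This is a minimal sketch: the function names are hypothetical, and only the solrctl options documented in this section are used.

```shell
#!/usr/bin/env bash
# Minimal sketch: snapshot lifecycle for one collection using the solrctl
# options documented above. Function names are hypothetical helpers.
set -euo pipefail

# Create a named snapshot of a collection, then list snapshots to confirm it.
take_snapshot() {
  local snapshot="$1" collection="$2"
  solrctl collection --create-snapshot "$snapshot" -c "$collection"
  solrctl collection --list-snapshots "$collection"
}

# Show detailed information about one snapshot.
inspect_snapshot() {
  local snapshot="$1" collection="$2"
  solrctl collection --describe-snapshot "$snapshot" -c "$collection"
}

# Delete a snapshot that is no longer needed.
drop_snapshot() {
  local snapshot="$1" collection="$2"
  solrctl collection --delete-snapshot "$snapshot" -c "$collection"
}
```

For example, take_snapshot tweets-2016-10-19 tweets creates a snapshot of a hypothetical tweets collection and then lists that collection's snapshots.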

Command: solrctl collection --prepare-snapshot-export <snapshotName> -c <collectionName> -d <destDir>

Description: Prepares the snapshot for export to a remote cluster. If you are exporting the snapshot to the local cluster, you do not need to run this command. This command generates collection metadata as well as information about the Lucene index files corresponding to the snapshot.

The destination HDFS directory path (specified by the -d option) must exist on the local cluster before you run this command. Make sure that the Solr superuser (solr by default) has permission to write to this directory.

If you are running the snapshot export command on a remote cluster, specify the HDFS protocol (such as WebHDFS or HFTP) that the remote cluster uses to access the Lucene index files corresponding to the snapshot on the source cluster. Use the -p option for this; it expects a fully qualified URI for the root file system on the source cluster, for example webhdfs://namenode.example.com:20101/.
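The following sketch prepares a snapshot for export, creating the destination directory first and passing -p only when the export will run on a remote cluster. The helper name, the use of hdfs dfs -chown to give the solr user write access, and placing -p after -d are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: prepare a snapshot for export. prepare_export is a
# hypothetical helper; assumes the HDFS client (hdfs) is on PATH.
set -euo pipefail

prepare_export() {
  local snapshot="$1" collection="$2" dest="$3"
  local source_fs="${4:-}"   # optional, e.g. webhdfs://namenode.example.com:20101/

  # The destination must exist on the local cluster and be writable by the
  # Solr superuser (solr by default). chown usually needs HDFS superuser rights.
  hdfs dfs -mkdir -p "$dest"
  hdfs dfs -chown solr "$dest"

  local args=(--prepare-snapshot-export "$snapshot" -c "$collection" -d "$dest")
  if [[ -n "$source_fs" ]]; then
    # The export will run on a remote cluster: tell it how to reach the
    # index files on this (source) cluster.
    args+=(-p "$source_fs")
  fi
  solrctl collection "${args[@]}"
}
```

For example, prepare_export tweets-2016-10-19 tweets /solr-backup/tweets-2016-10-19 webhdfs://namenode.example.com:20101/ prepares a snapshot of a hypothetical tweets collection for an export that will run on another cluster.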

Command: solrctl collection --export-snapshot <snapshotName> -c <collectionName> -d <destDir>

Description: Creates a backup copy of the Solr collection metadata as well as the associated Lucene index files at the specified location. The -d configuration option specifies the directory path where this backup copy is created. This directory must exist before you export the snapshot, and the Solr superuser must be able to write to it.
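Because the destination directory must already exist, a script can check for it before exporting to the local cluster. In this sketch the helper name is hypothetical; hdfs dfs -test -d is the standard HDFS shell test for an existing directory.

```shell
#!/usr/bin/env bash
# Minimal sketch: export a snapshot to a directory on the same cluster.
# export_local is a hypothetical helper name.
set -euo pipefail

export_local() {
  local snapshot="$1" collection="$2" dest="$3"
  # The destination must already exist (and be writable by the Solr
  # superuser); check it up front so the failure is explicit.
  if ! hdfs dfs -test -d "$dest"; then
    echo "error: HDFS directory $dest does not exist" >&2
    return 1
  fi
  solrctl collection --export-snapshot "$snapshot" -c "$collection" -d "$dest"
}
```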

Command: solrctl collection --export-snapshot <snapshotName> -s <sourceDir> -d <destDir>

Description: Creates a backup copy of the Solr collection snapshot, which includes collection metadata as well as Lucene index files at the specified location. The -d configuration option specifies the directory path where this backup copy is to be created.

Make sure that you prepare the snapshot for export before exporting it to a remote cluster.

You can run this command on either the source or destination cluster, depending on your environment and the DistCp utility requirements. If the destination cluster does not have the solrctl utility, you must run the command on the source cluster. The exported snapshot state can then be copied using standard tools, such as DistCp.

The source and destination directory paths (specified by the -s and -d options, respectively) must be specified relative to the cluster from which you are running the command. Directories on the local cluster are formatted as /path/to/dir, and directories on the remote cluster are formatted as <protocol>://<namenode>:<port>/path/to/dir. For example:

  • Local path: /solr-backup/tweets-2016-10-19
  • Remote HDFS path: hdfs://nn01.example.com:8020/solr-backup/tweets-2016-10-19
  • Remote WebHDFS path: webhdfs://nn01.example.com:20101/solr-backup/tweets-2016-10-19

The source directory (specified by the -s option) is the directory containing the output of the solrctl collection --prepare-snapshot-export command. The destination directory (specified by the -d option) must exist on the destination cluster before you run this command.

If your cluster is secured (Kerberos-enabled), initialize your Kerberos credentials by using kinit before executing this command.
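The two export paths described above can be sketched as follows: running the export with -s and -d (after kinit on a secured cluster), and, when the destination cluster lacks solrctl, copying a snapshot exported on the source cluster with DistCp. The function names and the Kerberos principal are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: export a prepared snapshot with -s/-d, or copy an export
# between clusters with DistCp. Function names are hypothetical.
set -euo pipefail

# Paths are relative to the cluster where this runs: local paths look like
# /path/to/dir, remote ones like webhdfs://<namenode>:<port>/path/to/dir.
export_prepared() {
  local snapshot="$1" source_dir="$2" dest_dir="$3"
  local principal="${4:-}"   # set on Kerberos-enabled clusters, e.g. solr@EXAMPLE.COM
  if [[ -n "$principal" ]]; then
    kinit "$principal"       # obtain a Kerberos ticket before running solrctl
  fi
  solrctl collection --export-snapshot "$snapshot" -s "$source_dir" -d "$dest_dir"
}

# When the destination cluster has no solrctl, export on the source cluster
# first, then copy the result. Assumes remote_dir does not exist yet, so
# DistCp creates it as a copy of local_dir.
copy_export() {
  local local_dir="$1" remote_dir="$2"
  hadoop distcp "$local_dir" "$remote_dir"
}
```

For example, on the destination cluster: export_prepared tweets-2016-10-19 webhdfs://nn01.example.com:20101/solr-backup/tweets-2016-10-19 /solr-backup/tweets-2016-10-19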

Command: solrctl collection --restore <restoreCollectionName> -l <backupLocation> -b <snapshotName> -i <requestId>

Description: Restores a previously created backup as a new Solr collection. Run this command on the cluster on which you want to restore the backup.

The -l configuration option specifies the local HDFS directory where the backup is stored. If the backup is stored on a remote cluster, you must copy it to the local cluster before restoring it. The Solr superuser (solr by default) must have permission to read from this directory.

The -b configuration option specifies the name of the backup to be restored.

Because the restore operation can take a long time to complete, depending on the size of the exported snapshot, it runs asynchronously. The -i configuration parameter specifies a unique identifier for tracking the operation. For more information, see Check the status of an operation.

The optional -a configuration option enables the autoAddReplicas feature for the new Solr collection.

The optional -c configuration option specifies the configName for the new Solr collection. If this option is not specified, the configName of the original collection at the time of backup is used. If the specified configName does not exist, the restore operation creates a new configuration from the backup.

The optional -r configuration option specifies the replication factor for the new Solr collection. If this option is not specified, the replication factor of the original collection at the time of backup is used.

The optional -m configuration option specifies the maximum number of replicas (maxShardsPerNode) to create on each Solr Server. If this option is not specified, the maxShardsPerNode configuration of the original collection at the time of backup is used.

If your cluster is secured (Kerberos-enabled), initialize your Kerberos credentials using kinit before running this command.
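A minimal sketch of starting a restore and capturing its request ID for later status checks. The helper name and the request ID format (collection name plus a Unix timestamp) are hypothetical; the only requirement stated above is that the ID be unique. The solrctl output is sent to stderr so that the function prints only the request ID on stdout.

```shell
#!/usr/bin/env bash
# Minimal sketch: restore a backup as a new collection and print the request
# ID used to track it. restore_backup and the ID format are hypothetical.
set -euo pipefail

restore_backup() {
  local new_collection="$1" backup_location="$2" snapshot="$3"
  local replication="${4:-}"   # optional -r; omit to reuse the original value
  local request_id
  request_id="restore-${new_collection}-$(date +%s)"

  local args=(--restore "$new_collection" -l "$backup_location" -b "$snapshot" -i "$request_id")
  if [[ -n "$replication" ]]; then
    args+=(-r "$replication")
  fi
  # Keep stdout for the request ID only; solrctl's own output goes to stderr.
  solrctl collection "${args[@]}" >&2
  echo "$request_id"
}
```

For example, id=$(restore_backup tweets-restored /solr-backup/tweets-2016-10-19 tweets-2016-10-19) starts the restore and stores the ID for use with --request-status.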

Command: solrctl collection --request-status <requestId>

Description: Displays the status of the specified operation. The status can be one of the following:

  • running: The restore operation is running.
  • completed: The restore operation is complete.
  • failed: The restore operation failed.
  • notfound: The specified request ID was not found.

If your cluster is secured (Kerberos-enabled), initialize your Kerberos credentials (using kinit) before running this command.
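Because restores run asynchronously, a script typically polls --request-status until the operation finishes. This sketch assumes that the command output contains one of the status words listed above; the exact output format is not specified here, so verify it on your cluster before relying on the pattern matching. The function name and the 10-second default interval are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll an asynchronous operation until it leaves the
# "running" state. ASSUMPTION: the --request-status output contains one of
# the status words running, completed, failed, or notfound.
set -euo pipefail

wait_for_request() {
  local request_id="$1" interval="${2:-10}"   # seconds between polls
  local output
  while true; do
    # "|| true" keeps set -e from aborting if solrctl exits non-zero;
    # unexpected output is handled by the final case branch instead.
    output="$(solrctl collection --request-status "$request_id" || true)"
    case "$output" in
      *completed*) echo "completed"; return 0 ;;
      *failed*)    echo "failed";    return 1 ;;
      *notfound*)  echo "notfound";  return 1 ;;
      *running*)   sleep "$interval" ;;
      *)           echo "unrecognized status: $output" >&2; return 2 ;;
    esac
  done
}
```

For example, wait_for_request "$id" 30 polls every 30 seconds and returns a non-zero status if the restore fails or the request ID is unknown.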