This is the documentation for Cloudera Manager 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Browsing and Managing Snapshots

For both HBase (CDH 4.2 or later, or CDH 5) and HDFS (CDH 5 only) services, a Browse tab is available where you can view the HBase tables or HDFS directories associated with a service on your cluster. From here you can view the currently saved snapshots for your tables or files, and delete or restore them as appropriate.

Managing HBase Snapshots

From the HBase Browse tab you can:

  • View the HBase tables that you can snapshot.
  • Initiate immediate (unscheduled) snapshots of a table.
  • View the list of saved snapshots currently being maintained. These may include one-off immediate snapshots, as well as scheduled policy-based snapshots.
  • Delete a saved snapshot.
  • Restore from a saved snapshot.
  • Restore a table from a saved snapshot to a new table (Restore As).

Browsing HBase Tables

To browse the HBase tables to view snapshot activity:

  1. From the Clusters tab, select your HBase service.
  2. Go to the Browse tab.

Managing HBase Snapshots

To take a snapshot:
  1. Click a table.
  2. Click Take Snapshot.
  3. Specify the name of the snapshot, and click Take Snapshot.

To delete a snapshot, click the menu button associated with the snapshot and select Delete.

To restore a snapshot, click the menu button associated with the snapshot and select Restore.

To restore a snapshot to a new table, select Restore As from the menu associated with the snapshot, and provide a name for the new table.
  Warning: If you "Restore As" to an existing table (that is, specify a table name that already exists) the existing table will be overwritten.
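
For reference, the HBase shell offers closely related snapshot commands. The sketch below only formats the shell command text for each action described above; it does not show how Cloudera Manager performs the operations internally. The helper name hbase_snapshot_commands and its arguments are hypothetical. It assumes the standard HBase shell commands snapshot, delete_snapshot, restore_snapshot, and clone_snapshot, and that a table must be disabled while it is restored from the shell.

```python
def hbase_snapshot_commands(action, snapshot, table=None):
    """Return HBase shell command lines for a snapshot action.

    Hypothetical helper for illustration only: it formats text and
    never contacts a cluster.
    """
    if action == "take":
        # snapshot '<table>', '<snapshot name>'
        return ["snapshot '%s', '%s'" % (table, snapshot)]
    if action == "delete":
        return ["delete_snapshot '%s'" % snapshot]
    if action == "restore":
        # From the shell, the table is disabled for the duration of the restore.
        return [
            "disable '%s'" % table,
            "restore_snapshot '%s'" % snapshot,
            "enable '%s'" % table,
        ]
    if action == "restore_as":
        # Closest shell counterpart: create a new table from the snapshot.
        return ["clone_snapshot '%s', '%s'" % (snapshot, table)]
    raise ValueError("unknown action: %r" % action)


if __name__ == "__main__":
    # Example with made-up names.
    for line in hbase_snapshot_commands("restore", "orders_snap", "orders"):
        print(line)
```

Unlike Restore As in Cloudera Manager, clone_snapshot is meant for creating a new table, so the overwrite behavior described in the warning should not be assumed for the shell command.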

Managing HDFS Directory Snapshots

From the HDFS Browse tab you can:

  • Designate HDFS directories to be "snapshottable" so snapshots can be created for those directories.
  • Initiate immediate (unscheduled) snapshots of a directory.
  • View the list of saved snapshots currently being maintained. These may include one-off immediate snapshots, as well as scheduled policy-based snapshots.
  • Delete a saved snapshot.
  • Restore an HDFS directory or file from a saved snapshot.
  • Restore an HDFS directory or file from a saved snapshot to a new directory or file (Restore As).

Browsing HDFS Directories

To browse the HDFS directories to view snapshot activity:

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the Browse tab.
As you browse the directory structure of your HDFS, basic information about the directory you have selected is shown at the right (owner, group, and so on).

Enabling HDFS Snapshots

HDFS directories must be enabled for snapshots in order for snapshots to be created. You cannot specify a directory as part of a snapshot policy unless it has been enabled for snapshotting.

To enable an HDFS directory for snapshots:
  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the Browse tab.
  3. Verify the Snapshottable Path and click Enable Snapshots.
  4. When the command has finished, a Take Snapshot button appears. You may need to refresh the page to see the new state.
  Note: Once you enable snapshots for a directory, you cannot enable snapshots on any of its subdirectories. Snapshots can be taken only on directories that have snapshots enabled.

To disable snapshots for a directory that has snapshots enabled, select Disable Snapshots from the drop-down menu button at the upper right. If there are existing snapshots of the directory, you must delete them before you can disable snapshots.
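
The nesting rule in the Note above can be checked before you try to enable a directory. The following is a minimal sketch, not Cloudera Manager code: the function names and the list of already enabled directories are hypothetical. Besides the subdirectory case stated in the Note, it also rejects the reverse case (enabling a parent of an already snapshottable directory), on the assumption that HDFS disallows nested snapshottable directories in both directions. The second helper builds the hdfs dfsadmin -allowSnapshot and -disallowSnapshot command lines, which are the command-line counterparts of Enable Snapshots and Disable Snapshots.

```python
import posixpath


def _is_ancestor(parent, child):
    """Return True if `parent` is a strict ancestor of `child` (normalized paths)."""
    if parent == "/":
        return child != "/"
    return child.startswith(parent + "/")


def can_enable_snapshots(path, snapshottable):
    """Return True if `path` does not nest with any directory in `snapshottable`.

    Hypothetical helper: `snapshottable` is a list of directories that
    already have snapshots enabled.
    """
    path = posixpath.normpath(path)
    for existing in snapshottable:
        existing = posixpath.normpath(existing)
        if existing == path:
            return False  # already enabled
        if _is_ancestor(existing, path):
            return False  # subdirectory of a snapshottable directory (the Note)
        if _is_ancestor(path, existing):
            return False  # assumed: HDFS also rejects the reverse nesting
    return True


def dfsadmin_snapshot_command(path, enable=True):
    """Build the hdfs dfsadmin command line that enables or disables snapshots."""
    flag = "-allowSnapshot" if enable else "-disallowSnapshot"
    return ["hdfs", "dfsadmin", flag, path]


if __name__ == "__main__":
    enabled = ["/data"]  # made-up example state
    print(can_enable_snapshots("/data/logs", enabled))
    print(" ".join(dfsadmin_snapshot_command("/archive")))
```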

Managing HDFS Snapshots

If a directory has been enabled for snapshots:
  • The Take Snapshot button is present, enabling an immediate snapshot of the directory.
  • Any snapshots that have been taken are listed by the time at which they were taken, along with their names and a menu button.

To take a snapshot, click Take Snapshot, specify the name of the snapshot, and click Take Snapshot. The snapshot is added to the snapshot list.

To delete a snapshot, click the menu button next to the snapshot and select Delete.

To restore a snapshot, click the menu button next to the snapshot and select Restore.
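
Outside Cloudera Manager, HDFS exposes each snapshot as a read-only view under the .snapshot subdirectory of the snapshottable directory, and provides hdfs dfs -createSnapshot and -deleteSnapshot commands. The sketch below illustrates how those paths and command lines fit together. The helper names and the example directory, snapshot, and file names are hypothetical, and the sketch does not describe how Cloudera Manager implements the buttons above.

```python
import posixpath


def snapshot_path(directory, snapshot, relative=""):
    """Return the read-only path of a snapshot, or of a file inside it."""
    base = posixpath.join(posixpath.normpath(directory), ".snapshot", snapshot)
    return posixpath.join(base, relative) if relative else base


def take_snapshot_command(directory, snapshot):
    """Command line that takes a named snapshot of an enabled directory."""
    return ["hdfs", "dfs", "-createSnapshot", directory, snapshot]


def delete_snapshot_command(directory, snapshot):
    """Command line that deletes a named snapshot."""
    return ["hdfs", "dfs", "-deleteSnapshot", directory, snapshot]


def restore_file_command(directory, snapshot, relative):
    """Command line that copies one file out of a snapshot to its live location.

    A plain copy like this fails if the destination already exists;
    hdfs dfs -cp accepts -f to overwrite.
    """
    source = snapshot_path(directory, snapshot, relative)
    target = posixpath.join(posixpath.normpath(directory), relative)
    return ["hdfs", "dfs", "-cp", source, target]


if __name__ == "__main__":
    # Made-up names for illustration.
    print(snapshot_path("/data", "before-upgrade", "reports/q1.csv"))
    print(" ".join(restore_file_command("/data", "before-upgrade", "reports/q1.csv")))
```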

When restoring HDFS data, directories are restored with DistributedCopy (distcp) if a MapReduce or YARN service is present in the cluster, which speeds up the restoration. Files are restored with a normal copy, as are directories when no MapReduce or YARN service is present. The restore popup for HDFS (under More Options) lets you select either MapReduce or YARN as the MapReduce service. When distcp is used, you can configure the following options for the snapshot restoration, similar to those available when configuring a replication:

  • MapReduce Service - The MapReduce or YARN service to use.
  • Scheduler Pool - The scheduler pool to use.
  • Run as - The user that should run the job. By default this is hdfs. If you want to run the job as a different user, you can enter that here. If you are using Kerberos, you must provide a user name here, and it must be one with an ID greater than 1000. Verify that the user running the job has a home directory, /user/<username>, owned by username:supergroup in HDFS.
  • Log path - An alternative path for the logs.
  • Maximum map slots and Maximum bandwidth - Limits for the number of map slots and for bandwidth per mapper. The defaults are unlimited.
  • Abort on error - Whether to abort the job on an error (the default is not to abort). If the job is aborted, files copied up to that point remain on the destination, but no additional files are copied.
  • Skip Checksum Checks - Whether to skip checksum checks (the default is to perform them). If checked, checksum validation will not be performed.
  • Remove deleted files - Whether to remove deleted files from the target directory if they have been removed on the source. When this option is enabled, files deleted from the target directory are sent to trash if HDFS trash is enabled, or are deleted permanently if trash is not enabled. Further, with this option enabled, if files unrelated to the source exist in the target location, then those files will also be deleted.
  • Preserve - Whether to preserve the block size, replication count, and permissions as they exist on the source file system, or to use the settings as configured on the target file system. The default is to preserve these settings as on the source.
      Note: To preserve permissions, you must be running as a superuser. You can use the "Run as" option to ensure that is the case.
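
To make the options above more concrete, the sketch below maps them onto flags of the standard hadoop distcp command. This mapping is an assumption for illustration: this page does not document the exact invocation Cloudera Manager uses, and the function name and parameters are hypothetical. The sketch always passes -update because -delete and -skipcrccheck are only accepted by distcp together with -update or -overwrite. MapReduce Service, Scheduler Pool, and Run as are omitted because they concern how the job is submitted rather than distcp flags.

```python
def distcp_args(source, target, max_maps=None, bandwidth_mb=None,
                abort_on_error=False, skip_checksum=False,
                remove_deleted=False, preserve=True, log_path=None):
    """Build a hadoop distcp command line from restore-style options.

    Hypothetical mapping; defaults mirror the defaults listed above
    (no limits, do not abort on error, verify checksums, preserve settings).
    """
    args = ["hadoop", "distcp", "-update"]
    if max_maps is not None:
        args += ["-m", str(max_maps)]  # maximum number of simultaneous maps
    if bandwidth_mb is not None:
        args += ["-bandwidth", str(bandwidth_mb)]  # MB per second, per map
    if not abort_on_error:
        args.append("-i")  # ignore failures and keep copying
    if skip_checksum:
        args.append("-skipcrccheck")
    if remove_deleted:
        args.append("-delete")  # remove target files missing from the source
    if preserve:
        args.append("-pbrp")  # block size, replication, permissions
    if log_path:
        args += ["-log", log_path]
    return args + [source, target]


if __name__ == "__main__":
    # Made-up paths: restore /data from the snapshot named "nightly".
    print(" ".join(distcp_args("/data/.snapshot/nightly", "/data",
                               max_maps=10, remove_deleted=True)))
```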
Page generated September 3, 2015.