Configuring Fault Tolerance

Options to determine differences between contents of snapshots

Run the hdfs snapshotDiff command for a report that lists the differences between the contents of two snapshots. Run the hadoop distcp command with the -diff option to determine the differences between two specified snapshots, and combine it with the -update option to apply those differences to a specified target directory.

Generating a report listing the difference between contents of two snapshots

Running the hdfs snapshotDiff command with two snapshots of a specified directory path lists the changes made to that directory between the two snapshots. Consider the following example:

hdfs snapshotDiff /data/dir1 snap1 snap2
M	.
-	./file1.csv
R	./file2.txt -> ./fileold.txt
+	./filenew.txt
This example shows the following changes to the directory /data/dir1 after the creation of snap1 and before the creation of snap2:
Statement                         Explanation
M  .                              The directory /data/dir1 is modified.
-  ./file1.csv                    The file file1.csv is deleted.
R  ./file2.txt -> ./fileold.txt   The file file2.txt is renamed to fileold.txt.
+  ./filenew.txt                  The file filenew.txt is added to the directory /data/dir1.
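
For longer reports, it can help to summarize the entries by change type. The following is a minimal sketch, not part of the HDFS tooling: summarize_snapshot_diff is a hypothetical helper name, and it assumes each entry line starts with one of the markers +, -, M, or R followed by whitespace, as in the example above. Any other line, such as a header, is ignored.

```shell
# Hypothetical helper: summarize a snapshotDiff report read from stdin
# by counting the entries of each change type.
# Assumes entry lines start with +, -, M, or R followed by whitespace.
summarize_snapshot_diff() {
  awk '
    $1 == "+" { added++ }
    $1 == "-" { deleted++ }
    $1 == "M" { modified++ }
    $1 == "R" { renamed++ }
    END {
      printf "added=%d deleted=%d modified=%d renamed=%d\n",
             added, deleted, modified, renamed
    }'
}

# Usage (requires a cluster on which both snapshots exist):
#   hdfs snapshotDiff /data/dir1 snap1 snap2 | summarize_snapshot_diff
```

The helper only reads text, so it can be tried on a saved report without access to the cluster.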

Moving the differences between the contents of two snapshots to a specified directory

Using the hadoop distcp command with the -diff and -update options on snapshots enables you to determine the differences between the contents of two snapshots and apply those differences to a specified target directory. Consider the following example:

hadoop distcp -diff snap_old snap_new -update /data/source_dir /data/target_dir
The command in this example determines the changes between the snapshots snap_old and snap_new of the /data/source_dir directory, and updates the /data/target_dir directory with those changes.
The following conditions must be satisfied for the content changes to be moved to /data/target_dir:
  • Both /data/source_dir and /data/target_dir are distributed file system paths.
  • The snapshots snap_old and snap_new are created for /data/source_dir such that snap_old is older than snap_new.
  • The /data/target_dir path also contains snap_old. In addition, no changes are made to /data/target_dir after the creation of snap_old.
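
The last condition can be checked before the copy is started. The following is a rough sketch under stated assumptions, not a supported procedure: report_is_empty and sync_snapshot_diff are hypothetical helper names, and the sketch assumes that hdfs snapshotDiff accepts "." as a snapshot name to mean the current state of the directory. It runs the copy only when the target reports no changes since snap_old.

```shell
# Hypothetical helper: succeeds (exit 0) when a snapshotDiff report on stdin
# contains no change entries (lines starting with +, -, M, or R).
report_is_empty() {
  ! grep -Eq '^[-+MR][[:space:]]'
}

# Hypothetical wrapper: apply the snapshot differences only if the target
# directory is unchanged since the older snapshot.
# Arguments: source dir, target dir, older snapshot, newer snapshot.
sync_snapshot_diff() {
  src=$1
  dst=$2
  old=$3
  new=$4
  # Assumption: "." stands for the current state of the directory.
  # Stop if the report cannot be produced (for example, snapshot missing).
  report=$(hdfs snapshotDiff "$dst" "$old" .) || return 1
  if printf '%s\n' "$report" | report_is_empty; then
    hadoop distcp -diff "$old" "$new" -update "$src" "$dst"
  else
    echo "target $dst changed since $old; not copying" >&2
    return 1
  fi
}

# Usage (requires a cluster on which the snapshots exist):
#   sync_snapshot_diff /data/source_dir /data/target_dir snap_old snap_new
```

Capturing the report before testing it means a failed hdfs call is not mistaken for an unchanged target.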