Options to determine differences between contents of snapshots
Run the hdfs snapshotDiff command for a report that lists the
differences between the contents of two snapshots. Run the hadoop distcp
command with the -diff option to determine the difference between the contents of
two specified snapshots, together with the -update option to apply that
difference to a specified target directory.
Generating a report listing the difference between contents of two snapshots
Running the hdfs snapshotDiff command for two snapshots of a specified
directory path lists the changes made to that directory between the two snapshots.
Consider the following example:
    hdfs snapshotDiff /data/dir1 snap1 snap2
    M       .
    -       ./file1.csv
    R       ./file2.txt -> ./fileold.txt
    +       ./filenew.txt
The following table explains the changes made to the directory /data/dir1
after the creation of snap1 and before the creation of snap2:

| Statement | Explanation |
|---|---|
| M . | The directory /data/dir1 is modified. |
| - ./file1.csv | The file file1.csv is deleted. |
| R ./file2.txt -> ./fileold.txt | The file file2.txt is renamed to fileold.txt. |
| + ./filenew.txt | The file filenew.txt is added to the directory /data/dir1. |
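
The four statement types in the report can also be read programmatically. The following is a minimal sketch, not part of any Hadoop tooling: the names `Change` and `parse_snapshot_diff` are hypothetical, and the parser assumes that each entry line starts with one of the markers M, -, R, or +, followed by whitespace and a path, and that a rename is written as `old -> new`. Any other line, such as a header, is skipped.

```python
from typing import List, NamedTuple, Optional

# Meaning of each leading marker, as described in the table of statements.
MARKERS = {"M": "modified", "-": "deleted", "R": "renamed", "+": "added"}


class Change(NamedTuple):
    kind: str                       # "modified", "deleted", "renamed", or "added"
    path: str                       # path relative to the snapshotted directory
    new_path: Optional[str] = None  # set only for renames


def parse_snapshot_diff(report: str) -> List[Change]:
    """Parse the entry lines of a snapshot diff report (hypothetical helper)."""
    changes = []
    for line in report.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and anything that is not an entry line.
        if len(parts) != 2 or parts[0] not in MARKERS:
            continue
        marker, rest = parts
        if marker == "R":
            # Assumption: the old and new paths are separated by " -> ".
            old, sep, new = rest.partition(" -> ")
            changes.append(Change(MARKERS[marker], old.strip(),
                                  new.strip() if sep else None))
        else:
            changes.append(Change(MARKERS[marker], rest.strip()))
    return changes


report = """\
M       .
-       ./file1.csv
R       ./file2.txt -> ./fileold.txt
+       ./filenew.txt
"""
for change in parse_snapshot_diff(report):
    print(change)
```

A path that itself contains the text " -> " would be split incorrectly by this sketch, so treat it as an illustration of the report layout rather than a robust parser.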
Moving the differences between the contents of two snapshots to a specified directory
Using the hadoop distcp command with the -diff and -update
options on snapshots enables you to determine the difference between the contents of
two snapshots and apply the difference to a specified target directory. Consider the
following example:
    hadoop distcp -diff snap_old snap_new -update /data/source_dir /data/target_dir

The command determines the changes between the snapshots snap_old and snap_new present in the
source_dir directory, and updates the target_dir directory with the changes.

The following conditions must hold for /data/source_dir and /data/target_dir:

- Both /data/source_dir and /data/target_dir are distributed file system paths.
- The snapshots snap_old and snap_new are created for /data/source_dir such that snap_old is older than snap_new.
- The /data/target_dir path also contains snap_old. In addition, no changes are made to /data/target_dir after the creation of snap_old.
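
The conditions above can be partly checked before the copy is started. The following is a minimal sketch with hypothetical helper names (`build_distcp_diff_command`, `build_target_check_command`, `has_changes`). It only assembles the command lines and inspects report text; it does not run anything against a cluster. It assumes that passing `.` as the second snapshot name to hdfs snapshotDiff compares the snapshot with the current state of the directory, so an empty report for the target directory suggests that nothing changed there after snap_old was created.

```python
import shlex
from typing import List


def build_distcp_diff_command(old_snapshot: str, new_snapshot: str,
                              source_dir: str, target_dir: str) -> List[str]:
    """Assemble the distcp invocation used in the example (hypothetical helper)."""
    for name in (old_snapshot, new_snapshot):
        # Snapshot names are plain names, not paths.
        if not name or "/" in name:
            raise ValueError("invalid snapshot name: %r" % name)
    if old_snapshot == new_snapshot:
        raise ValueError("the two snapshots must differ")
    return ["hadoop", "distcp", "-diff", old_snapshot, new_snapshot,
            "-update", source_dir, target_dir]


def build_target_check_command(target_dir: str, old_snapshot: str) -> List[str]:
    """Assemble a snapshotDiff call comparing old_snapshot with the current state.

    Assumption: "." as the second snapshot name stands for the current state.
    """
    return ["hdfs", "snapshotDiff", target_dir, old_snapshot, "."]


def has_changes(report: str) -> bool:
    """Return True if the report text contains at least one M, -, R, or + entry."""
    for line in report.splitlines():
        tokens = line.split(None, 1)
        if tokens and tokens[0] in {"M", "-", "R", "+"}:
            return True
    return False


check = build_target_check_command("/data/target_dir", "snap_old")
copy = build_distcp_diff_command("snap_old", "snap_new",
                                 "/data/source_dir", "/data/target_dir")
# Print the commands in a shell-safe form for review before running them.
print(" ".join(shlex.quote(arg) for arg in check))
print(" ".join(shlex.quote(arg) for arg in copy))
```

One way to use this is to run the check command first, pass its output to `has_changes`, and start the copy only when it returns False.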
