Options to determine differences between contents of snapshots
Run the hdfs snapshotDiff
command for a report that lists the
difference between the contents of two snapshots. Run the distcp diff
command to determine the difference between contents of specified source and target
snapshots, and use the command with the -update
option to move the
difference to a specified target directory.
Generating a report listing the difference between contents of two snapshots
Using the hdfs snapshotDiff
between two snapshots on a specified
directory path provides the list of changes to the directory. Consider the following
example:
hdfs snapshotDiff /data/dir1 snap1 snap2 M . - ./file1.csv R ./file2.txt -> ./fileold.txt + ./filenew.txt
/data/dir1
after the creation of snap1
and before the creation of
snap2
:Statement | Explanation |
---|---|
M . |
The directory /data/dir1 is modified. |
- ./file1.csv |
The file file1.csv is deleted. |
R ./file2.txt -> ./fileold.txt |
The file file2.txt is renamed to
fileold.txt . |
+ ./filenew.txt |
The file filenew.txt is added to the directory
/data/dir1 . |
Moving the differences between the contents of two snapshots to a specified directory
Using the distcp diff
command with the -update
option on snapshots enables you to determine the difference between the contents of
two snapshots and move the difference to a specified target directory. Consider the
following example:
hadoop distcp -diff snap_old snap_new -update /data/source_dir
/data/target_dir
snap_old
and snap_new
present in the
source_dir
directory, and updates the
target_dir
directory with the changes./data/target_dir
:- Both
/data/source_dir
and/data/target_dir
are distributed file system paths. - The snapshots
snap_old
andsnap_new
are created for/data/source_dir
such thatsnap_old
is older thansnap_new
. - The
/data/target_dir
path also containssnap_old
. In addition, no changes are made to/data/target_dir
after the creation ofsnap_old
.