We mentioned earlier that HDFS replication alone is not a suitable backup strategy. Hadoop 2 added snapshots to HDFS, which bring another level of data protection. Once a snapshot exists, any change to the filesystem that would affect it is treated specially. For example, if a file that exists in the snapshot is deleted, it is removed from the current state of the filesystem, but its metadata remains in the snapshot and the blocks holding its data remain on the filesystem, accessible only through the snapshot.
We can restore data from an HDFS snapshot to roll back to the desired state after data loss or corruption. As part of the exam objective, in this post we will create a snapshot and then recover a deleted file from it.
1. Create a snapshot
1. Let’s first create a snapshot on a snapshottable directory. If the directory is not yet snapshottable, allow snapshots on it with the following command:
$ hdfs dfsadmin -allowSnapshot /user/test
Allowing snapshot on test succeeded
2. Create a snapshot of the directory “/user/test” with snapshot_latest as the name of the snapshot.
$ hdfs dfs -createSnapshot /user/test snapshot_latest
Created snapshot /user/test/.snapshot/snapshot_latest
3. View the snapshot in the .snapshot directory.
$ hdfs dfs -ls /user/test/.snapshot
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2018-07-21 10:16 /user/test/.snapshot/snapshot_latest
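The two commands above can be wrapped in small shell helpers. The following is a minimal sketch, not an official tool: the function names snapshot_path and take_snapshot and the HDFS_BIN variable are hypothetical names chosen for illustration, and it assumes the caller has the superuser rights that -allowSnapshot requires.

```shell
#!/bin/sh
# HDFS_BIN is a hypothetical override hook; it defaults to the real hdfs client.
: "${HDFS_BIN:=hdfs}"

# Build the read-only path under which a snapshot's contents are visible.
snapshot_path() {
    # $1 = snapshottable directory, $2 = snapshot name
    printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

# Mark a directory as snapshottable, then take a named snapshot of it.
# The snapshot is only attempted if the first command succeeds.
take_snapshot() {
    # $1 = directory, $2 = snapshot name
    "$HDFS_BIN" dfsadmin -allowSnapshot "$1" &&
    "$HDFS_BIN" dfs -createSnapshot "$1" "$2"
}
```

With these defined, take_snapshot /user/test snapshot_latest runs both steps in order, and snapshot_path /user/test snapshot_latest gives the path to pass to hdfs dfs -ls.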
2. Delete a file
Now, delete any file from the /user/test directory in HDFS.
$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r-- 3 hdfs hdfs 27 2018-07-21 10:34 /user/test/another_test
-rw-r--r-- 3 hdfs hdfs 21 2018-07-21 10:10 /user/test/test_file
$ hdfs dfs -rm /user/test/test_file
18/07/21 11:06:40 INFO fs.TrashPolicyDefault: Moved: 'hdfs://geeklab/user/test/test_file' to trash at: hdfs://geeklab/user/hdfs/.Trash/Current/user/test/test_file
Verify that the file is no longer present.
$ hdfs dfs -ls /user/test/test_file
ls: `/user/test/test_file': No such file or directory
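Before restoring anything, it can help to see what has changed since the snapshot was taken. The hdfs snapshotDiff command compares two snapshots of a directory, and a "." in place of a snapshot name stands for the current state. The sketch below wraps it in a helper; the function name changes_since_snapshot and the HDFS_BIN variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# HDFS_BIN is a hypothetical override hook; it defaults to the real hdfs client.
: "${HDFS_BIN:=hdfs}"

# Report what changed in a directory since a snapshot was taken.
# The trailing "." tells snapshotDiff to compare against the current state.
changes_since_snapshot() {
    # $1 = snapshottable directory, $2 = snapshot name
    "$HDFS_BIN" snapshotDiff "$1" "$2" .
}
```

Calling changes_since_snapshot /user/test snapshot_latest should list each changed path with a marker for created (+), deleted (-), modified (M), or renamed (R) entries.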
3. Recover the snapshot
1. You can restore the deleted file from the /user/test/.snapshot directory, which still holds a copy of test_file.
$ hdfs dfs -ls /user/test/.snapshot/snapshot_latest
Found 1 items
-rw-r--r-- 3 hdfs hdfs 21 2018-07-21 10:10 /user/test/.snapshot/snapshot_latest/test_file
$ hdfs dfs -cat /user/test/.snapshot/snapshot_latest/test_file
This is a test file.
2. Let’s copy the removed file from the snapshot directory back to its original location.
$ hdfs dfs -cp /user/test/.snapshot/snapshot_latest/test_file /user/test/
Verify:
$ hdfs dfs -ls /user/test/test_file
-rw-r--r-- 3 hdfs hdfs 21 2018-07-21 11:22 /user/test/test_file
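Note that the restored file carries a new modification time (11:22 rather than 10:10), because a plain -cp writes a fresh copy. If the original attributes matter, the -p option of hdfs dfs -cp preserves timestamps, ownership, and permissions. The sketch below shows one possible way to script the restore and the later cleanup; restore_from_snapshot, drop_snapshot, and HDFS_BIN are hypothetical names, and preserving ownership is assumed to need sufficient privileges.

```shell
#!/bin/sh
# HDFS_BIN is a hypothetical override hook; it defaults to the real hdfs client.
: "${HDFS_BIN:=hdfs}"

# Copy one file out of a snapshot back into its parent directory.
# -p keeps the original timestamps, ownership, and permissions.
restore_from_snapshot() {
    # $1 = snapshottable directory, $2 = snapshot name, $3 = file name
    "$HDFS_BIN" dfs -cp -p "${1%/}/.snapshot/$2/$3" "${1%/}/"
}

# Delete a snapshot that is no longer needed.
drop_snapshot() {
    # $1 = snapshottable directory, $2 = snapshot name
    "$HDFS_BIN" dfs -deleteSnapshot "$1" "$2"
}
```

For example, restore_from_snapshot /user/test snapshot_latest test_file repeats the copy from step 2 with attributes preserved, and drop_snapshot /user/test snapshot_latest removes the snapshot once it is no longer required.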