HDFS
HDFS (Hadoop Distributed File System) is the storage layer of a Hadoop cluster. It is a distributed filesystem, and it is very important for a Hadoop admin to know how to configure and manage HDFS inside out. For the purpose of the exam, we will look at a few of the basic commands used to administer HDFS: creating directories, managing ownership and permissions, loading data into HDFS, and copying data from HDFS to the local filesystem.
Navigating HDFS
There are several ways to navigate through HDFS. I have listed the most commonly used ones below.
1. Using the NameNode Web UI
You can view the files on HDFS using the NameNode web UI, which is located at http://[namenode]:50070. The dashboard looks like the screenshot below.
You can go to Utilities > Browse the file system and browse through all the directories.
You cannot upload anything to the HDFS filesystem from the NameNode UI, but you can download files to your local system using the UI.
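The same NameNode port also exposes the WebHDFS REST API, which is handy for scripted downloads. A minimal sketch, assuming WebHDFS is enabled (dfs.webhdfs.enabled=true); the file path and user name here are only illustrations:
$ curl -L "http://[namenode]:50070/webhdfs/v1/user/test/test_file?op=OPEN&user.name=hdfs" -o test_file
The -L flag matters because the NameNode redirects the actual read to one of the DataNodes.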
2. Using Ambari
The Swiss army knife, Ambari, can also be used to view files, download them, upload new files, create new directories, etc. Use the Ambari “Files View” to browse the HDFS filesystem.
3. Using the command line
HDFS provides a filesystem shell, invoked with the command “hdfs dfs”, for manipulating the data stored in it. A few examples of using the command are demonstrated below.
1. To list the contents of a directory:
[hdfs@dn1 ~]$ hdfs dfs -ls /user
Found 2 items
drwxrwx--- - ambari-qa hdfs 0 2018-07-15 12:44 /user/ambari-qa
drwxr-xr-x - hbase hdfs 0 2018-07-16 09:26 /user/hbase
2. Creating a new directory:
[hdfs@dn1 ~]$ hdfs dfs -mkdir /test
[hdfs@dn1 ~]$ hdfs dfs -ls /
Found 9 items
drwxrwxrwx - yarn hadoop 0 2018-07-15 12:43 /app-logs
drwxr-xr-x - hdfs hdfs 0 2018-07-16 09:26 /apps
drwxr-xr-x - yarn hadoop 0 2018-07-15 12:40 /ats
drwxr-xr-x - hdfs hdfs 0 2018-07-15 12:40 /hdp
drwxr-xr-x - mapred hdfs 0 2018-07-15 12:40 /mapred
drwxrwxrwx - mapred hadoop 0 2018-07-15 12:41 /mr-history
drwxr-xr-x - hdfs hdfs 0 2018-07-16 20:23 /test
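To create nested directories in a single step, the -mkdir command also accepts a -p flag, which creates any missing parent directories along the way (the path below is only an illustration):
$ hdfs dfs -mkdir -p /test/dir1/dir2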
3. Copy files within HDFS:
$ hdfs dfs -cp file1 file2
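Similarly, “hdfs dfs -mv” moves or renames files within HDFS instead of copying them. A quick sketch with illustrative paths:
$ hdfs dfs -cp /test/file1 /test/file2
$ hdfs dfs -mv /test/file2 /test/file3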
Creating a user and configuring a home directory for the user in HDFS
That was a brief overview of the options available when working with HDFS as a filesystem. In the exam, you will only be asked to create a user, configure the user's home directory in HDFS, and move some data to and from HDFS.
1. Create a local user
First, create a local user, which by default will have a home directory on the local filesystem. Log in as root on any of the datanodes and use the Linux command “useradd” to create a new user named “test”.
# useradd test
Verify the user creation and also view the home directory of the user.
# id test
uid=1009(test) gid=1009(test) groups=1009(test)
# su - test
$ pwd
/home/test
2. Create a home directory for the user in HDFS
Create a user home directory in HDFS using the “hdfs dfs” command. You have to be the “hdfs” user to do so.
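If you are still logged in as root from the previous step, switch to the “hdfs” user first:
# su - hdfs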
$ hdfs dfs -mkdir /user/test
$ hdfs dfs -ls /user
Found 3 items
drwxrwx--- - ambari-qa hdfs 0 2018-07-15 12:44 /user/ambari-qa
drwxr-xr-x - hbase hdfs 0 2018-07-16 09:26 /user/hbase
drwxr-xr-x - hdfs hdfs 0 2018-07-16 20:38 /user/test
3. Change the ownership of the home directory in HDFS
Since we created this directory as the “hdfs” user, its ownership defaults to hdfs:hdfs. For the new “test” user to use this directory, we need to first change the ownership to test:test.
$ hdfs dfs -chown test:test /user/test
Verify the new permissions again:
$ hdfs dfs -ls /user
Found 3 items
drwxrwx--- - ambari-qa hdfs 0 2018-07-15 12:44 /user/ambari-qa
drwxr-xr-x - hbase hdfs 0 2018-07-16 09:26 /user/hbase
drwxr-xr-x - test test 0 2018-07-16 20:38 /user/test
You can also use separate commands to change the user ownership and the group ownership, as shown below:
$ hdfs dfs -chown test /user/test
$ hdfs dfs -chgrp test /user/test
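If the directory already contained files, these commands accept a -R flag to apply the change recursively, for example:
$ hdfs dfs -chown -R test:test /user/test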
4. Change the permissions of the home directory
Similar to changing ownership, you can change the permissions of the home directory, or of any file in HDFS for that matter. To view the current permissions:
$ hdfs dfs -ls /user
Found 3 items
drwxrwx--- - ambari-qa hdfs 0 2018-07-15 12:44 /user/ambari-qa
drwxr-xr-x - hbase hdfs 0 2018-07-16 09:26 /user/hbase
drwxr-xr-x - test test 0 2018-07-16 20:38 /user/test
Let’s try changing the permission of the directory “/user/test” to “660”.
$ hdfs dfs -chmod 660 /user/test
Verify the new permissions again:
$ hdfs dfs -ls /user
Found 3 items
drwxrwx--- - ambari-qa hdfs 0 2018-07-15 12:44 /user/ambari-qa
drwxr-xr-x - hbase hdfs 0 2018-07-16 09:26 /user/hbase
drw-rw---- - test test 0 2018-07-16 20:38 /user/test
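Note that mode 660 removes the execute bit, which is needed to traverse a directory, so in practice you would restore a usable mode afterwards. The -chmod command accepts both octal and symbolic modes; starting from 660, either of the following re-adds the missing execute bits:
$ hdfs dfs -chmod 770 /user/test
$ hdfs dfs -chmod ug+x /user/test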
Copying files to and from HDFS filesystem
To verify that we have performed all the steps properly, we can create a file locally and upload it to HDFS, and vice versa. Let's first create a file locally using the “hdfs” user.
$ touch /home/hdfs/test_file
Now, copy this file to the HDFS home directory of the “test” user. If you want to copy a file from any other location, make sure the parent directory has the permission of 755.
$ hdfs dfs -put /home/hdfs/test_file /user/test/
Verify:
$ hdfs dfs -ls /user/test
Found 1 items
-rw-r--r-- 2 hdfs test 0 2018-07-16 22:31 /user/test/test_file
Let’s now copy the same file from HDFS to the local filesystem.
$ hdfs dfs -get /user/test/test_file /tmp/
Verify:
$ ls -l /tmp/test_file
-rw-r--r--. 1 hdfs hadoop 0 Jul 16 22:38 /tmp/test_file
The commands “hdfs dfs -get” and “hdfs dfs -copyToLocal” are equivalent and can be used interchangeably.
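Likewise, for uploads, “hdfs dfs -put” and “hdfs dfs -copyFromLocal” are equivalent. Repeating the earlier upload with the latter (the -f flag overwrites the copy already present from the previous step):
$ hdfs dfs -copyFromLocal -f /home/hdfs/test_file /user/test/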