What is Rack Awareness
Rack awareness plays an important role in making sure that there is no single point of failure across the entire Hadoop infrastructure and that resource contention is spread across the cluster. Rack awareness is a concept in which the NameNode is made aware of the layout of servers in a cluster, so that it can make intelligent decisions on block placement.
1. Change Rack Topology using the command line
In the exam, unless specified explicitly, do not use the command line method. Changes to the rack topology can be configured very easily using the Ambari web UI. Follow the steps given below to change the rack topology using the command line.
For the purpose of this post, we will have all the datanodes on separate racks as shown below.
dn1.localdomain /rack01
dn2.localdomain /rack02
dn3.localdomain /rack03
1. First, verify that the topology Python script and the topology data file are present in the /etc/hadoop/conf directory on the NameNode.
[root@nn1 ~]# ls -lrt /etc/hadoop/conf/topology*
-rw-r--r--. 1 hdfs hadoop 187 Jul 15 12:39 /etc/hadoop/conf/topology_mappings.data
-rwxr-xr-x. 1 root root 2358 Jul 15 12:39 /etc/hadoop/conf/topology_script.py
2. Also, ensure that the property defining the location of the topology script, net.topology.script.file.name, is present in the configuration file /etc/hadoop/conf/core-site.xml.
3. View the current topology configuration in the /etc/hadoop/conf/topology_mappings.data file.
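As a rough way to check this property without opening the file by hand, the sketch below parses a core-site.xml document and looks up a property by name. The helper name get_hadoop_property and the inline sample XML are illustrative assumptions; the property name net.topology.script.file.name is the standard Hadoop setting for the topology script, and the value shown is the script path listed in step 1.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of a Hadoop <property> by name, or None if absent.

    Hypothetical helper for illustration; not part of Hadoop or Ambari.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Assumed sample content; on a real cluster, read the text of
# /etc/hadoop/conf/core-site.xml instead.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology_script.py</value>
  </property>
</configuration>"""

script_path = get_hadoop_property(SAMPLE_CORE_SITE, "net.topology.script.file.name")
print(script_path)
```

If the helper returns None for the real file, the property is missing and rack resolution will not use the topology script.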
# cat /etc/hadoop/conf/topology_mappings.data
[network_topology]
dn3.localdomain=/default-rack
192.168.1.5=/default-rack
dn1.localdomain=/default-rack
192.168.1.3=/default-rack
dn2.localdomain=/default-rack
192.168.1.4=/default-rack
4. Modify the topology_mappings.data file to have all the datanodes on different racks. The topology file should only be modified on the NameNode and the ResourceManager. In our case these are nn1.localdomain and dn2.localdomain.
# cat /etc/hadoop/conf/topology_mappings.data
[network_topology]
dn3.localdomain=/rack03
192.168.1.5=/rack03
dn1.localdomain=/rack01
192.168.1.3=/rack01
dn2.localdomain=/rack02
192.168.1.4=/rack02
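To see how a mapping file like this turns into rack assignments, here is a minimal sketch of the lookup a topology script performs. Hadoop calls the script with one or more hostnames or IP addresses and expects one rack path per argument; the sketch assumes the data file is INI-style with a [network_topology] section, as shown above, and that unknown hosts fall back to /default-rack. The function names are hypothetical and this is not the script shipped by Ambari.

```python
import configparser

DEFAULT_RACK = "/default-rack"  # assumed fallback for hosts missing from the file

# Same content as the modified topology_mappings.data shown above.
MAPPINGS_TEXT = """[network_topology]
dn3.localdomain=/rack03
192.168.1.5=/rack03
dn1.localdomain=/rack01
192.168.1.3=/rack01
dn2.localdomain=/rack02
192.168.1.4=/rack02
"""


def load_mappings(text):
    """Parse the [network_topology] section into a host -> rack dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser lowercases keys by default, so lookups lowercase too.
    return dict(parser.items("network_topology"))


def resolve_racks(hosts, mappings):
    """Return one rack path per host or IP, in the order given."""
    return [mappings.get(host.lower(), DEFAULT_RACK) for host in hosts]


mappings = load_mappings(MAPPINGS_TEXT)
print(" ".join(resolve_racks(["dn1.localdomain", "192.168.1.4"], mappings)))
```

Listing both the hostname and the IP address for each datanode, as the file does, means the lookup succeeds whichever form is passed in.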
5. Now restart all the services in the cluster using the Ambari UI. This can be done on the Hosts tab. Select all the hosts and restart all the services.
This will take some time, and you may need to manually restart a few components if they do not start on their own.
6. After all the components are restarted, verify the rack topology under the Hosts tab in Ambari. Below are the before and after rack locations of the datanodes.
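The rack assignment can also be cross-checked from the command line with hdfs dfsadmin -printTopology, which lists the datanodes grouped by rack. The sketch below parses that kind of output into a host-to-rack mapping and compares it with the layout used in this post. The sample output is illustrative only: its exact layout and the port number are assumptions, and the function name is hypothetical.

```python
import re

# Illustrative sample of `hdfs dfsadmin -printTopology` output (assumed format).
SAMPLE_OUTPUT = """Rack: /rack01
   192.168.1.3:50010 (dn1.localdomain)

Rack: /rack02
   192.168.1.4:50010 (dn2.localdomain)

Rack: /rack03
   192.168.1.5:50010 (dn3.localdomain)
"""

# Layout this post is aiming for.
EXPECTED = {
    "dn1.localdomain": "/rack01",
    "dn2.localdomain": "/rack02",
    "dn3.localdomain": "/rack03",
}


def parse_print_topology(output):
    """Map each datanode (hostname if shown, else address) to its rack."""
    host_to_rack = {}
    current_rack = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Rack:"):
            current_rack = line.split(":", 1)[1].strip()
            continue
        match = re.match(r"(\S+)(?:\s+\((.+)\))?$", line)
        if match and current_rack is not None:
            address = match.group(1).rsplit(":", 1)[0]  # drop the port
            host_to_rack[match.group(2) or address] = current_rack
    return host_to_rack


actual = parse_print_topology(SAMPLE_OUTPUT)
mismatches = {h: r for h, r in EXPECTED.items() if actual.get(h) != r}
print("topology OK" if not mismatches else f"mismatched hosts: {mismatches}")
```

On a live cluster, the text passed to the parser would come from running the command as the hdfs user rather than from the hard-coded sample.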
2. Change Rack Topology using Ambari
Changing the rack topology using Ambari is a piece of cake, and this method should be used at all times unless specified otherwise.
1. Go to the Hosts tab, select the datanode, and use the “Set Rack” option in the Actions drop-down to define the new rack location for the datanode. You can also select multiple hosts and change their rack location together if all the hosts are in the same rack.
The rack location must be set in the format /[location]. For example, we will set the rack location of datanode dn1.localdomain as /rack01.
Similarly, change the rack location of all the datanodes you want.
2. After setting the rack location for the desired datanodes, we have to restart all the components in HDP. To do this, select all the hosts and, from the Actions drop-down, restart all the components.
3. After all the components are restarted, verify the rack topology under the Hosts tab in Ambari. Below are the before and after rack locations of the datanodes.