In a Hadoop cluster, if the ResourceManager (RM) goes offline for any reason, the jobs running on the cluster fail. Production clusters often run critical, long-running jobs, and restarting them from scratch because of an RM failure is costly. High availability (HA) for the ResourceManager was introduced in Hadoop 2.4, and it supports both manual and automatic failover.
Configuring ResourceManager HA is an objective of the HDPCA exam, so in this post we will see how to set it up using Ambari. As with the NameNode HA discussed in the earlier recipes, ResourceManager HA has only one active node at any given time. Failover is either initiated manually with an admin command or handled automatically through ZooKeeper.
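To make the manual path concrete, here is a minimal Python sketch that builds the `yarn rmadmin` calls an admin would typically use to inspect or switch the HA state. The service IDs `rm1` and `rm2` are assumptions (they are the conventional IDs, but your cluster may use others), and the helper names are hypothetical.

```python
import subprocess


def rm_state_command(rm_id):
    """Build the CLI call that asks one ResourceManager for its HA state."""
    return ["yarn", "rmadmin", "-getServiceState", rm_id]


def manual_failover_commands(active_id, standby_id):
    """Build the two calls for a manual switch: demote one RM, promote the other.

    Note: when automatic failover is enabled, these transitions are normally
    refused unless forced, so treat this as the manual-failover case only.
    """
    return [
        ["yarn", "rmadmin", "-transitionToStandby", active_id],
        ["yarn", "rmadmin", "-transitionToActive", standby_id],
    ]


def get_rm_state(rm_id):
    """Run the state query on a cluster node and return the trimmed output."""
    result = subprocess.run(
        rm_state_command(rm_id), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a node with the YARN client installed, `get_rm_state("rm1")` would be expected to return either `active` or `standby`.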
Verify Current Configuration
Before enabling ResourceManager HA, let’s first verify the current configuration. Go to Services > YARN. You should see only a single ResourceManager. Once HA is enabled, an additional Standby ResourceManager is listed here as well.
Configuring ResourceManager HA using the Ambari wizard
To configure ResourceManager HA, go to Services > YARN and click “Enable ResourceManager HA” in the Service Actions drop-down.
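The same check can be scripted against the Ambari REST API instead of the UI. The sketch below only builds the URL and parses a response body. The base URL, cluster name, and the host name `nn1` in the sample are assumptions for illustration, and the response shape shown is trimmed to the fields the helper reads.

```python
import json


def rm_component_url(ambari_base, cluster):
    """URL of the RESOURCEMANAGER component of the YARN service in Ambari."""
    return (
        f"{ambari_base}/api/v1/clusters/{cluster}"
        "/services/YARN/components/RESOURCEMANAGER"
    )


def resource_manager_hosts(response_body):
    """Return the hosts that run a ResourceManager, from the JSON response."""
    data = json.loads(response_body)
    return [hc["HostRoles"]["host_name"] for hc in data.get("host_components", [])]


# Hypothetical, trimmed response for a cluster without HA (one RM only).
sample = json.dumps(
    {
        "host_components": [
            {"HostRoles": {"component_name": "RESOURCEMANAGER", "host_name": "nn1"}}
        ]
    }
)

hosts = resource_manager_hosts(sample)
print(len(hosts), hosts)
```

A single entry in the list corresponds to the non-HA state described above; after enabling HA you would expect two.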
1. Enable ResourceManager HA Wizard
The first page is informational and tells us that cluster-level downtime is required to enable HA. It also notes that once HA mode is enabled, you will have an active-standby setup, with a second ResourceManager acting as the standby.
2. Select Host
On the next page, we select the nn2 host as the additional ResourceManager.
3. Review
On the Review page, you can review the configuration. Because we selected nn2 as our additional ResourceManager, the Ambari wizard installs the required software on this node. The wizard also takes care of the properties that need to be modified in the configuration files.
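For orientation, the sketch below assembles the main yarn-site properties that ResourceManager HA relies on, so you know what to look for in the configuration afterwards. The property names are standard YARN settings, but the exact set and values the wizard writes may differ. The host name `nn1`, the ZooKeeper quorum, and the cluster ID are assumptions for illustration.

```python
def rm_ha_properties(rm_hosts, zk_quorum, cluster_id="yarn-cluster"):
    """Build the core yarn-site settings for an RM HA pair.

    rm_hosts   -- ResourceManager host names, e.g. ["nn1", "nn2"]
    zk_quorum  -- ZooKeeper address list used for leader election
    cluster_id -- hypothetical cluster ID; pick your own
    """
    rm_ids = [f"rm{i}" for i in range(1, len(rm_hosts) + 1)]
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_ids),
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.zk-address": zk_quorum,
        "yarn.resourcemanager.recovery.enabled": "true",
    }
    # One hostname property per ResourceManager ID.
    for rm_id, host in zip(rm_ids, rm_hosts):
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


props = rm_ha_properties(["nn1", "nn2"], "zk1:2181,zk2:2181,zk3:2181")
for name in sorted(props):
    print(f"{name}={props[name]}")
```

Comparing this list against Services > YARN > Configs after the wizard finishes is a quick way to confirm what was changed.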
4. Configure components
The wizard will continue installing and configuring the additional ResourceManager. The detailed steps involved are:
- Stop required services
- Install additional ResourceManager
- Reconfigure YARN
- Reconfigure HDFS
- Start all services
Once all the steps have completed, click the Complete button to proceed to the dashboard.
Verify
We can go to the Services > YARN page to verify the ResourceManager HA setup. A Standby ResourceManager is now listed alongside the active ResourceManager.
You may need to restart a few components after configuring ResourceManager HA. For example, in our setup, we had to restart the HDFS service on all 5 nodes.
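The HA state can also be checked outside Ambari. Each ResourceManager exposes a cluster info REST endpoint (`/ws/v1/cluster/info`, on port 8088 by default) whose response includes an `haState` field. The sketch below parses such responses and checks that the pair consists of one active and one standby node. The sample bodies are trimmed and hypothetical, and the helper names are my own.

```python
import json


def ha_state(cluster_info_body):
    """Extract haState from a /ws/v1/cluster/info JSON response body."""
    return json.loads(cluster_info_body)["clusterInfo"]["haState"]


def is_healthy_ha_pair(states):
    """True when exactly one RM is ACTIVE and the other is STANDBY."""
    return sorted(states) == ["ACTIVE", "STANDBY"]


# Hypothetical, trimmed responses from the two ResourceManagers.
body_rm1 = json.dumps({"clusterInfo": {"haState": "ACTIVE"}})
body_rm2 = json.dumps({"clusterInfo": {"haState": "STANDBY"}})

states = [ha_state(body_rm1), ha_state(body_rm2)]
print(states, is_healthy_ha_pair(states))
```

In practice you would fetch each body from the corresponding ResourceManager host before passing it to `ha_state`.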