YARN Schedulers
The Hadoop YARN scheduler is responsible for assigning cluster resources to the applications submitted by users. YARN provides three schedulers:
- First in First out (FIFO) (Hadoop 1.x)
- Fair scheduler
- Capacity scheduler
First in First out (FIFO)
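YARN supports a First in First out (FIFO) scheduler, which runs jobs in the order they arrive, using a single queue of jobs. FIFO was the default in Hadoop 1.x; in Apache Hadoop 2.x and later, the default value of yarn.resourcemanager.scheduler.class is the Capacity scheduler. FIFO scheduling is rarely the best option for large multi-user Hadoop deployments, because a long-running job at the head of the queue delays every job behind it.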
Fair scheduler
The Fair scheduler aims to give all running jobs an equal share of resources. As resources become available, they are assigned to newly submitted jobs until all submitted and running jobs hold roughly the same amount of resources.
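The equal-share idea can be illustrated with a little arithmetic. The sketch below does not call YARN; the 96 GB cluster size is a made-up figure, and the real Fair scheduler rebalances gradually as containers finish rather than dividing memory in one step.

```shell
# Equal-share arithmetic only; this does not talk to a cluster.
# The 96 GB total is a hypothetical figure chosen for illustration.
fair_share_gb() {
  # $1 = total cluster memory in GB, $2 = number of running jobs
  echo $(( $1 / $2 ))
}

# Each additional running job shrinks every job's share.
for jobs in 1 2 3 4; do
  echo "jobs=$jobs share=$(fair_share_gb 96 "$jobs")GB"
done
```

With these numbers, a single job can use the whole cluster, while four concurrent jobs converge on 24 GB each.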
Capacity scheduler
The Capacity scheduler allows a large cluster to be shared across multiple organizational entities while guaranteeing capacity for each entity and ensuring that no single user or job holds all the resources. To achieve this, the Capacity scheduler defines queues and queue hierarchies, with each queue having a guaranteed capacity. It also lets jobs in one queue use excess resources (if any) that other queues are not currently using.
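As a rough sketch of what such queues look like, the script below writes an illustrative capacity-scheduler.xml fragment to a scratch file and checks that the guaranteed capacities add up. The queue names "prod" and "dev" and all percentages are hypothetical; the property names follow the yarn.scheduler.capacity.root.* convention used by the Capacity scheduler in percentage mode.

```shell
# Write an illustrative capacity-scheduler.xml fragment to a scratch file.
# Queue names "prod" and "dev" and every percentage here are hypothetical.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
EOF

# Sum the guaranteed capacities of the queues directly under root.
# In percentage mode these are expected to total 100.
total_capacity() {
  awk '
    /<name>yarn\.scheduler\.capacity\.root\.[^.]+\.capacity<\/name>/ { want = 1; next }
    want && /<value>/ {
      gsub(/[^0-9.]/, "")   # keep only the number on the <value> line
      sum += $0
      want = 0
    }
    END { print sum + 0 }
  ' "$1"
}

total_capacity "$conf"   # prints 100
```

Here "dev" is guaranteed 30% of the cluster, and its maximum-capacity of 50 caps how far it may grow into idle capacity from "prod".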
Enabling the Capacity Scheduler (Command Line)
1. To enable the capacity scheduler, make sure you have the following property set in the yarn configuration file /etc/hadoop/conf/yarn-site.xml on the ResourceManager Host:
# vi /etc/hadoop/conf/yarn-site.xml
Property: yarn.resourcemanager.scheduler.class
Value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
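2. Switch to the user “yarn” and run the following command, which reloads the queue configuration. Note that -refreshQueues only refreshes queue definitions; if the scheduler class itself was changed, the ResourceManager must be restarted for the new scheduler to take effect.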
$ yarn rmadmin -refreshQueues
18/07/22 09:04:22 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
18/07/22 09:04:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
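In yarn-site.xml, the property and value from step 1 are written as a <property> element. The helper below is only a convenience sketch (the function name scheduler_property is made up); it prints the element so it can be pasted inside the <configuration> element of the file.

```shell
# Print the <property> element in the form Hadoop XML config files expect.
# The function name is hypothetical; the property name and value come from step 1.
scheduler_property() {
  cat <<'EOF'
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
EOF
}

scheduler_property
```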
Enabling the Capacity Scheduler (With Ambari)
1. To enable the Capacity scheduler using Ambari, go to Services > YARN > Configs. Search for the property yarn.resourcemanager.scheduler.class in the filter box. In this example, the Fair scheduler is currently set as the scheduler.
2. Change the scheduler property to the value org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler and click Save.
3. Provide an appropriate description while saving the config.
4. Restart the YARN service for the changes to take effect.
Verify
You can verify the scheduler after restarting the YARN service. In Ambari, search for the property “yarn.resourcemanager.scheduler.class” in the filter box under Services > YARN > Configs. Its value should now be the CapacityScheduler class.
You can also verify the scheduler type in the yarn configuration file /etc/hadoop/conf/yarn-site.xml.
# cat /etc/hadoop/conf/yarn-site.xml
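Because cat prints the whole file, it can be easier to pull out just the one value. The following is a minimal sketch, assuming the usual Hadoop layout in which <value> appears on the same line as <name> or within the next two lines; the helper name get_property is made up for this example.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumes <value> is on the same line as <name> or within the next two lines.
get_property() {
  # $1 = path to the XML file, $2 = property name
  grep -F -A 2 "<name>$2</name>" "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}

# Only query the real config if it exists on this host.
conf=/etc/hadoop/conf/yarn-site.xml
if [ -f "$conf" ]; then
  get_property "$conf" yarn.resourcemanager.scheduler.class
fi
```

On a host configured as described above, this should print the CapacityScheduler class name. For files with unusual formatting, a real XML parser is a safer choice than grep and sed.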