The Hadoop YARN architecture connects Hadoop's storage layer, the Hadoop Distributed File System (HDFS), with a range of processing tools. YARN stands for Yet Another Resource Negotiator. Read on to learn what Apache YARN is and what the YARN architecture looks like.
In Hadoop version 1.0, also known as MapReduce version 1 (MRv1), MapReduce performed both resource management and data processing. A single master, the job tracker, allocated resources, scheduled jobs, and monitored them as they ran. It also assigned map and reduce tasks to a number of subordinate processes known as task trackers. Each task tracker periodically reported its progress to the job tracker.
However, relying on a single job tracker created a scalability bottleneck.
The practical limit of this design was reached at a cluster of roughly 5,000 nodes and 40,000 concurrently running tasks. MRv1 also made it difficult to use computational resources efficiently, and Hadoop was limited to the MapReduce processing paradigm.
YARN was introduced in Hadoop version 2.0 to overcome these problems. It was introduced in 2012 by Yahoo and Hortonworks. The main idea of YARN was to relieve MapReduce by taking over the responsibility for resource management and job scheduling. YARN also gives Hadoop the ability to run non-MapReduce jobs within the Hadoop framework.
YARN is the core component of the Hadoop 2.0 architecture. It supports several styles of data processing, including interactive, graph, and stream processing, as well as batch processing of data stored in HDFS.
This opens Hadoop up to many kinds of distributed applications beyond MapReduce and lets users work with a variety of tools. YARN handles resource management and job scheduling: it allocates cluster resources and schedules the tasks that carry out the processing.
Components of the Hadoop YARN architecture
YARN is like the brain of Hadoop. Its main components are the resource manager, the node manager, the application master, and the container, each explained below.
The resource manager is the ultimate authority for allocating resources. When it receives a processing request, it passes parts of the request to the node managers on the nodes where the processing is to take place. It is the arbitrator of cluster resources and decides how the available resources are allocated among competing applications. It optimizes cluster utilization by keeping resources in use as much as possible. The resource manager has two main components: the scheduler and the application manager.
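To make the arbitration idea concrete, here is a minimal Python sketch of a resource manager handing out resources to competing applications. It is only an illustration: the class names (`ToyResourceManager`, `Node`, `Request`), the first-come-first-served policy, and the memory and core figures are all assumptions made for this example, not the actual YARN API or its schedulers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class Request:
    app_id: str
    memory_mb: int
    vcores: int


class ToyResourceManager:
    """Hypothetical stand-in for the resource manager (not the YARN API)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.pending = []  # FIFO queue of requests from competing applications

    def submit(self, request):
        self.pending.append(request)

    def schedule(self):
        """Grant each pending request on the first node that can hold it."""
        granted, still_pending = [], []
        for req in self.pending:
            node = next(
                (n for n in self.nodes
                 if n.free_memory_mb >= req.memory_mb
                 and n.free_vcores >= req.vcores),
                None,
            )
            if node is None:
                # No node has room right now; keep the request queued.
                still_pending.append(req)
                continue
            node.free_memory_mb -= req.memory_mb
            node.free_vcores -= req.vcores
            granted.append((req.app_id, node.name))
        self.pending = still_pending
        return granted


rm = ToyResourceManager([Node("node-1", 4096, 4), Node("node-2", 2048, 2)])
rm.submit(Request("app-A", 3072, 2))
rm.submit(Request("app-B", 2048, 2))
rm.submit(Request("app-C", 2048, 1))
allocations = rm.schedule()
print(allocations)                        # app-A and app-B are placed
print([r.app_id for r in rm.pending])     # app-C waits for free resources
```

In this toy run the third application has to wait because no single node has enough free memory left, which is the kind of decision the scheduler makes on behalf of the whole cluster.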
The node manager takes care of an individual node in the Hadoop cluster and manages the user jobs and workflow on that node. It registers with the resource manager and sends it heartbeats that report the health status of the node. Its main function is to manage the application containers assigned to it by the resource manager. In this way it keeps the resource manager up to date.
An application is a single job submitted to the framework. Each application has its own application master, which is a framework-specific entity. It is the process that coordinates the execution of the application in the cluster and manages faults. The application master negotiates resources with the resource manager and then works with the node managers to execute and monitor the tasks.
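The negotiate, execute, and monitor loop of an application master can be sketched as follows. Everything here is an assumption made for illustration: the class names, the slot-based allocation, the `run_task` callback that stands in for the node managers, and the `max_attempts` retry limit are hypothetical and not part of the YARN API.

```python
class ToyResourceManager:
    """Hypothetical resource manager that hands out container slots."""

    def __init__(self, free_slots):
        self.free_slots = dict(free_slots)  # node name -> free slots

    def allocate(self, count):
        grants = []
        for node in self.free_slots:
            while self.free_slots[node] > 0 and len(grants) < count:
                self.free_slots[node] -= 1
                grants.append(node)
        return grants

    def release(self, node):
        self.free_slots[node] += 1


class ToyApplicationMaster:
    """Hypothetical per-application coordinator (not the YARN API)."""

    def __init__(self, tasks, rm, run_task, max_attempts=2):
        self.tasks = list(tasks)
        self.rm = rm
        self.run_task = run_task  # stands in for the node managers
        self.max_attempts = max_attempts

    def run(self):
        attempts = {task: 0 for task in self.tasks}
        results = {}
        pending = list(self.tasks)
        while pending:
            nodes = self.rm.allocate(len(pending))  # negotiate resources
            if not nodes:
                raise RuntimeError("no resources available")
            retry = []
            for task, node in zip(pending, nodes):
                attempts[task] += 1
                succeeded = self.run_task(task, node)  # execute and monitor
                self.rm.release(node)
                if succeeded:
                    results[task] = node
                elif attempts[task] < self.max_attempts:
                    retry.append(task)   # fault handling: try again
                else:
                    results[task] = None  # give up on this task
            # Retried tasks go first, then tasks that got no container yet.
            pending = retry + pending[len(nodes):]
        return results


failed_once = set()


def flaky_run(task, node):
    """Pretend task-2 fails on its first attempt only."""
    if task == "task-2" and task not in failed_once:
        failed_once.add(task)
        return False
    return True


rm = ToyResourceManager({"node-1": 1, "node-2": 1})
am = ToyApplicationMaster(["task-1", "task-2", "task-3"], rm, flaky_run)
results = am.run()
print(results)
```

With only two slots in the cluster, the third task waits for the next round, and the task that failed once is rescheduled rather than lost.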
A container is a collection of physical resources, such as CPU cores and RAM, on a single node. A YARN container is managed through a container launch context (CLC), a record that describes how the container is to be launched, including its environment variables and launch commands.
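As a rough picture of what a container and its launch context hold, consider the following Python sketch. The dataclasses, field names, sample values, and the `launch_script` helper are hypothetical, chosen only to show the idea of bundling resources with launch instructions; they are not the YARN classes themselves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContainerResource:
    memory_mb: int
    vcores: int


@dataclass
class LaunchContext:
    """Hypothetical record of how to start the process in a container."""
    commands: list
    environment: dict = field(default_factory=dict)


@dataclass
class Container:
    container_id: str
    node: str                      # a container lives on a single node
    resource: ContainerResource
    launch_context: LaunchContext

    def launch_script(self):
        """Render the environment and commands as a shell-like script."""
        exports = [
            f"export {key}={value}"
            for key, value in sorted(self.launch_context.environment.items())
        ]
        return "\n".join(exports + self.launch_context.commands)


container = Container(
    container_id="container-0001",
    node="node-1",
    resource=ContainerResource(memory_mb=1024, vcores=1),
    launch_context=LaunchContext(
        commands=["python worker.py --split 3"],
        environment={"APP_ID": "app-A", "LOG_DIR": "/tmp/logs"},
    ),
)
script = container.launch_script()
print(script)
```

The point of the sketch is the separation of concerns: the resource part says how much of the node the container may use, while the launch context says what to run inside it.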
YARN completely revolutionized the Hadoop ecosystem, making it more efficient, flexible, and scalable. When Yahoo went live with YARN in 2013, the company was able to shrink its Hadoop cluster from 40,000 to 32,000 nodes while running a much larger number of jobs.