Hadoop Yet Another Resource Negotiator (YARN)

DistributedSystem/HadoopEcyosystem 2019. 9. 14. 16:28

1. Overview

YARN is the platform responsible for managing the computing resources of a cluster and using them to schedule users' applications. It allows different data processing engines, such as graph processing, interactive processing, stream processing, and batch processing, to run against data stored in HDFS (Hadoop Distributed File System). Besides resource management, YARN also handles job scheduling. In this way YARN opens Hadoop to other evolving technologies, which can take advantage of HDFS, a reliable and widely used storage system, and of economical clusters.

Apache YARN is also described as the data operating system of Hadoop 2.x. With this architecture, Hadoop 2.x becomes a general-purpose data processing platform that is not limited to MapReduce: several different frameworks can run on the same hardware where Hadoop is deployed.

2. Description

2.1 Roles of YARN

Resource management
Application management
Scheduler
NodeManager
ApplicationMaster
Responsible for managing and monitoring workloads
Allows multiple data processing engines, such as real-time streaming and batch processing, to handle data stored on a single platform
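
To make the division of labor above a bit more concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class names mirror the YARN roles, but the methods (`launch`, `allocate`, `request`), the node names, and the first-come, first-served policy are invented for illustration and are not the real YARN API or scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node agent that tracks free memory (MB) on one machine."""
    name: str
    free_mb: int

    def launch(self, mb: int) -> bool:
        # Start a container only if this node still has enough memory.
        if mb > self.free_mb:
            return False
        self.free_mb -= mb
        return True


@dataclass
class ResourceManager:
    """Toy cluster-wide scheduler: first node with room wins."""
    nodes: list
    grants: dict = field(default_factory=dict)  # app name -> node names

    def allocate(self, app: str, mb: int):
        for node in self.nodes:
            if node.launch(mb):
                self.grants.setdefault(app, []).append(node.name)
                return node.name
        return None  # cluster is full; a real scheduler would queue this


@dataclass
class ApplicationMaster:
    """Toy per-application coordinator that asks the RM for containers."""
    app: str
    container_mb: int

    def request(self, rm: ResourceManager, count: int) -> list:
        results = [rm.allocate(self.app, self.container_mb) for _ in range(count)]
        return [node for node in results if node is not None]


# Two different engines share one cluster (hypothetical names and sizes).
rm = ResourceManager(nodes=[NodeManager("node1", 4096), NodeManager("node2", 2048)])
batch = ApplicationMaster("mapreduce-job", container_mb=2048)
stream = ApplicationMaster("streaming-job", container_mb=1024)

batch_nodes = batch.request(rm, 2)    # asks for two 2048 MB containers
stream_nodes = stream.request(rm, 3)  # asks for three 1024 MB containers
print(batch_nodes, stream_nodes)
```

The batch job fills node1, the streaming job gets what is left on node2, and its third request is refused because the toy cluster is out of memory. The point is only that one scheduler hands out resources to applications from different frameworks.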

2.2 Features

Feature	Description
Flexibility	Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming. Thanks to this, other applications can run alongside MapReduce programs in Hadoop 2
Efficiency	Because many applications run on the same cluster, the efficiency of Hadoop increases without much effect on quality of service
Shared	Provides a stable, reliable, secure foundation and shared operational services across multiple workloads. Additional programming models such as graph processing and iterative modeling are now possible for data processing

2.3 Resource Manager Restart

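Non-work-preserving RM restart: the RM saves application state in a state store and reloads it on restart. Containers that were running are killed and the applications are started again, so in-flight work is lost
Work-preserving RM restart: the RM rebuilds its running state from the container statuses reported by the NodeManagers and the requests resent by the ApplicationMasters, so running applications keep their work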
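
As a rough illustration of how the two restart modes are switched on, the Python sketch below generates a `yarn-site.xml` fragment. The three property names are the ones used in the Hadoop ResourceManager restart documentation as far as I know, but defaults differ between releases, so check the docs for your version. The helper name `build_restart_config` is my own, and the ZooKeeper address that `ZKRMStateStore` needs is assumed to be configured elsewhere.

```python
import xml.etree.ElementTree as ET


def build_restart_config(work_preserving: bool) -> str:
    """Return a yarn-site.xml fragment that enables RM restart (sketch)."""
    props = {
        # Turns on state recovery; needed for both restart modes.
        "yarn.resourcemanager.recovery.enabled": "true",
        # Where the RM persists application state (assumption: ZooKeeper
        # is available and its address is set in another property).
        "yarn.resourcemanager.store.class":
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        # false -> non-work-preserving restart, true -> work-preserving.
        "yarn.resourcemanager.work-preserving-recovery.enabled":
            "true" if work_preserving else "false",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing exists on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_restart_config(work_preserving=True))
```

Calling the helper with `work_preserving=False` produces the same fragment with the last property set to `false`.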

2.4 Yarn Resource Manager High Availability

Prior to Hadoop 2.4, the master (ResourceManager, RM) was a single point of failure (SPOF)
The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this single point of failure
ResourceManager HA is realized through an Active/Standby architecture
- At any time, one of the RMs is Active
- The other RMs are in Standby mode, waiting to take over if anything happens to the Active
The trigger to transition to Active comes either from the admin (through the CLI) or from the integrated failover controller when automatic failover is enabled
Manual transition and failover: when automatic failover is not enabled, the admin transitions one of the RMs to Active by hand, typically with the yarn rmadmin command
Automatic failover: the RMs can embed a ZooKeeper-based elector that decides which RM becomes Active when the current Active goes down
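
The Active/Standby behavior described above can be sketched with a small in-memory model. This is a simplification under loud assumptions: `ResourceManagerNode`, `HaCluster`, and their methods are hypothetical names, the "election" simply picks the first healthy RM, and real YARN relies on ZooKeeper and fencing rather than a Python loop.

```python
class ResourceManagerNode:
    """One RM in the HA group; every RM starts in standby."""

    def __init__(self, rm_id):
        self.rm_id = rm_id
        self.state = "standby"
        self.healthy = True


class HaCluster:
    """Toy Active/Standby group of ResourceManagers."""

    def __init__(self, rm_ids, automatic_failover=False):
        self.rms = {rm_id: ResourceManagerNode(rm_id) for rm_id in rm_ids}
        self.automatic_failover = automatic_failover

    def active(self):
        # Return the id of the Active RM, or None if there is none.
        for rm in self.rms.values():
            if rm.state == "active":
                return rm.rm_id
        return None

    def transition_to_active(self, rm_id):
        """Manual transition, as an admin would trigger it from the CLI."""
        target = self.rms[rm_id]
        if not target.healthy:
            raise RuntimeError(f"{rm_id} is not healthy")
        for rm in self.rms.values():
            rm.state = "standby"  # toy rule: at most one Active at a time
        target.state = "active"

    def fail(self, rm_id):
        """Simulate an RM crash and, if enabled, automatic failover."""
        rm = self.rms[rm_id]
        rm.healthy = False
        rm.state = "standby"
        if self.automatic_failover and self.active() is None:
            for candidate in self.rms.values():
                if candidate.healthy:
                    self.transition_to_active(candidate.rm_id)
                    break


# Automatic failover: a standby takes over without admin action.
auto = HaCluster(["rm1", "rm2"], automatic_failover=True)
auto.transition_to_active("rm1")
auto.fail("rm1")
print(auto.active())

# Manual failover: no Active RM exists until the admin picks one.
manual = HaCluster(["rm1", "rm2"])
manual.transition_to_active("rm1")
manual.fail("rm1")
print(manual.active())
manual.transition_to_active("rm2")
print(manual.active())
```

With automatic failover enabled, rm2 becomes Active as soon as rm1 fails. In the manual case the cluster has no Active RM until the admin explicitly promotes rm2.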

3. References

https://en.wikipedia.org/wiki/Apache_Hadoop

https://en.wikipedia.org/wiki/Single_point_of_failure

https://data-flair.training/blogs/hadoop-yarn-tutorial/

https://data-flair.training/blogs/hadoop-ecosystem-components/
