  • Hadoop Yet Another Resource Negotiator (YARN)
    DistributedSystem/HadoopEcyosystem 2019. 9. 14. 16:28

    1. Overview

    YARN is the Hadoop platform responsible for managing the computing resources of a cluster and using them to schedule users' applications. It allows different data processing engines, such as graph processing, interactive processing, stream processing, and batch processing, to run against data stored in HDFS (Hadoop Distributed File System). Besides resource management, YARN also handles job scheduling. It extends Hadoop to other evolving technologies, so they can take advantage of HDFS, a reliable and widely used storage system, and of economical clusters.

    Apache YARN is also described as the data operating system of Hadoop 2.x. With it, Hadoop 2.x provides a general-purpose data processing platform that is not limited to MapReduce, and several different frameworks can run on the same hardware where Hadoop is deployed.

    2. Description

    2.1 Roles of YARN

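    • Resource management
    • Application management
    • Scheduler: allocates cluster resources to running applications
    • Node Manager: per-node agent that launches and monitors containers
    • Application Master: per-application process that negotiates resources on behalf of its application
    • Managing and monitoring workloads
    • Allowing multiple data processing engines, such as real-time streaming and batch processing, to handle data stored on a single platform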
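    The interplay between resource management and scheduling can be sketched with a toy model. This is only an illustration under simplifying assumptions, not YARN's actual scheduler: `Node` and `ToyResourceManager` are hypothetical names, memory is the only resource, and requests are placed first-fit in submission order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Node Manager reporting free memory in MB."""
    name: str
    free_mb: int


@dataclass
class ToyResourceManager:
    """Hypothetical, heavily simplified resource manager (not YARN's API)."""
    nodes: list
    queue: list = field(default_factory=list)  # pending (app, mb) requests

    def submit(self, app, mb):
        """Queue a container request of `mb` megabytes for `app`."""
        self.queue.append((app, mb))

    def schedule(self):
        """Place queued requests first-fit, in submission order.

        Returns the granted (app, node_name, mb) tuples; requests that fit
        on no node stay in the queue for a later scheduling pass.
        """
        granted, waiting = [], []
        for app, mb in self.queue:
            node = next((n for n in self.nodes if n.free_mb >= mb), None)
            if node is None:
                waiting.append((app, mb))
                continue
            node.free_mb -= mb
            granted.append((app, node.name, mb))
        self.queue = waiting
        return granted


# Three different engines share the same two-node cluster.
rm = ToyResourceManager([Node("node1", 4096), Node("node2", 2048)])
rm.submit("mapreduce-job", 3072)
rm.submit("stream-job", 2048)
rm.submit("graph-job", 2048)
for app, node, mb in rm.schedule():
    print(f"{app}: {mb} MB on {node}")
print("still waiting:", rm.queue)
```

    The point of the sketch is that applications from different engines compete for one shared pool of resources, and a single component decides who gets what.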

    2.2 Features

    Feature      Description
    Flexibility  Enables purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming. Because of this, other applications can run alongside MapReduce programs in Hadoop 2.
    Efficiency   Many applications run on the same cluster, so the efficiency of Hadoop increases without much effect on quality of service.
    Shared       Provides a stable, reliable, secure foundation and shared operational services across multiple workloads. Additional programming models, such as graph processing and iterative modeling, become possible for data processing.

    2.3 Resource Manager Restart

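    • Non-work-preserving RM restart: the RM persists application metadata in a state store and reloads it on restart; running containers are killed and the applications are started again
    • Work-preserving RM restart: the RM additionally rebuilds its running state from the container statuses reported by the Node Managers, so running applications do not lose their work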
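    The difference between the two restart modes can be sketched as follows. This is a toy model, not the ResourceManager's implementation: `ToyStateStore`, `restart`, and the dictionary layout are hypothetical, and the behavior modeled (containers lost and a new attempt started in the non-work-preserving case, containers kept in the work-preserving case) is a simplification.

```python
import json


class ToyStateStore:
    """Hypothetical in-memory stand-in for the RM's persistent state store."""

    def __init__(self):
        self._blob = "{}"

    def save(self, apps):
        # Persist application metadata, e.g. {"app_1": {"user": ..., "attempt": 1}}.
        self._blob = json.dumps(apps)

    def load(self):
        return json.loads(self._blob)


def restart(store, node_reports, work_preserving):
    """Return the RM's view of applications after a (simulated) restart.

    node_reports maps an application id to the container ids that the
    nodes still report as running.
    """
    recovered = {}
    for app_id, meta in store.load().items():
        if work_preserving:
            # Running state is rebuilt from node reports; the attempt continues.
            containers = list(node_reports.get(app_id, []))
            recovered[app_id] = {**meta, "containers": containers}
        else:
            # Containers are gone; the application begins a new attempt.
            recovered[app_id] = {**meta, "containers": [],
                                 "attempt": meta["attempt"] + 1}
    return recovered


store = ToyStateStore()
store.save({"app_1": {"user": "alice", "attempt": 1}})
reports = {"app_1": ["container_1", "container_2"]}

print(restart(store, reports, work_preserving=False))
print(restart(store, reports, work_preserving=True))
```

    In both modes the application survives the restart because its metadata was persisted; only the work-preserving mode keeps the progress made so far.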

    2.4 YARN Resource Manager High Availability

    • Before Hadoop 2.4, the master (RM) was a SPOF (single point of failure)
    • The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this single point of failure
    • ResourceManager HA is realized through an Active/Standby architecture
      • At any time, one ResourceManager is Active
      • The other ResourceManagers are in Standby mode, waiting to take over if anything happens to the Active one
    • The trigger to transition to Active comes either from the admin (through the CLI) or from the integrated failover controller when automatic failover is enabled
    • Manual transition and failover: with automatic failover disabled, an admin transitions one of the RMs to Active by hand
    • Automatic failover: the failover controller elects a new Active RM when the current one fails
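    The Active/Standby behavior described above can be sketched as a small state machine. This is a toy model with hypothetical names (`ToyResourceManagerHA` and its methods), not the RM's failover controller. It assumes that a manual transition to Active is refused while another RM is still Active, and that automatic failover simply promotes the first healthy Standby.

```python
class ToyResourceManagerHA:
    """Hypothetical Active/Standby model of a group of ResourceManagers."""

    def __init__(self, rm_ids, automatic_failover=False):
        if not rm_ids:
            raise ValueError("at least one ResourceManager id is required")
        # Every RM starts in standby until something triggers a transition.
        self.state = {rm_id: "standby" for rm_id in rm_ids}
        self.automatic_failover = automatic_failover

    def active(self):
        """Return the id of the Active RM, or None if there is none."""
        return next((rm for rm, s in self.state.items() if s == "active"), None)

    def transition_to_standby(self, rm_id):
        """Manual trigger: demote a non-failed RM to standby."""
        if self.state[rm_id] == "failed":
            raise RuntimeError(f"{rm_id} has failed")
        self.state[rm_id] = "standby"

    def transition_to_active(self, rm_id):
        """Manual trigger: promote a standby RM if no other RM is Active."""
        current = self.active()
        if current is not None and current != rm_id:
            raise RuntimeError(f"{current} is still active")
        if self.state[rm_id] == "failed":
            raise RuntimeError(f"{rm_id} has failed")
        self.state[rm_id] = "active"

    def report_failure(self, rm_id):
        """Mark an RM as failed; fail over if it was Active and auto mode is on."""
        was_active = self.state[rm_id] == "active"
        self.state[rm_id] = "failed"
        if was_active and self.automatic_failover:
            for candidate, s in self.state.items():
                if s == "standby":
                    self.transition_to_active(candidate)
                    break


ha = ToyResourceManagerHA(["rm1", "rm2"], automatic_failover=True)
ha.transition_to_active("rm1")
ha.report_failure("rm1")
print("active after failure:", ha.active())
```

    With `automatic_failover=False`, the same failure leaves the cluster without an Active RM until an admin promotes a Standby, which is the manual path listed above.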

