RDD Lineage and Logical Execution Plan
DistributedSystem/Spark 2019. 9. 25. 05:37
1. Overview
RDD lineage (also known as the RDD operator graph or RDD dependency graph) is a graph of all the parent RDDs of an RDD. It is built up as transformations are applied to the RDD, and it forms the logical execution plan.
The logical execution plan starts with the earliest RDDs (those that have no dependencies on other RDDs, or that reference cached data) and ends with the RDD that produces the result of the action that was called.
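To make the graph idea concrete, here is a toy sketch in plain Scala with no Spark dependency. It is not how Spark implements lineage; `Node`, `sources` and `plan` are hypothetical names invented for this illustration. Each node only remembers its parents, and walking those links recovers both the earliest nodes and an order in which every node comes after all of its parents.

```scala
// Toy model of a lineage graph; not Spark's implementation.
// Node, sources and plan are hypothetical names used only for illustration.
final case class Node(name: String, parents: List[Node] = Nil)

object LineageSketch {
  // The earliest nodes of the graph: those with no parents.
  def sources(n: Node): Set[String] =
    if (n.parents.isEmpty) Set(n.name)
    else n.parents.flatMap(sources).toSet

  // A parents-first ordering: every node is listed after all of its parents.
  def plan(n: Node): List[String] =
    (n.parents.flatMap(plan) :+ n.name).distinct

  def main(args: Array[String]): Unit = {
    val a = Node("a")
    val b = Node("b")
    val joined = Node("joined", List(a, b))
    val result = Node("result", List(joined))
    println(sources(result))                // the nodes with no parents
    println(plan(result).mkString(" -> "))  // a -> b -> joined -> result
  }
}
```

Nothing is computed while the nodes are created; the graph is only a description of what depends on what, which is the sense in which the plan is "logical".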
2. Description
2.1 RDD Lineage
val r00 = sc.parallelize(0 to 9)
val r01 = sc.parallelize(0 to 90 by 10)
val r10 = r00 cartesian r01
val r11 = r00.map(n => (n, n))
val r12 = r00 zip r01
val r13 = r01.keyBy(_ / 20)
val r20 = Seq(r11, r12, r13).foldLeft(r10)(_ union _)
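One way to inspect the lineage that these transformations build is `RDD.toDebugString`, which returns a text description of an RDD and its recursive dependencies. The sketch below wraps the same statements in a standalone program. It assumes spark-core is on the classpath and uses a local master; the object name `LineageDemo`, the `build` helper and the app name are made up for illustration. In spark-shell, where `sc` already exists, calling `r20.toDebugString` is enough.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper for illustration; assumes spark-core is on the classpath.
object LineageDemo {
  // Builds the same RDD graph as the example above and returns r20.
  def build(sc: SparkContext): RDD[(Int, Int)] = {
    val r00 = sc.parallelize(0 to 9)
    val r01 = sc.parallelize(0 to 90 by 10)
    val r10 = r00 cartesian r01
    val r11 = r00.map(n => (n, n))
    val r12 = r00 zip r01
    val r13 = r01.keyBy(_ / 20)
    Seq(r11, r12, r13).foldLeft(r10)(_ union _)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lineage-demo")
    val sc = new SparkContext(conf)
    try {
      val r20 = build(sc)
      // No job has run yet: the transformations only extended the lineage.
      println(r20.toDebugString)
      // Direct parents of r20, one Dependency per parent RDD.
      r20.dependencies.foreach(d => println(d.rdd))
      // count() is an action, so it triggers execution of the plan.
      // The result has 100 + 10 + 10 + 10 = 130 elements.
      println(r20.count())
    } finally sc.stop()
  }
}
```

The exact text of `toDebugString` (RDD ids, partition counts, call sites) varies with the Spark version and environment, so it is better read as a diagnostic than parsed.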
3. References
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-lineage.html