RDD
-
Difference between RDD and DSMDistributedSystem/Spark 2019. 9. 25. 04:30
1. Overview The RDD (resilient distributed DataSet) elastic distributed data set is the core data structure of spark. DSM (distributed shared memory) is a common memory data abstraction. In DSM, applications can read and write to any location in the global address space. The main difference between RDD and DSM is that not only can the RDD be created by bulk conversion (i.e. "write"), but it can ..
-
Resilient Distributed Dataset(RDD)DistributedSystem/Spark 2019. 9. 8. 22:32
1. Overview Resilient Distributed Datasets (RDD) is a fundamental data structure of Spark. It is an immutable distributed collection of objects. Each dataset in RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes. There are two ways to create RDDs − parallelizi..