All categories
-
Boxing, unboxing, and autoboxing | StaticPL/JAVA | 2019. 9. 27. 11:20
1. Overview: Let's clarify what boxing, unboxing, and autoboxing are, why they are needed, and how they work. 2. Description. 2.1 Definition. Autoboxing: converting a primitive value into an object of the corresponding wrapper class. It is invoked when a primitive value is passed as a parameter to a method that expects an object of the corresponding wrapper class, or assigned to a variable of the corresponding wrapper c..
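A minimal Java sketch of the two invoking cases named in the excerpt above (passing a primitive where a wrapper object is expected, and assigning a primitive to a wrapper variable), plus the reverse unboxing step. The class and method names (`BoxingDemo`, `boxAll`, `sum`) are hypothetical and chosen only for illustration; they do not come from the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not taken from the original post.
class BoxingDemo {
    // Autoboxing as a method argument: List<Integer>.add expects an Integer,
    // so each int is converted to an Integer by the compiler.
    public static List<Integer> boxAll(int... values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v); // int -> Integer (autoboxing)
        }
        return boxed;
    }

    // Unboxing: each Integer is converted back to an int for the addition.
    public static int sum(List<Integer> boxed) {
        int total = 0;
        for (Integer b : boxed) {
            total += b; // Integer -> int (unboxing)
        }
        return total;
    }

    public static void main(String[] args) {
        Integer explicit = Integer.valueOf(7); // explicit boxing
        Integer auto = 7;                      // autoboxing on assignment
        int back = auto;                       // unboxing on assignment

        System.out.println(explicit.equals(auto)); // prints "true"
        System.out.println(back);                  // prints "7"
        System.out.println(sum(boxAll(1, 2, 3)));  // prints "6"
    }
}
```

The comparison uses `equals` rather than `==`, because `==` on two `Integer` references compares object identity, not the numeric value.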
-
Big Data | DistributedSystem/HadoopEcyosystem | 2019. 9. 25. 13:35
1. Overview: Around 90% of the world's data was created in the last two years alone. Moreover, 80% of that data is unstructured or comes in widely varying structures, such as images, live streaming records, videos, sensor records, and GPS tracking details, which are difficult to analyze. Traditional systems work well with structured data (within limits), but they can't manage such a la..
-
MapReduce Vs Spark RDD | DistributedSystem/Spark | 2019. 9. 25. 08:16
1. Overview: MapReduce is widely adopted for processing and generating large datasets with a parallel, distributed algorithm on a cluster. It allows users to write parallel computations, using a set of high-level operators, without having to worry about work distribution and fault tolerance. However, data sharing is slow in MapReduce because of replication, serialization, and disk I/O. Most of the Hadoop a..
-
Difference between Deep Learning and Shallow learning | MLAI/DeepLearning | 2019. 9. 25. 07:27
1. Overview: Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on artificial neural networks. Learning can be supervised, semi-supervised, or unsupervised. Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks, and convolutional neural networks have been ap..
-
RDD Lineage and Logical Execution Plan | DistributedSystem/Spark | 2019. 9. 25. 05:37
1. Overview: RDD lineage (also known as the RDD operator graph or RDD dependency graph) is a graph of all the parent RDDs of an RDD. It is built as a result of applying transformations to the RDD, and it forms a logical execution plan. The logical execution plan starts with the earliest RDDs (those that have no dependencies on other RDDs or that reference cached data) and ends with the RDD that produces the result of the acti..
-
MapReduce | DistributedSystem/HadoopEcyosystem | 2019. 9. 25. 05:08
1. Overview: MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data process..
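To make the Map and Reduce tasks described above concrete, here is a single-process word-count sketch in plain Java. It is only an illustration of the model and does not use the Hadoop API: the class `WordCountSketch` and its methods `map`, `reduce`, and `run` are hypothetical names, and the in-memory grouping step stands in for the shuffle that a real cluster would perform between mappers and reducers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration; not the Hadoop MapReduce API.
class WordCountSketch {
    // Map: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: combine all counts emitted for one key into a single total.
    public static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Driver: run the mapper on every line, group values by key
    // (a stand-in for the shuffle), then run the reducer once per key.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints "{a=2, b=2, c=1}"
    }
}
```

Because `map` looks at one line at a time and `reduce` at one key at a time, both phases could in principle be spread across many nodes, which is the scaling property the overview refers to.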
-
Difference between RDD and DSM | DistributedSystem/Spark | 2019. 9. 25. 04:30
1. Overview: The RDD (Resilient Distributed Dataset) is the core data structure of Spark. DSM (distributed shared memory) is a common memory data abstraction. In DSM, applications can read and write to any location in the global address space. The main difference between RDD and DSM is that not only can the RDD be created by bulk transformation (i.e. "write"), but it can ..
-
Difference between Hadoop and Spark | DistributedSystem/HadoopEcyosystem | 2019. 9. 25. 04:26
1. Overview: Clarify the difference between Hadoop and Spark. 2. Description: Difference between Hadoop and Spark, by feature. Data processing: Hadoop handles batch processing only; Spark handles batch processing as well as real-time processing. Processing speed: Hadoop is slower than Spark because of disk I/O latency; Spark is up to 100x faster in memory and up to 10x faster when running on disk. Category: Hadoop is a data processing engine; Spark is a data analytics engine ..