Big Data
2019. 9. 25. 13:35
1. Overview
Around 90% of the world's data was created in the last two years alone. Moreover, 80% of that data is unstructured or arrives in widely varying structures, such as images, live streaming records, videos, sensor records, and GPS tracking details, all of which are difficult to analyze. Traditional systems are useful for working with structured data (and only in limited amounts), but they cannot manage such a large volume of unstructured data. Analyzing this data lets us make smarter, better-calculated decisions in whatever field we work in, for example preventing fraudulent activity in advance.
With Big Data, you cannot collect all of the data on a single machine; you must store it across multiple computers. And when you need to run a query, you cannot gather the data into a single place because of the high I/O cost. This is the problem the MapReduce algorithm addresses: it runs your query on each node where the data resides, then aggregates the partial results and returns the final result to you.
This approach brings two significant improvements:
- Very low I/O cost, because data movement is minimal
- Less time, because the job runs in parallel on multiple machines, each working on a smaller data set
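The idea of running the query where the data lives and then combining the partial results can be sketched in a few lines of plain Python. This is only a single-process illustration, not Hadoop's API: the `partitions` data and the names `map_partition`, `reduce_counts`, and `run_query` are hypothetical and chosen for this example.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each inner list stands for the records stored on one node.
partitions = [
    ["fraud", "ok", "ok"],
    ["ok", "fraud", "fraud"],
    ["ok"],
]


def map_partition(records):
    """Map step: count labels locally, on the node that holds the records."""
    return Counter(records)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def run_query(parts):
    # On a real cluster each map call would run in parallel on a different
    # machine; here they run one after another in a single process.
    partial_results = [map_partition(p) for p in parts]
    # Only the small partial counts are combined, never the raw records.
    return reduce(reduce_counts, partial_results, Counter())


result = run_query(partitions)
print(dict(result))
```

For the sample data the combined count is 3 `fraud` records and 4 `ok` records. Note that only three small `Counter` objects cross the "network" in this sketch, which is the source of the low I/O cost described above.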
2. Description
- Volume
  - The sheer scale of the data; any single data set is only a slice of the bigger pie of Big Data
- Variety
  - 90% of the data produced is unstructured, arriving in all shapes and forms: geospatial data, tweets, photos, and videos
- Velocity
  - Every minute of every day, we upload 200 hours of video to YouTube, send 300,000 tweets, and exchange over 200 million emails
- Veracity
  - The uncertainty of the data available to marketers
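To get a feel for the Velocity figures, the per-minute rates can be scaled to a full day. The short Python sketch below is plain arithmetic on the numbers quoted in the list; the variable names are only illustrative.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Per-minute figures quoted under Velocity.
per_minute = {
    "hours of YouTube video": 200,
    "tweets": 300_000,
    "emails": 200_000_000,
}

# Scale each per-minute rate to a per-day total.
per_day = {name: rate * MINUTES_PER_DAY for name, rate in per_minute.items()}

for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```

At these rates a single day adds up to 288,000 hours of video, 432 million tweets, and 288 billion emails.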