Big Data

DistributedSystem/HadoopEcyosystem 2019. 9. 25. 13:35

1. Overview

Around 90% of the world's data was created in the last two years alone. Moreover, about 80% of that data is unstructured or arrives in widely varying structures, such as images, live streaming records, videos, sensor records, and GPS tracking details, which makes it difficult to analyze. Traditional systems work well with structured data (and only up to a limited volume), but they cannot manage such a large amount of unstructured data. Being able to analyze this data lets us make smarter, better-calculated decisions in whatever field we work in, for example detecting fraudulent activity in advance.

With Big Data, you cannot store all of the data on a single machine; you must spread it across multiple computers. And when you need to run a query, you cannot pull the data together in a single place, because the I/O cost is too high. This is the problem the MapReduce algorithm addresses: it runs your query on each node where the data resides, then aggregates the partial results and returns the final result to you.

It brings two significant improvements.

Very low I/O cost, because data movement is minimal.
Less time, because the job runs in parallel on multiple machines, each working on a smaller data set.
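The idea above can be sketched in a few lines of Python. This is only an illustration of the map-then-aggregate pattern, not how Hadoop implements it: the `NODES` data, the function names `map_on_node`, `merge_partials`, and `run_query`, and the use of threads to stand in for separate machines are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each inner list stands for the records stored on one node.
NODES = [
    ["fraud", "ok", "ok", "fraud"],
    ["ok", "ok", "ok"],
    ["fraud", "ok"],
]


def map_on_node(records):
    """Run the query where the data lives: count labels in this node's records."""
    return Counter(records)


def merge_partials(left, right):
    """Combine two partial counts into one."""
    return left + right


def run_query(nodes):
    # Threads stand in for machines here; each one only touches its own records.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_on_node, nodes))
    # Only the small per-node summaries are brought together for aggregation.
    total = reduce(merge_partials, partials, Counter())
    return partials, total


partials, result = run_query(NODES)
raw_records = sum(len(records) for records in NODES)
moved_entries = sum(len(partial) for partial in partials)

print(dict(result))
print(f"raw records: {raw_records}, summary entries moved: {moved_entries}")
```

On data this small the saving is modest, but the shape is the point: the number of summary entries that travel depends on how many distinct keys each node has, not on how many records it stores.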

2. Description

Volume
- The sheer amount of data; any single data set is only a slice of the much bigger Big Data pie
Variety
- 90% of the data produced is unstructured, arriving in all shapes and forms: geospatial data, tweets, photos, and videos
Velocity
- Every minute of every day, people upload 200 hours of video to YouTube, send 300,000 tweets, and send over 200 million emails
Veracity
- Uncertainty about the quality and accuracy of the data available to marketers
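To get a feel for the Velocity figures, the per-minute rates quoted above can be scaled to a full day. This is a back-of-the-envelope sketch: the rates are taken as given from the list, and the dictionary keys are names chosen for the example.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute rates as quoted in the Velocity item above.
per_minute = {
    "hours_of_video_uploaded": 200,
    "tweets_sent": 300_000,
    "emails_sent": 200_000_000,
}

# Scale each rate from one minute to one day.
per_day = {name: rate * MINUTES_PER_DAY for name, rate in per_minute.items()}

for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```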

3. References

https://en.wikipedia.org/wiki/Big_data

https://towardsdatascience.com/a-brief-summary-of-apache-hadoop-a-solution-of-big-data-problem-and-hint-comes-from-google-95fd63b83623
