4. Introduction to Big Data
Data Facts
Characteristics of Big Data
Types of Data
Big Data Tools
Hadoop
5. No single definition; here is one from Wikipedia:
Big data is the term for a collection of data
sets so large and complex that it becomes
difficult to process using on-hand database
management tools or traditional data
processing applications.
Big Data involves various tools, techniques,
and frameworks.
8. Over 90% of all the data in the world was
created in the past two years.
Every two days we create as much information
as we did from the dawn of civilization up to 2003.
The total amount of data being captured and
stored by industry doubles roughly every year.
Every minute we send 204 million emails,
generate 1.8 million Facebook likes, send
278 thousand tweets, and upload 200,000
photos to Facebook.
Around 100 hours of video are uploaded to
YouTube every minute.
9. Big data (terabytes and beyond) cannot fit in
the memory of a single computer.
Traditional RDBMSs fail to handle Big Data.
Processing Big Data on a single computer
takes a very long time.
Big Data cannot be analyzed with traditional
tools.
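A dataset larger than RAM can still be scanned on one machine by streaming it in fixed-size chunks, which keeps memory use bounded, but a single pass over terabytes remains slow, which is what motivates distributed processing. A minimal Python sketch (the function name and chunk size are illustrative, not from any particular tool):

```python
def count_lines_in_chunks(path, chunk_size=64 * 1024 * 1024):
    """Stream a file in fixed-size chunks so memory use stays bounded,
    no matter how large the file is."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:        # end of file reached
                break
            total += chunk.count(b"\n")
    return total
```

The same pattern generalizes to any per-record computation, but it is still one disk and one CPU; splitting the chunks across many machines is the idea behind Hadoop, introduced below.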
10. Characteristics of Big Data: the 5 V's
Volume – Data Quantity
Velocity – Data Speed
Variety – Data Types
Veracity – Data Quality and Accuracy
Value – Data Worth
Turning Big Data into Value: the latest
technologies, such as distributed systems and
cloud computing, together with modern
software and analysis approaches, allow us to
leverage all types of data to gain insights and
add value.
13. The Model of Generating/Consuming Data Has Changed
Old model: a few companies generate data; everyone
else consumes it.
New model: all of us generate data, and all of us
consume it.
14. Processing Big Data
Unstructured – video, audio, free-form
documents (e.g., PDF)
Semi-structured – many sources of big data
(e.g., XML)
Structured – most traditional data sources
(tables)
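The three categories above differ in how directly a program can parse them. A short Python sketch using only standard-library parsers (the sample data is made up for illustration):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: a fixed schema of rows and columns, e.g. a CSV table.
table = io.StringIO("id,name\n1,Alice\n2,Bob\n")
rows = list(csv.DictReader(table))          # each row maps column -> value

# Semi-structured: self-describing tags; the schema can vary per record.
doc = ET.fromstring("<users><user id='1'>Alice</user></users>")
names = [u.text for u in doc.iter("user")]  # navigate by tag, not by column

# Unstructured data (video, audio, free text) has no such ready-made
# parse; it needs specialized processing before it can be analyzed.
```

Structured data fits RDBMS tables directly; semi-structured data needs tag-aware tools; unstructured data is where traditional tools break down, as noted above.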
15. Sources of Big Data:
Sensors
CCTV cameras
Social networks (e.g., Facebook)
Online shopping
Airlines
Hospitality data, etc.
Big Data is enabled by: increased storage
capacities, increased processing power, and
the availability of data (of different types).
16. Big data analytics is the collecting,
organizing, and analyzing of large
sets of data to discover
patterns or other
useful information.
Steps: Collecting, Organizing,
Analyzing, Representation.
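The four steps above can be sketched on a toy dataset in Python; the function and step names here are illustrative, not part of any analytics library:

```python
from collections import Counter

def analytics_pipeline(records):
    """Toy version of the four steps: collect, organize, analyze, represent."""
    # Collecting: gather the raw records (here, already in memory).
    collected = list(records)
    # Organizing: normalize values and drop empty entries.
    organized = [r.strip().lower() for r in collected if r.strip()]
    # Analyzing: discover a pattern -- here, the frequency of each value.
    counts = Counter(organized)
    # Representation: report the findings in a readable, ordered form.
    return {value: count for value, count in counts.most_common()}
```

On real Big Data each step is distributed across many machines, but the shape of the pipeline stays the same.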
18. Hadoop – getting huge data processed in
less time.
Stores and processes huge amounts of data.
Hadoop is an open-source framework
developed by the Apache Software Foundation
to support distributed processing of data.
Hadoop itself is written in Java, but today
applications for it can be written in many
other languages.
Hadoop helps in data analytics.
20. Hadoop implements Google’s MapReduce,
using HDFS (the Hadoop Distributed File System).
MapReduce divides applications into many
small blocks of work.
HDFS creates multiple replicas of data
blocks for reliability, placing them on
compute nodes around the cluster.
MapReduce can then process the data
where it is located.
Hadoop’s target is to run on clusters on the
order of 10,000 nodes.
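The map, shuffle, and reduce phases described above can be simulated in a few lines of plain Python. This is a single-machine sketch of the idea only, not Hadoop's actual API; the strings below stand in for HDFS data blocks:

```python
from collections import defaultdict

def map_phase(block):
    # Map: each input block is turned into (key, value) pairs independently,
    # so blocks can be processed in parallel on the nodes that store them.
    return [(word, 1) for word in block.split()]

def shuffle(pairs):
    # Shuffle: group all values by key across every map output.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the values for each key into a final result.
    return {key: sum(values) for key, values in groups.items()}

blocks = ["big data big", "data tools"]      # stand-ins for HDFS blocks
pairs = [p for b in blocks for p in map_phase(b)]
counts = reduce_phase(shuffle(pairs))
```

In real Hadoop, `map_phase` runs where each replica of a block is stored ("process the data where it is located"), and the framework handles the shuffle across the cluster.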
21. Hardware Requirements
Quad-core, 64-bit processor
RAM – 8 GB
Free disk space – 20 GB
Software Requirements
Windows 7+, macOS 10.10+, etc.
Several open-source software tools, including
Apache Hadoop.