2. Introduction
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It includes the data produced by many different devices and applications. The term describes a collection of data that is huge in size and keeps growing exponentially with time.
3. Sources
The main sources of big data are:
Black Box Data
Social Media Data
Stock Exchange Data
Power Grid Data
Transport Data
Search Engine Data
The data from these sources can be structured or unstructured.
4. Sources
Black Box Data: data captured by an aircraft's black box (flight recorder), including:
Voices of the flight crew
Recordings from microphones and earphones
Performance information of the aircraft
Social Media Data:
Facebook data
Twitter data
Pinterest data
5. Sources
Stock Exchange Data:
Shares bought by customers
Shares sold by customers
Complete stock data
Transport Data:
Vehicle model
Vehicle capacity
Distance-related data
7. 3 V’s
Volume: The amount of data we deal with is very large, on the order of petabytes.
Velocity: The data is increasing at a very fast rate. It is estimated that the volume of data will double every two years.
Variety: Data comes in all formats: structured, numeric data in traditional databases, or unstructured text documents, video, audio, email and stock ticker data.
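The doubling estimate above is exponential growth: if the volume doubles every two years, then after t years it is V(t) = V0 × 2^(t/2). The short sketch below works through that arithmetic; the starting volume of 1 PB is a hypothetical value chosen only for illustration, not a measured figure.

```python
# Worked arithmetic for the estimate that data volume doubles every
# two years: V(t) = V0 * 2 ** (t / doubling_period).


def projected_volume(initial_pb: float, years: float, doubling_period: float = 2.0) -> float:
    """Return the projected volume (in petabytes) after `years` years,
    assuming the volume doubles once every `doubling_period` years."""
    return initial_pb * 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Hypothetical starting volume of 1 PB, used only for illustration.
    for years in (0, 2, 4, 6, 10):
        print(f"after {years:>2} years: {projected_volume(1.0, years):g} PB")
```

Under this assumption, ten years means five doublings, so the volume grows by a factor of 2^5 = 32.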
8. Technologies
Big data technologies fall into two classes:
Operational Big Data: systems such as MongoDB that provide operational capabilities for real-time, interactive workloads in which data is primarily captured and stored.
Analytical Big Data: systems such as Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective, complex analysis that may touch most or all of the data.
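The contrast between the two classes can be sketched with plain Python dictionaries. This is only an analogy, not MongoDB or an MPP database; the record fields and function names are hypothetical. The operational functions write or read one record by key, while the analytical function scans every record to build an aggregate.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical in-memory store keyed by order id; it stands in for an
# operational database only by analogy.
orders: Dict[str, dict] = {}


def capture_order(order_id: str, customer: str, amount: float) -> None:
    """Operational workload: capture and store a single record as it arrives."""
    orders[order_id] = {"customer": customer, "amount": amount}


def get_order(order_id: str) -> Optional[dict]:
    """Operational workload: interactive lookup that touches one record."""
    return orders.get(order_id)


def revenue_by_customer() -> Dict[str, float]:
    """Analytical workload: retrospective aggregate that scans all records."""
    totals: Dict[str, float] = defaultdict(float)
    for record in orders.values():
        totals[record["customer"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    capture_order("o1", "alice", 30.0)
    capture_order("o2", "bob", 12.5)
    capture_order("o3", "alice", 7.5)
    print(get_order("o2"))
    print(revenue_by_customer())
```

The lookup cost stays the same as the store grows, while the aggregate must read every record; that full-scan pattern is what analytical systems are built to distribute.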
10. Solution
The solution addresses four areas:
Storage: To hold this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File System).
Processing: The MapReduce paradigm is applied to data distributed over the network to compute the required output.
Analysis: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so software licensing cost is no longer an issue.
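The MapReduce paradigm mentioned above can be illustrated with word count, the standard teaching example. The sketch below runs in a single Python process and is not Hadoop's Java API; the function names are hypothetical. It shows the three stages: map emits key/value pairs, shuffle groups values by key (in Hadoop the framework performs this step, moving data between nodes), and reduce combines each group.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one input split stored on a different node.
    splits = ["big data needs big storage", "hadoop stores big data in hdfs"]
    print(word_count(splits))
```

Because each map call only sees its own line and each reduce call only sees one key, both stages can run in parallel on different machines; only the shuffle needs to move data across the network.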