2. Outline
◦ What is Big Data?
◦ Where does this Big Data come from?
◦ 4 Vs analysis
◦ When is dealing with Big Data hard?
◦ Example: Google
3. What is big data?
“Every day, we create 2.5 quintillion bytes of data — so
much that 90% of the data in the world today has been
created in the last two years alone. This data comes from
everywhere: sensors used to gather climate information,
posts to social media sites, digital pictures and videos,
purchase transaction records, and cell phone GPS signals to
name a few. This data is ‘big data.’”
4. Where Is This “Big Data” Coming From?
◦ 12+ TB of tweet data every day
◦ 25+ TB of log data every day
◦ ? TB of data every day
◦ 2+ billion people on the Web by end of 2011
◦ 30 billion RFID tags today (1.3 billion in 2005)
◦ 4.6 billion camera phones worldwide
◦ 100s of millions of GPS-enabled devices sold annually
◦ 76 million smart meters in 2009, 200 million by 2014
5. With Big Data, We’ve Moved to 4 Vs Analytics
◦ Volume: 12+ terabytes of tweets created daily
◦ Variety: 100s of different types of data
◦ Velocity: 5+ million trade events per second
◦ Value
6. Volume (Scale)
Data Volume
◦ 44x increase from 2009 to 2020
◦ From 0.8 zettabytes to 35 zettabytes
Data volume is increasing exponentially
Volume refers to the vast amounts of data generated every second. We are no longer
talking about terabytes but petabytes. If we take all the data generated in the world
between the beginning of time and 2008, the same amount of data will soon be generated
every minute. This makes most data sets too large to store and analyze using traditional
database technology. New big data tools use distributed systems so that we can store
and analyze data across databases located anywhere in the world.
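As a rough check on the figures above, the growth from 0.8 to 35 zettabytes can be turned into an implied yearly rate. This is only a back-of-the-envelope sketch: it assumes smooth compound growth between the two quoted endpoints, and the variable names are illustrative.

```python
# Figures quoted on the slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
START_YEAR, END_YEAR = 2009, 2020
START_ZB, END_ZB = 0.8, 35.0

# Overall multiple between the two endpoints (the slide rounds this to 44x).
growth_factor = END_ZB / START_ZB

# Assumption: growth compounds evenly each year between the endpoints.
years = END_YEAR - START_YEAR
annual_rate = growth_factor ** (1 / years) - 1

print(f"overall growth: {growth_factor:.1f}x over {years} years")
print(f"implied annual growth: {annual_rate:.0%}")
```

Under that assumption the data volume grows by roughly 41% per year, which is what "increasing exponentially" means in practice.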
7. Variety (Complexity)
To extract knowledge, all these types of
data need to be linked together.
Variety refers to the different types of data we can now use. In the past we
focused only on structured data that fitted neatly into tables or relational
databases, such as financial data.
In fact, 80% of the world’s data is unstructured (text, images, video,
voice, etc.). With big data technology we can now analyze and bring
together data of different types, such as messages, social media
conversations, photos, sensor data, and video or voice recordings.
8. Velocity (Speed)
Velocity refers to the speed at which new data is generated and the speed at
which data moves around. Just think of social media messages going viral in
seconds. Technology now allows us to analyze the data while it is being
generated (sometimes referred to as in-memory analytics), without ever
putting it into databases.
Examples
◦ E-Promotions: based on your current location, your purchase history, and what
you like, send promotions right now for the store next to you
◦ Healthcare monitoring: sensors monitor your activities and body; any
abnormal measurements require immediate reaction
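The healthcare example can be sketched as a small streaming check that inspects each reading as it arrives and keeps only a short window in memory. This is an illustrative sketch, not a clinical method: the function name `flag_abnormal`, the window size, the tolerance, and the sample readings are all assumptions.

```python
from collections import deque
from statistics import mean

def flag_abnormal(readings, window=5, tolerance=20.0):
    """Yield (value, is_abnormal) for each reading as it arrives.

    A reading is flagged when it differs from the mean of the previous
    `window` normal readings by more than `tolerance`. Only that window
    is kept in memory; the full stream is never stored.
    """
    recent = deque(maxlen=window)
    for value in readings:
        is_abnormal = (
            len(recent) == window and abs(value - mean(recent)) > tolerance
        )
        yield value, is_abnormal
        if not is_abnormal:
            # Abnormal readings are kept out of the baseline window.
            recent.append(value)

# Hypothetical heart-rate stream with one spike.
heart_rate = [72, 74, 71, 73, 75, 140, 74, 72]
alerts = [value for value, bad in flag_abnormal(heart_rate) if bad]
print(alerts)
```

Because the function is a generator, it works the same way on an unbounded sensor feed as on this short list.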
9. Real-time/Fast Data
Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
◦ Social media and networks (all of us are generating data)
◦ Scientific instruments (collecting all sorts of data)
◦ Mobile devices (tracking all objects all the time)
◦ Sensor technology and networks (measuring all kinds of data)
10. Value
Then there is another V to take into account when looking at Big
Data: Value! Having access to big data is no good unless we can turn it
into value. Companies are starting to generate amazing value from their
big data.
We currently only see the beginnings of a transformation into a big data
economy. Any business that doesn’t seriously consider the implications
of Big Data runs the risk of being left behind.
11. Big Data Exploration: Value & Diagram
[Diagram: data sources (file systems, relational data, content management, email,
CRM, supply chain, ERP, RSS feeds, cloud, custom sources) feed DataExplorer, which
serves applications and users]
Find, visualize, and understand all big data to improve business knowledge
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
12. Applications for Big Data Analytics
Homeland Security
Finance
Smarter Healthcare
Telecom
Manufacturing
Traffic Control
Trading Analytics
Log Analysis
Search Quality
13. When Dealing with Big Data Is Hard
When the operations on data are complex:
◦ E.g., simple counting is not a complex problem.
◦ Modeling and reasoning with data of different kinds can
get extremely complex.
Good news with big data:
◦ Often, because of the vast amount of data, modeling
techniques can get simpler (e.g., smart counting can
replace complex model-based analytics)…
◦ …as long as we deal with the scale.
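One way to picture "smart counting" replacing a model is next-word prediction from plain pair counts instead of a trained language model. The sketch below is only an illustration of that idea: the function names, the tiny corpus, and the choice of task are assumptions, not something taken from the slides.

```python
from collections import Counter, defaultdict

def train_bigram_counts(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed follower of `word`, or None."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus; a real system would count over billions of sentences.
corpus = [
    "big data needs big clusters",
    "big data needs new tools",
    "big clusters run hadoop",
]
counts = train_bigram_counts(corpus)
print(predict_next(counts, "data"))
```

The logic stays trivial; the difficulty moves to running the counting over data too large for one machine, which is the "as long as we deal with the scale" caveat.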
20. Hadoop is an open-source software framework for storing and processing big data in a
distributed fashion on large clusters of commodity hardware.
It is suitable for extremely large databases (billions of rows, millions of columns) distributed
across thousands of nodes.
21. Hadoop Distributed File System (HDFS) is a Java-based file system that provides
scalable and reliable data storage and is designed to span large clusters of commodity
servers.
24. MapReduce is a programming model and an associated implementation for processing and generating
large data sets with a parallel, distributed algorithm on a cluster.
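The programming model can be sketched with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce steps in plain Python, not Hadoop's API; the function names are assumptions, and a real cluster would run the map and reduce calls in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

documents = ["big data big clusters", "data moves fast"]
result = word_count(documents)
print(result)
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without the functions needing to know about each other.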
26. We first wrote the data into HDFS, then created a table and loaded the data from the
HDFS files into a Hive table.