This document provides an overview of big data presented by Santisook Limpeeticharoenchot. It begins with an introduction to big data, covering definitions, characteristics involving volume, velocity, variety and veracity. Examples of big data sources like machine data, sensor data, and internet of things data are described. The use of big data analytics in industries like manufacturing, healthcare, and transportation is discussed. Finally, the document touches on data visualization, different types of analytics, and how companies can use big data to better understand customers and optimize business processes.
2. What will we cover?
• What is Big Data?
• Data, Analytic, Visualization.
• Big Data’s usecases and benefit
• Data Driven for Business Model Innovation.
• Q&A
5. What Is Big Data? The Google
Summary
Big data usually includes data sets with sizes beyond the ability of commonly used
software tools to capture, curate, manage, and process data within a tolerable elapsed time
Big data requires a set of techniques and technologies with new forms of integration to
reveal insights from datasets that are diverse, complex, and of a massive scale.
Gartner, and now much of the industry, continue to use this "3Vs" model for describing
big data.In 2012, Gartner updated its definition as follows: "Big data is high volume, high
velocity, and/or high variety.
13. Big Data in Gartner’s Hype Cycle
Source: Gartner, August 2014, www.gartner.com/newsroom/id/2819918
14. Big Data Is Heading for the “Trough of
Disillusionment”
Source: Gartner, August 2014, www.gartner.com/newsroom/id/2819918
https://whatitallboilsdownto.files.wordpress.com/2016/08/gartner-hype-2013-to-2016.jpg
16. refers to the vast amounts of data generated every second. We are not
talking Terabytes but Zettabytes or Brontobytes.
If we take all the data generated in the world between the beginning of
time and 2000, the same amount of data will soon be generated every
minute.
New big data tools use distributed systems so that we can store and
analyse data across databases that are dotted around anywhere in the
world.
Volume …
17. …refers to the speed at which new data is generated and the speed at
which data moves around. Just think of social media messages going viral in
seconds. Technology allows us now to analyse the data while it is being
generated (sometimes referred to as in-memory analytics), without ever
putting it into databases.
Velocity …
18. …refers to the different types of data we can now use. In the past we only
focused on structured data that neatly fitted into tables or relational
databases, such as financial data. In fact, 80% of the world’s data is
unstructured (text, images, video, voice, etc.) With big data technology we
can now analyse and bring together data of different types such as
messages, social media conversations, photos, sensor data, video or voice
recordings.
Variety …
19. Wearable devices have grown by 2x month over month
since October 2012.
Source: Mary Meeker’s Internet Trends, 2013
Photo: Intel Free Press
24. …refers to the messiness or trustworthiness of the data. With many forms
of big data quality and accuracy are less controllable (just think of Twitter
posts with hash tags, abbreviations, typos and colloquial speech as well as
the reliability and accuracy of content) but technology now allows us to
work with this type of data.
Veracity
25. Big Data Analytic
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Intelligence
Dashboards &
Visualizations
Cortana
Bot
Framework
Cognitive Services
Power BI
Information
Management
Event Hubs
Data Catalog
Data Factory
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream Analytics
Intelligence
Data Lake
Analytics
Machine Learning
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and
devices
Data
Ref: Microsoft
26. What does Big Data Analytics require?
Data: data availability + storage + integration + data management
tools
+
Analytics: analytic formulas + statistical integrity + analytic
applications
+
Interpretation: business problem + domain expertise + visualization
+ decision-making
This typically requires a team of people with different skillsets.
27. ❖ Structured
• Most traditional
data sources
❖ Semi-structured
• Many sources of
big data
❖ Unstructured
• Video data, audio
data
27
Structure of Data
28. Machine Data
Machine data contains a definitive record of all the activity and
behavior of your customers, users, transactions, applications, servers,
networks and mobile devices. And it's more than just logs. It includes
configurations, data from APIs, message queues, change events, the
output of diagnostic commands, call detail records and sensor data
from industrial systems and more.
http://www.splunk.com/en_us/resources/machine-data.html
29. We are increasingly surrounded by sensors that collect and share
data. Take your smart phone, it contains a global positioning
sensor to track exactly where you are every second of the day, it
includes an accelometer to track the speed and direction at which
you are travelling. We now have sensors in many devices and
products.
Sensor Data
30. We now have smart TVs that are able to collect and process data, we have
smart watches, smart fridges, and smart alarms. The Internet of Things, or
Internet of Everything connects these devices so that e.g. the traffic
sensors on the road send data to your alarm clock which will wake you up
earlier than planned because the blocked road means you have to leave
earlier to make your 9am meeting…
.
Internet of Things Data
31. Simple activities like listening to music or reading a book are
now generating data. Digital music players and eBooks collect data
on our activities. Your smart phone collects data on how you use it
and your web browser collects information on what you are
searching for. Your credit card company collects data on where you
shop and your shop collects data on what you buy. It is hard to
imagine any activity that does not generate data.
Activity Data
32. Our conversations are now digitally recorded. It all started with
emails but nowadays most of our conversations leave a digital
trail. Just think of all the conversations we have on social media
sites like Facebook or Twitter. Even many of our phone
conversations are now digitally recorded.
Conversation Data
33. Just think about all the pictures we take on our smart phones or
digital cameras. We upload and share 100s of thousands of them on
social media sites every second. The increasing amounts of CCTV
cameras take video images and we up-load hundreds of hours of
video images to YouTube and other sites every minute .
Photo and Video Data
34. Big Data Is Not Only About “Big” Data
• “My analytics are becoming more difficult
because of the variety and types of data sources
(not just the volume)”
Source: Paradigm4 data scientist survey 2014
www.paradigm4.com/wp-content/uploads/2014/06/P4-data-scientist-survey-FINAL.pdf
36. The datafication of our world gives us unprecedented amounts
of data in terms of Volume, Velocity, Variety and Veracity. The
latest technology such as cloud computing and distributed
systems together with the latest software and analysis
approaches allow us to leverage all types of data to gain
insights and add value.
Turning (Big) Data into Value
41. Knowledge of Data Science (Team)
http://blog.revolutionanalytics.com/2010/10/the-data-science-venn-diagram.html
42. Types of Analytics you Could Use…
• ARMA
• CART
• CIR++
• Compression Nets
• Discrete Time
Survival Analysis
• D-Optimality
• Ensemble Model
• Gaussian Mixture Model
• Genetic Algorithm
• Gradient Boosted Trees
• Hierarchical Clustering
• Kalman Filter
• K-Means
• KNN
• Linear Regression
• Logistic/Lasso Regression
• Logistic Regression with Adaptive
Platform
• Monte Carlo Simulation
• Multinomial Regression
• Neural Networks
• Optimization: LP; IP; NLP
• Poisson Mixture Model
• Random Forests
• Restricted Boltzmann Machine
• Sensitivity Trees
• SVD
• Support Vector Machines
Slide: How to Create New Business Models
with Big Data & Analytics,Calpont Corporation
43. 43
Classification and regression trees / decision trees and Linear Regression are the
most popular predictive analytics techniques used.
69%
66%
61%
49%
36%
30%
30%
22%
21%
20%
15%
13%
25%
33%
29%
37%
42%
36%
35%
43%
43%
23%
41%
47%
6%
10%
14%
21%
34%
35%
34%
36%
57%
44%
40%
Classification and regression
trees / decision trees
Linear Regression
Logistic regression or other
discrete choice models
Association rules
K-nearest neighbors
Neural networks
Box Jenkins, Autoregressive
Moving Average (ARMA),…
Exponential smoothing /
double exponential…
Naïve Bayes
Support vector machines
Survival analysis
Monte Carlo Simulations
Frequently Occasionally Not at all
Source: Ventana Research Predictive Analytics Benchmark Research
Analytics that are Actually Used
53. Visual Best Practices
Visualization helps you move from numbers to pictures… Leverage the
power of human perception to make better sense of your business data.
Then
Now
56. I II III IV
x y x y x y x y
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.1 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.1 4 5.39 19 12.5
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
Nearly identical statistical properties?
Property Value
Mean of x in each case 9 (exact)
Variance of x in each case 11 (exact)
Mean of y in each case 7.50 (to 2 decimal places)
Variance of y in each case 4.122 or 4.127 (to 3 decimal places)
Correlation between x and y i
n each case
0.816 (to 3 decimal places)
Linear regression line in each
case
y = 3.00 + 0.500x (to 2 and 3 decimal
places, respectively)
57. But they are different in visualization
“Anscombe’s Quartet”
Source: Wikipedia
68. Objectives
• Identify new sales growth areas
• Gain customer insights to drive higher profitability
• Attract and retain top talent
Top performing companies:
• 5x more likely to be data driven
• Improve productivity by 5% and profitability by 6%
Opinion-Based Data-Driven
Experience
People
Time
Want
Can
8%
Speed to Opinion
(Weeks/Months)
• Lost time and
speed
• Costly decisions
• Lost talent
Conventional Self-Reliant
Speed to Insight
(Seconds/Minutes)
Want Can
80% • Improve profitability by 6%
• Increase productivity by 5%
• Attract and retain best talent
Success
• Wells Fargo
• Citibank
• BARCLAYS
Next Steps
• Discovery Session
• Self Reliant
Experience
• Review Meeting
Report
Factory
App
69. TABLEAU USAGE SCENARIOS (vs Traditional BI)
Conventional Approach
• People: Developer IT Driven
• Information: Push or Heavily Parameterized
• Interaction: Mostly Static & Linear
• Use Case: Monitoring & Reporting (“Destination”)
• Questions: Limited Set, Known, Predictable,
Historical
• Interface: Report, Grid
• Data: Bounded & Limited
Pixel Perfect
Paper
Reporting
SEC, FDA and
Financial
Reports
Fixed Format
Reports
Push Reports
or Bursting
Weekly Reports
(Limited
Changes)
Geographical &
Abstract Mapping
Sales Geo
Analysis
Territory Analysis
Store or MFG Floor
Layouts
and Logistics or
Supply Chain
Ad Hoc Query,
Data Visualization
& Analytics
Business Unit Led
Ad-hoc Analysis
Industry’s First Built-
In Best Practices via
ShowMe!
Embed and
Share in
SharePoint or
Portal
Natively Make
Available in
Portal or Your
Products
Prototypes, Rapid
Rollouts or
Changes
Analysis in Minutes
Changes in Minutes
or Hours and Done
Calculation Intensive
Interactive Visuals
Pricing Analytics
–
CPG, Retail
Revenue
Management
Corporate Audit
Tableau Approach
• People: Business User Driven
• Information: Pull or Truly Ad-Hoc
• Interaction: Dynamic & Iterative
• Use Case: Analysis (“Journey of Discovery”)
• Questions: Unlimited Set, Unknown, Unpredictable, Present
• Interface: Charts, Visualizations
• Data: Flexible Access
REPORT-DRIVEN
(Traditional BI)
ANALYSIS-DRIVEN
(TABLEAU)
KPI Analysis
Dashboards
Balanced
Scorecards
Executive
Dashboards
70. Analytics Free for Student
https://youtu.be/vh9v76e95GY
https://www.tableau.com/academic/students
https://www.tableau.com/academic/students#product-video
71. A Application Of Big Data analytics
Homeland
Security
Smarter
Healthcare
Multi-channel
sales
Telecom
Manufacturing
Traffic Control
Trading
Analytics
Search
Quality
Books-Big Data by Viktor Mayer-Schonberger
Slide Prepared By Nasrin Irshad Hussain And Pranjal Saikia
M.Sc(IT) 2nd Sem Kaziranga University Assam
80. How is Big Data actually used? Example 1
Better understand and target customers:
To better understand and target customers, companies expand their traditional
data sets with social media data, browser, text analytics or sensor data to get a
more complete picture of their customers. The big objective, in many cases, is
to create predictive models. Using big data, Telecom companies can now better
predict customer churn; retailers can predict what products will sell, and car
insurance companies understand how well their customers actually drive.
81. How is Big Data actually used? Example 2
Understand and Optimize Business Processes:
Big data is also increasingly used to optimize business
processes. Retailers are able to optimize their stock based on
predictive models generated from social media data, web
search trends and weather forecasts. Another example is
supply chain or delivery route optimization using data from
geographic positioning and radio frequency identification
sensors.
82. How is Big Data actually used? Example 3
Improving Health:
The computing power of big data analytics enables us to find new cures and
better understand and predict disease patterns. We can use all the data from
smart watches and wearable devices to better understand links between
lifestyles and diseases. Big data analytics also allow us to monitor and predict
epidemics and disease outbreaks, simply by listening to what people are saying,
i.e. “Feeling rubbish today - in bed with a cold” or searching for on the Internet,
i.e. “cures for flu”.
83. How is Big Data actually used? Example 4
Improving Security and Law Enforcement:
Security services use big data analytics to foil terrorist plots and detect cyber
attacks. Police forces use big data tools to catch criminals and even predict
criminal activity and credit card companies use big data analytics it to detect
fraudulent transactions.
84. How is Big Data actually used? Example 5
Improving Sports Performance:
Most elite sports have now embraced big data analytics. Many use video analytics
to track the performance of every player in a football or baseball game, sensor
technology is built into sports equipment such as basket balls or golf clubs, and
many elite sports teams track athletes outside of the sporting environment –
using smart technology to track nutrition and sleep, as well as social media
conversations to monitor emotional wellbeing.
85. How is Big Data actually used? Example 6
Improving and Optimizing Cities and Countries:
Big data is used to improve many aspects of our cities and countries. For example, it
allows cities to optimize traffic flows based on real time traffic information as well as
social media and weather data. A number of cities are currently using big data analytics
with the aim of turning themselves into Smart Cities, where the transport
infrastructure and utility processes are all joined up. Where a bus would wait for a
delayed train and where traffic signals predict traffic volumes and operate to minimize
jams.
90. A New Venture business
unlocks intelligence and
gain benefit from data
91. Data is becoming the new raw material of business: an economic input
almost on a par with capital and labour. “Every day I wake up and ask, ‘how
can I flow data better, manage data better, analyse data better?” says Rollin
Ford, the CIO of Wal-Mart.
Source: Data, Data Everywhere, The Economist, February 25, 2010
There were 5 exabytes of information created between the dawn of
civilization through 2003, but that much information is now created every 2
days, and the pace is increasing
Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010
Why now?
92. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
93. Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)
119. Your Data Other Data
Taxonomy of Big Data Companies
Sell Data Compute + Sell
Data-driven
Product
Data Products
120. Your Data Other Data
Taxonomy of Big Data Companies
Sell Data Compute + Sell
Data-driven
Product
Data Products
121. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Data Products
122. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Data Products
123. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE
Data Products
124. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE
Data Products
125. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch
Data Products
126. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
Data Products
127. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
Data Products
128. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
DataSift, Yipit
Data Products
129. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
DataSift, Yipit Metamarkets,
PlaceIQ
Data Products
130. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
DataSift, Yipit Metamarkets,
PlaceIQ
RecordedFuture,
BillGuard
Data Products
131. Taxonomy of Big Data Companies
Your Data Other Data
Sell Data Compute + Sell
Data-driven
Product
Twitter, NYSE Bit.ly, Hunch Zynga, Amazon
DataSift, Yipit Metamarkets,
PlaceIQ
RecordedFuture,
BillGuard
Data Products
132. New Business Models
Aircraft engines is like selling razors. Profit; that comes later, from blades or spare
parts and servicing.
Measure and analyzing all real-time data from engines to spare parts mgmt,.
Rolls-Royce new model charge fee for every hour that an engine runs. They in
turn promises to maintain it and replace it if it breaks down.
www.Economist.com
http://professor-murmann.info/index.php/weblog/fullarticle_courses/290
133. New Business Models
12 years buying data on global wind flow patterns
Model how wind flows around the world
Now advise their customers on where to locate the wind turbines