Big Data and the
Art of Data Science
Andrew B. Gardner, PhD
www.linkedin.com/in/andywocky/
agardner@momentics.com
www.momentics.com
Big Data is Not New
Big Data Challenge
1880 census – 50M people
The First Big Data Solution
• Hollerith Tabulating System
• Punched cards – 80 variables
• Used for 1890 census
• 6 weeks instead of 7+ years
Hollerith Tabulation System
{age, number of insanes, …} 7 years → 6 weeks
Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif
Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg
Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
Big Data Is More Than 3 Vs*
Volume | Variety | Velocity
*2001 (Meta) / 2012 (Gartner) Definition of Big Data
Volume
• IDC Report 2011
• 8 billion TB in 2015
• 40 billion TB in 2020
• 90% of all data < 2 years old
• storage → transport → processing
Variety
• relational, graph, time series, sensor, audio, video, text, geo, scientific, …
• 80% unstructured
Velocity
• facebook 500 TB/day
• Large Hadron Collider 35 GB/sec
• twitter 300K tweets/min
• real time → stream
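The velocity figures above use different units, which makes them hard to compare at a glance. A minimal sketch that puts them on a common scale, assuming decimal units (1 TB = 1000 GB), which the slide does not state; the function names are illustrative only:

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000  # assumption: decimal (SI) units, not stated on the slide


def tb_per_day_to_gb_per_sec(tb_per_day: float) -> float:
    """Convert a daily volume in TB into a sustained rate in GB/s."""
    return tb_per_day * GB_PER_TB / SECONDS_PER_DAY


def gb_per_sec_to_tb_per_day(gb_per_sec: float) -> float:
    """Convert a sustained rate in GB/s into a daily volume in TB."""
    return gb_per_sec * SECONDS_PER_DAY / GB_PER_TB


def per_minute_to_per_second(count_per_minute: float) -> float:
    """Convert an event count per minute into events per second."""
    return count_per_minute / 60


if __name__ == "__main__":
    # Figures as quoted on the slide.
    print(f"facebook: {tb_per_day_to_gb_per_sec(500):.2f} GB/s")
    print(f"Large Hadron Collider: {gb_per_sec_to_tb_per_day(35):.0f} TB/day")
    print(f"twitter: {per_minute_to_per_second(300_000):.0f} tweets/s")
```

Read this way, the quoted facebook volume is a sustained rate of a few GB per second, while the Large Hadron Collider rate adds up to thousands of TB per day.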
Big Data Opportunities
“… big data market will grow from $3.2B (2010) to $16.9B (2015)…”
“… gains of 5-6% productivity and profitability …”
“… business volume will double every 1.2 years …”
“… required for companies to stay innovative and competitive …”
“… retail 60% increase in net margin attainable …”
“… manufacturing production costs decrease 50% …”
“… $300B annual savings in healthcare …”
IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
Big Data Successes
Walmart
• 10-15% online sales lift
• $1B incremental revenue
• Recommendations
• Engineered content
• 2012 Presidential Election
• Fleet telematics save fuel
What’s Going On?
1: Growth of Data
Amount of data in the world…
• 2005: 100 EB
• 2012: 2800 EB
• 2013: 8000 EB
1 EB = 1 Exabyte = 1 billion GB
… doubles every 2 years
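The "doubles every 2 years" rule can be checked against the data points quoted above. A minimal sketch, assuming smooth exponential growth between any two points (an assumption of this sketch, not a claim from the slide); the helper names are hypothetical:

```python
import math

# Figures as given on the slide (year -> exabytes).
DATA_EB = {2005: 100, 2012: 2800, 2013: 8000}


def doubling_time_years(year0: int, size0: float, year1: int, size1: float) -> float:
    """Doubling time implied by exponential growth between two data points."""
    growth_factor = size1 / size0
    return (year1 - year0) * math.log(2) / math.log(growth_factor)


def project(size0: float, years: float, doubling_time: float = 2.0) -> float:
    """Project a size forward, assuming it doubles every `doubling_time` years."""
    return size0 * 2 ** (years / doubling_time)


if __name__ == "__main__":
    implied = doubling_time_years(2005, DATA_EB[2005], 2012, DATA_EB[2012])
    print(f"Implied doubling time 2005-2012: {implied:.2f} years")
    print(f"2005 figure projected to 2012 at 2-year doubling: {project(100, 7):.0f} EB")
```

On these points the implied doubling time comes out shorter than two years, so the rule is best read as a rough rule of thumb rather than a fit to the quoted figures.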
2: Connectedness & Sources
More non-human nodes online than people
50B+ non-human nodes online
The Internet of Things (IoT)
Source: Swan, M. “Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0.” J Sens Actuator Netw (2012) 1(3), 217-253.
Data Sources: social, mobile, web, enriched data, science, IoT
3: Demand
Increasing dependence on data.
4: Economics
Attention economy not information economy!
• Data is bountiful
• Storage is cheap
• Computing is cheap
• Analysis is cheap
• Talent is expensive
• Time is expensive
Big Data Disruption
Better Cycle Times and Better Questions Win!
OLD
• define schema
• pour in data
• analyze
→ (few) well-calculated questions first
NEW
• collect data
• explore
• schema as needed
→ data first, then exploratory decision making
unknown unknowns = insight gold
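The old/new contrast above is often described as schema-on-write versus schema-on-read (terms the slide itself does not use). A minimal sketch of the difference, assuming hypothetical JSON event records; every field and function name here is invented for illustration:

```python
import json
from collections import Counter

# Hypothetical raw events; the records do not share one fixed schema.
RAW_EVENTS = [
    '{"user": "a", "action": "view", "item": 1}',
    '{"user": "b", "action": "buy", "item": 1, "price": 9.5}',
    '{"user": "a", "action": "buy", "item": 2, "price": 20.0, "coupon": "X1"}',
]

# OLD: define the schema first, then pour in only the data that fits it.
FIXED_SCHEMA = ("user", "action", "item")


def load_with_fixed_schema(lines):
    """Keep only the predefined columns; any other field is silently dropped."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(record.get(col) for col in FIXED_SCHEMA))
    return rows


# NEW: collect everything, explore, and apply a schema only as needed.
def load_raw(lines):
    """Parse every record as-is, keeping all fields."""
    return [json.loads(line) for line in lines]


def discover_fields(records):
    """Count how often each field appears across the raw records."""
    return Counter(field for record in records for field in record)


def revenue(records):
    """A question that becomes askable only once 'price' has been discovered."""
    return sum(r.get("price", 0.0) for r in records if r.get("action") == "buy")


if __name__ == "__main__":
    print(load_with_fixed_schema(RAW_EVENTS))
    records = load_raw(RAW_EVENTS)
    print(discover_fields(records))
    print(revenue(records))
```

In the fixed-schema path the `price` and `coupon` fields never reach the analyst, so the revenue question cannot be asked at all. Keeping the raw records leaves room for questions nobody planned in advance.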
Rumsfeld Analytics
Things we know we know: Facts – could be wrong.
Things we know we don’t know: Questions – do reporting.
Things we don’t know we know: Intuition – quantify to improve.
Things we don’t know we don’t know: Exploration – unfair advantages.
Goal: data discoveries = insights = game changers = unknown unknowns.
Data Alone is Just An Asset
• Depreciating
• Liability
• Useful lifetime
• Expense
Finished goods create value from raw materials
data
$$ data product $$
Enter the Data Scientist
• mathematical
• developer
• data talented
• problem solver
• insight whisperer
• product savvy
Source: FICO Infographic
data + data scientist
$$ data product $$
A Brief History of Data Science
BC - The Greeks
1974 Peter Naur @ UoC
2001 William S. Cleveland @ CSU
2003 Journal of Data Science
2009 Jeff Hammerbacher @Facebook
2010 Hilary Mason & Chris Wiggins @ Dataists
2010 Mike Loukides @ O'Reilly
2011 DJ Patil @ LinkedIn
Famous Definitions – New Blend
Conway’s “Data Science” Venn Diagram (2010)
Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
new skill blend:
one stop rock star
Famous Definitions – Skeptic
[… with a great salary]
Famous Definitions – Comparison
Many Flavors of Data Scientist
Alternatively, Data Roles × Skill Sets
Harlan Harris, et al.
… from research
to development
to business-focused
Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
role
skill
2012-3 Survey
Universal Agreement: Scarcity
In 2018:
Huge shortage of analytic talent (140K+).
Gap of 1.5M managers that can make decisions based on data analysis.
McKinsey Prediction
• Talent is the biggest resource
• There is a raging talent war
Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011).
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
The Data Scientist’s Craft
• Discover unknown unknowns in data
• Obtain predictive, actionable insight
• Communicate business data stories
• Build business decision confidence
• Create valuable Data Products
Valuable & Reusable Data Products
Image credit: Harlan Harris
Building Data Products
Objectives: What outcome am I trying to achieve?
Levers: What inputs can we control?
Data: What data can we collect?
Models: How do the levers impact the data?
Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
Data Product Aims
provide
increase
open
new
improve
data
Some Data Products
fitbit
flu tracker
amazon
traffic ads
SIRI
How Do Data Scientists Do It?
• Tools
• Workflow
• Creativity
Data Science Tools
• Java, R, Python
• Hadoop, HDFS, MapReduce, Spark, Storm
• HBase, Pig, Hive, Shark, Impala
• ETL, Webscrapers, Flume, Sqoop
• SQL, RDBMS, DW, OLAP
• Weka, RapidMiner, numpy, scipy, pandas
• D3.js, ggplot2, Wakari, Tableau, Flare, Shiny
• SPSS, Matlab, SAS
• NoSQL, MongoDB, Redis, …
• MS-Excel
• Machine Learning
• ...
Data Science Workflow
Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.”
+ creative exploration
Data Science Creativity
TECHNOLOGY
(feasibility)
BUSINESS
(viability)
HUMAN VALUES
(usability, desirability)
1. Design thinking
2. Scientific method
3. Lots of ideas
4. Inspiration
5. Perspiration
Challenges for Data Scientists
• Stakeholder naiveté
– 2-3 days, right?
• Red tape
– No access allowed
• Terminology
– What’s a wonkulator?
• Real world data
– Messy, noisy, missing, …
• Unknown need
– What’s the business goal?
• Stakeholder alignment
– CMO, CIO, Prod, DevOps
• Analysis distrust
– … but I don’t like that result
Some Practical Tips
Rapid Iteration: Implement, Feedback, Implement
Visualize, Draw, Sketch, Share
Start Simple, Start Small
Goal, But Not Perfection
Big Data Science & Sensemaking
Source: HP “Monetizing Big Data” Perspective.
A Final Word of Caution
[Chart: expectations vs. time. Phases: hype, hope, happy. Technologies marked: big data, cloud computing. Time markers: 2013, 2018-2023.]
Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
Notable Quotes
“Simple models and a lot of data trump more elaborate models based on less data.”
- Peter Norvig
“In God we trust, all others bring data.”
- W.E. Deming
“Big data is not about the data! The value in big data [is in] the analytics.”
- Harvard Prof. Gary King
Conclusion
• Data is an asset, talent is a more valuable asset.
• Big data represents a disruptive shift.
• Data science is the magic enabler via Data Products.
• Better + faster explorations & questions win.
Andrew B. Gardner, PhD
http://linkd.in/1byADxC
agardner@momentics.com
www.momentics.com

Big Data and the Art of Data Science

  • 1. Big Data and the Art of Data Science Andrew B. Gardner, PhD www.linkedin.com/in/andywocky/ agardner@momentics.com www.momentics.com
  • 2. Big Data is Not New Big Data Challenge tion e old 8 1880 census – 50M people The First Big Data Solution • Hollerith Tabulating System • Punched cards – 80 variables • Used for 1890 census • 6 weeks instead of 7+ years 9 Hollerith Tabulation System {age, number of insanes, …} 7 years  6 weeks Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
  • 3. Big Data Is More Than 3 Vs* Volume Variety Velocity *2001 (Meta) / 2012 (Gartner) Definition of Big Data IDC Report 2011 8 billion TB in 2015 40 billion TB in 2020 90% of all data < 2 years storage  transport processing relational, graph time series, sensor, audio, video, text, geo, scientific, … 80% unstructured facebook 500 TB/day Large Hadron 35 GB/sec twitter 300K tweets/min real time  stream
  • 4. Big Data Opportunities “… big data market will grow from $3.2B (2010) to $16.9B (2015)…” “… gains of 5-6% productivity and profitability …” “… business volume will double every 1.2 years …” “… required for companies to stay innovative and competitive …” “… retail 60% increase in net margin attainable …” “… manufacturing production costs decrease 50% …” “… $300B annual savings in healthcare …” IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
  • 5. Big Data Successes Walmart • 10-15% online sales lift • $1B incremental revenue • Recommendations • Engineered content • 2012 Presidential Election • Fleet telematics save fuel
  • 7. 1: Growth of Data Amount of data in the world… 2005 100 EB 2012 2800 EB 2013 8000 EB 1 EB = 1 Exabyte = 1 billion GB … doubles every 2 years
  • 8. 2: Connectedness & Sources More non-human nodes online than people 50B+ non-human nodes online The Internet of Things (IoT) Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253. social mobile web enriched data science IoT Data Sources
  • 10. 4: Economics Attention economy not information economy! • Data is bountiful • Storage is cheap • Computing is cheap • Analysis is cheap • Talent is expensive • Time is expensive
  • 11. Big Data Disruption OLD: • define schema • pour in data • analyze → (few) well calculated questions first NEW: • collect data • explore • schema as needed → data first then exploratory decision making; unknown unknowns = insight gold Better Cycle Times and Better Questions Win!
  • 12. Rumsfeld Analytics Things we know / don’t know × we know / we don’t know Facts – could be wrong. Questions – do reporting. Intuition – quantify to improve. Exploration – unfair advantages. Goal: data discoveries = insights = game changers = unknown unknowns.
  • 13. Data Alone is Just An Asset • Depreciating • Liability • Useful lifetime • Expense Finished goods create value from raw materials data $$ data product $$
  • 14. Enter the Data Scientist • mathematical • developer • data talented • problem solver • insight whisperer • product savvy Source: FICO Infographic data + data scientist $$ data product $$
  • 15. A Brief History of Data Science BC - The Greeks 1974 Peter Naur @ UoC 2001 William S. Cleveland @ CSU 2003 Journal of Data Science 2009 Jeff Hammerbacher @ Facebook 2010 Hilary Mason & Chris Wiggins @ Dataists 2010 Mike Loukides @ O'Reilly 2011 DJ Patil @ LinkedIn
  • 16. Famous Definitions – New Blend Conway’s “Data Science” Venn Diagram (2010) Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram new skill blend: one stop rock star
  • 17. Famous Definitions – Skeptic [… with a great salary]
  • 19. Many Flavors of Data Scientist … from research to development to business-focused Alternatively, Data Roles × Skill Sets (role, skill; 2012-3 Survey) Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
  • 20. Universal Agreement: Scarcity McKinsey Prediction: in 2018, a huge shortage of analytic talent (140K+) and a gap of 1.5M managers who can make decisions based on data analysis. • Talent is the biggest resource • There is a raging talent war Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011). http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
  • 21. The Data Scientist’s Craft • Discover unknown unknowns in data • Obtain predictive, actionable insight • Communicate business data stories • Build business decision confidence • Create valuable Data Products
  • 22. Valuable & Reusable Data Products Image credit: Harlan Harris
  • 23. Building Data Products Objectives Levers Data Models What outcome am I trying to achieve? What inputs can we control? What data can we collect? How do the levers impact the data? Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
  • 25. Some Data Products fitbit flu tracker amazon traffic ads SIRI
  • 26. How Do Data Scientists Do It? • Tools • Workflow • Creativity
  • 27. Data Science Tools • Java, R, Python • Hadoop, HDFS, MapReduce, Spark, Storm • HBase, Pig, Hive, Shark, Impala • ETL, Webscrapers, Flume, Sqoop • SQL, RDBMS, DW, OLAP • Weka, RapidMiner, numpy, scipy, pandas • D3.js, ggplot2, Wakari, Tableau, Flare, Shiny • SPSS, Matlab, SAS • NoSQL, MongoDB, Redis, .. • MS-Excel • Machine Learning • ...
  • 28. Data Science Workflow Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.” + creative exploration
  • 29. Data Science Creativity TECHNOLOGY (feasibility) BUSINESS (viability) HUMAN VALUES (usability, desirability) 1. Design thinking 2. Scientific method 3. Lots of ideas 4. Inspiration 5. Perspiration
  • 30. Challenges for Data Scientists • Stakeholder naïveté – 2-3 days, right? • Red tape – No access allowed • Terminology – What’s a wonkulator? • Real world data – Messy, noisy, missing, … • Unknown need – What’s the business goal? • Stakeholder alignment – CMO, CIO, Prod, DevOps • Analysis distrust – … but I don’t like that result
  • 31. Some Practical Tips Rapid Iteration Implement Implement Feedback Visualize, Draw, Sketch, Share Start Simple, Start Small Goal, But Not Perfection
  • 32. Big Data Science & Sensemaking Source: HP “Monetizing Big Data” Perspective.
  • 33. A Final Word of Caution big data hype hope happy time expectations cloud computing 2013 2018-2023 Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
  • 34. Notable Quotes “Simple models and a lot of data trump more elaborate models based on less data.” - Peter Norvig “In God we trust, all others bring data.” - W.E. Deming “Big data is not about the data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King
  • 35. Conclusion • Data is an asset, talent is a more valuable asset. • Big data represents a disruptive shift. • Data science is the magic enabler via Data Products. • Better + faster explorations & questions win. Andrew B. Gardner, PhD http://linkd.in/1byADxC agardner@momentics.com www.momentics.com
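Slide 7 gives data totals for 2005 and 2012 alongside a "doubles every 2 years" rule of thumb. As a rough check, the sketch below computes the doubling time implied by two data points. It assumes smooth exponential growth between them, which the slide does not state, and the function name `implied_doubling_time` is hypothetical.

```python
import math


def implied_doubling_time(v0, v1, years):
    """Doubling time (in years) implied by growth from v0 to v1 over `years`.

    Assumes smooth exponential growth: v1 = v0 * 2 ** (years / T).
    That model is an assumption made for illustration, not from the slides.
    """
    return years * math.log(2) / math.log(v1 / v0)


# Slide 7 figures: 100 EB in 2005 and 2800 EB in 2012.
t = implied_doubling_time(100, 2800, 2012 - 2005)
print(f"Implied doubling time: {t:.2f} years")
```

On these two figures the implied doubling time comes out at roughly a year and a half, so "doubles every 2 years" is best read as a round-number summary rather than a fit to the slide's own numbers.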

Editor's notes

  1. Herman Hollerith. Obsolete. 1880 – 50,189,209; 1890 – 62,947,714.
  2. ~15 mins via 10 Gbps LAN to transfer 1 TB; ~220 hrs for 1 PB => move the servers?
  3. Harlan Harris
  4. Data is the new currency of business. Understand customer use, behavior, and interests. Targeted products and marketing offers. Understand customer experience across network, services, and social conversation. Network optimization. Connect with OTT players, advertisers, and verticals. New business models.
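The transfer-time estimates in note 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch under assumptions the note does not spell out: decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a link running at its full 10 Gbps line rate, and no protocol overhead. The helper name `transfer_time_seconds` is hypothetical.

```python
def transfer_time_seconds(num_bytes, link_bits_per_second):
    """Ideal transfer time: total bits divided by link speed.

    Ignores protocol overhead, latency, and contention (assumptions).
    """
    return num_bytes * 8 / link_bits_per_second


TB = 10**12          # decimal terabyte, in bytes (assumed unit convention)
PB = 10**15          # decimal petabyte, in bytes
LINK = 10 * 10**9    # 10 Gbps LAN, in bits per second

tb_minutes = transfer_time_seconds(TB, LINK) / 60
pb_hours = transfer_time_seconds(PB, LINK) / 3600

print(f"1 TB: {tb_minutes:.1f} minutes")
print(f"1 PB: {pb_hours:.1f} hours")
```

Under these idealized assumptions 1 TB takes a little over 13 minutes and 1 PB a little over 222 hours, in line with the note's rounded "~15 mins" and "~220 hrs". Real links would be slower, which only strengthens the case for moving computation to the data.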