Data-Driven Disruption:
Lessons from Silicon Valley
Anand Rajaraman
The Rise of Data Driven Disruption
50-fold Growth from 2010 to 2020
2014: More bits in the digital universe than stars in the physical universe
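As a rough worked example, the 50-fold figure can be turned into an annual growth rate. This is only a sketch: it assumes smooth compounding over the ten years, which the slide does not state, and the function name is made up for illustration.

```python
def annual_growth_rate(fold: float, years: int) -> float:
    """Compound annual growth rate implied by a `fold`-times increase over `years`."""
    return fold ** (1.0 / years) - 1.0


# 50-fold growth over the 10 years from 2010 to 2020 (figures from the slide).
rate = annual_growth_rate(50, 10)
print(f"Implied growth: {rate:.1%} per year")  # roughly 48% per year
```

Under that assumption, the digital universe grows by nearly half every year.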
Sources of Data
• The world creates 1.7MB of data per minute per person
Source: The Digital Universe, IDC Report, 2014
Data-Driven Applications
Talk outline
• The evolution of data-driven applications
• 5 generations
• Lessons and Opportunities
• From the intersection of startups, venture capital, and
research
• Key theme: Disruption vs Optimization
• Conclusion
THE EVOLUTION OF
DATA-DRIVEN APPS
Follow the Data!
• Value-creation has followed the most
valuable data sources available!
• 5 overlapping generations
Data-driven apps: The First Generation
• All about leveraging private, structured data
assets for competitive advantage
• E.g., Sales, inventory, payroll, …
Data-driven apps: The Second Generation
• Harnessing the power of public data
Data-Driven Apps: The Third Generation
• Leveraging the power of “semi-public”
Social + Mobile Data
• Personal data shared in a frictionless manner with
user’s consent
Third Generation Examples
Data-driven apps: The Fourth Generation
• Combining public, semi-public, and private
data
4G Example: Paysa
• Am I being compensated fairly?
• 2012 Stanford CS grad
• Java, C++, Ruby, and Machine Learning
• Software Eng II at Google
4G Example: Paysa
• Salaries: 35M+ salary datapoints
• Companies: 500k+ companies
• People: professional DNA of 15M tech employees
• Jobs: millions of job postings, updated daily
• Data sources, both private and public: local/national government databases, partnerships (e.g., Udacity), recruiters, companies, web crawl, social media
The Fifth Generation: Just add AI!
• Companies generate massive amounts of
training data
• New class of proprietary data
The Fifth Generation
Fifth Generation Examples
Summary: Follow the Data!
LESSONS AND
OPPORTUNITIES
Lessons and Opportunities
1. Startup and Investment Landscape
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. Rise of the Cyborg
5. The Data is not a Given
3 broad categories:
Infrastructure
Analytics
Intelligent Applications
Infrastructure
• Accessed primarily by developers
Analytics
• Data exploration and modeling for data
scientists and business people
Vertical Analytics: Cuberon
The “Why?” Question
• Why are signups
down this week?
• Why did this
marketing campaign
do so well?
• Why did this A/B test
not perform?
Consumer Behavior Analytics: Cuberon
• Build data cube
• Identify anomalous subcubes
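The two steps above can be sketched in a few lines. This is an illustrative toy, not Cuberon's actual method: the dimension names, the sample rows, and the relative-change threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Sum `measure` over every combination of dimension values (a tiny data cube)."""
    cube = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple((d, row[d]) for d in subset)
                cube[key] += row[measure]
    return cube


def anomalous_subcubes(before_rows, after_rows, dims, measure, threshold=0.25):
    """Return subcubes whose total changed by at least `threshold` (relative).

    Only cells present in the earlier period are compared.
    """
    before = build_cube(before_rows, dims, measure)
    after = build_cube(after_rows, dims, measure)
    flagged = []
    for key, old in before.items():
        if old == 0:
            continue
        change = (after.get(key, 0.0) - old) / old
        if abs(change) >= threshold:
            flagged.append((key, change))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical signup counts for two consecutive weeks.
last_week = [
    {"country": "US", "channel": "ads", "signups": 100},
    {"country": "US", "channel": "organic", "signups": 100},
    {"country": "IN", "channel": "ads", "signups": 100},
    {"country": "IN", "channel": "organic", "signups": 100},
]
this_week = [dict(r) for r in last_week]
this_week[2]["signups"] = 40  # the drop is confined to IN + ads

flagged = anomalous_subcubes(last_week, this_week, ["country", "channel"], "signups")
for key, change in flagged:
    print(key, f"{change:+.0%}")
```

Sorting by the size of the change surfaces the narrowest subcube (IN, ads) first, which is the kind of answer a "Why are signups down this week?" question is after.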
Intelligent Applications
Source: Matt Turck, Jim Hao & FirstMark Capital
More Intelligent Applications…
Source: Matt Turck, Jim Hao & FirstMark Capital
Intelligent App Example: Descartes Labs
Another example: Zillow
Trends and Takeaways
• Infrastructure is available and solid
• Major transition from Hadoop to Spark
• Investment focus on “Vertical” analytics
plays
• e.g., Cuberon, Ayasdi
• The Age of the Intelligent App has dawned
• Major opportunities and investment dollars flowing here!
• e.g., Troo.ly, Descartes Labs, DocsApp
Lessons and Opportunities
1. Startup and Investment Landscape
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. Rise of the Cyborg
5. The Data is not a Given
Data-driven Optimization
Source: EMC, Understanding Data Lakes
Data-driven Disruption
Beware the HiPPO
HiPPO = Highest Paid Person’s Opinion
Why does disruption happen?
• Data scientist as advisor not decision maker
• Domain expertise and experience often win out over data
• Data-driven approach enables a completely
different business model
• E.g., A la carte streaming vs fixed number of channels
• Cannibalization concerns
• Fear of making mistakes
• Algorithms can make mistakes
• But algorithms can learn and improve much faster with data!
Why does disruption happen?
• Classic Innovator’s Dilemma with a turbo-boost: data network effects
• Accelerates the pace of disruption
Disruption Example: Venture Capital
• Venture Capital has been an established
industry for several decades
• Process has not changed much since early days
• VC firms expect entrepreneurs to approach them with
pitches
• Some VC firms have tried using data
• Data scientists in advisory role
• Not partners who make investment decisions
• High concentration in Silicon Valley
• And a few other places…
Sets the stage for…
rocketship.vc
Venture Investing through Data Science
More Global Startups
• Reduced costs to launch a startup
• Large consolidating markets; smartphone ubiquity
• Emerging market opportunities
• Untapped talent pools
Beyond Human Scale
2.1 Million “Startups”
115K need funding at any time
90% outside Silicon Valley
12.8 Million Companies
Why Data-Driven? Geography
[Chart: number of $B companies by year, 2003 to 2014; series: Silicon Valley and Outside Silicon Valley; y-axis: count, 0 to 100]
The Company Model
[Diagram: Company Model, built from Traction, Team, Market, Competition, and Customer Feedback]
Business Model Innovation
• Proactively identify interesting companies and
reach out to them at the appropriate moment
[Pie chart: geographic breakdown]
• South America: 9%
• East Europe: 11%
• China: 13%
• India: 7%
• Other East Asia: 11%
• Other Europe: 5%
• Other North America: 7%
• US SF: 11%
• US Other: 22%
• Unknown: 4%
Optimize or Disrupt?
• Key question for every entrepreneur (and
researcher too!)
• Often difference between success and failure
• Hard to answer in general, but look out for
disruption cues
• Established, fragmented industry
• Slow to adopt latest technology trend
• Asset-heavy models
• Risk/reward tradeoff
• Disruption is much riskier but the rewards compensate
Lessons and Opportunities
1. Startup and Investment Landscape
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. Rise of the Cyborg
5. The Data is not a Given
Current view of Human-Machine Collaboration
Source: 10Clouds Blog
But what about…
rocketship.vc
Peripheral Vision
• To make optimal decisions, humans must
provide “peripheral vision” to model
• Is this data point an outlier or does it fit the
model?
• e.g., Geo or category in VC
• Is there bias in the model?
• e.g., historical racial gap in sentencing and parole decisions
• Has the world changed in a way that
invalidates the assumption of the model?
• e.g., flash crash on Wall Street
The Problem
• Must judges, policemen, doctors, and bureaucrats understand the nuances of the data and the model?
• Even trickier when we consider complex workflows involving multiple decision makers
• e.g., a drug trial
The Opportunity
• Systems that include humans and models
as peers
• Can also be complex workflows that involve many
humans and models
• How best to structure such systems to
produce optimal decisions?
• Model might need to be tuned to work with specific
human
• Model Invalidation
• Can models know when they are no longer valid?
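One simple way a model could notice that it may no longer be valid is to compare its recent prediction errors against the errors it made when it was validated. The sketch below is only an illustration under that assumption: the function name, the z-score test, and the threshold of 3 are choices made here, not something the talk prescribes.

```python
from statistics import mean, stdev


def model_looks_invalid(baseline_errors, recent_errors, z_threshold=3.0):
    """Flag the model when recent errors are far above the baseline error level.

    Uses a one-sided z-test on the mean of `recent_errors`, with the mean and
    spread estimated from `baseline_errors` (needs at least two baseline points).
    """
    mu = mean(baseline_errors)
    sigma = stdev(baseline_errors)
    standard_error = sigma / len(recent_errors) ** 0.5
    if standard_error == 0:
        # Degenerate baseline with no spread: any increase counts as a change.
        return mean(recent_errors) > mu
    z = (mean(recent_errors) - mu) / standard_error
    return z > z_threshold


# Hypothetical absolute errors from validation time and from two later windows.
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
quiet_week = [1.0, 1.1, 0.9]
after_shift = [2.5, 2.7, 2.4]

print(model_looks_invalid(baseline, quiet_week))   # errors look like the baseline
print(model_looks_invalid(baseline, after_shift))  # errors have jumped
```

A check like this only detects that something changed; deciding whether the world or the data pipeline changed still needs the human "peripheral vision" discussed earlier.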
Is it time to disrupt Mechanical Turk?
• The world has changed a lot
since Mechanical Turk was
introduced in 2005
• Can we move closer to true
hybrid human-machine
computing?
• Harness both human initiative and
computing power
• Harness sensors in phones
• Reimagine problems, tasks and
incentives
Lessons and Opportunities
1. Startup and Investment Landscape
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. Rise of the Cyborg
5. The Data is not a Given
Data-driven software all around us…
The Agency Problem
• Each model is optimized for the good of the company that owns it
• Often our goals and the company’s goals are in alignment, but not always!
Problems
• Privacy
• Everyone has your data and is modeling your actions
• Pricing and Discovery disadvantage
• You discover only what they choose to show you
• You are not a population
• Each service models its population of users
• And is optimizing for its own ends
• Would you rather be explored or exploited?
We have helped create this situation
vs
Wooden weapons against guns and steel
Image: Conquistadors and Incas, painting by John Everett Millais
Or if you prefer…
Image: South Park
Enter the Cyborg
Cyborg Layer mediates interactions
Cyborg Layer Services
• Privacy protection
• e.g., using Differential Privacy techniques
• Or by strategically spreading interactions across services
• e.g., watch some movies on Netflix and some on Amazon
• Discovery and Pricing
• Looks at a larger selection and picks items for you
• Acts strictly as your agent; no conflict
• Combine personal and population models
• Cyborg has complete access to all my data
• External services have population data, but only limited
window
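To make the Differential Privacy bullet concrete, here is a minimal sketch of the Laplace mechanism, a standard differential privacy technique, applied to a count that a cyborg layer might report to an external service. The scenario and function names are assumptions for illustration; the talk does not specify a mechanism.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A count changes by at most 1 when one record is added or removed, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical: report how many thrillers I watched this month, with noise.
rng = random.Random(0)
samples = [private_count(42, epsilon=1.0, rng=rng) for _ in range(20000)]
print(samples[0])
```

Smaller values of `epsilon` add more noise and give stronger privacy; each individual report is noisy, while averages over many reports stay close to the truth.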
Combining Personal and Population
Models
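One simple way to combine the two kinds of model is shrinkage: trust the population estimate when little personal data exists, and shift toward the personal estimate as data accumulates. The sketch below assumes a rating-prediction setting and a hypothetical `prior_strength` parameter; it is an illustration, not the approach shown on the slide.

```python
def blended_estimate(personal_ratings, population_mean, prior_strength=5.0):
    """Blend a personal average with a population average.

    The personal weight is n / (n + prior_strength), so it grows toward 1 as
    the number of personal observations n increases.
    """
    n = len(personal_ratings)
    if n == 0:
        return population_mean
    personal_mean = sum(personal_ratings) / n
    weight = n / (n + prior_strength)
    return weight * personal_mean + (1 - weight) * population_mean


# Hypothetical: the population rates a genre 3.0, but I keep rating it 5.0.
print(blended_estimate([], 3.0))          # no personal data: population view
print(blended_estimate([5.0] * 5, 3.0))   # some data: halfway between
print(blended_estimate([5.0] * 500, 3.0)) # lots of data: close to my own view
```

Because the cyborg sees all of a person's data across services, it can hold the personal side of this blend, while external services supply the population side.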
Lessons and Opportunities
1. The Age of the App
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. The Rise of the Cyborg
5. The Data is not a Given
How to build a Model: Conventional View
• Use ground truth to build the best model
possible
• Feature engineering + model selection
• Maybe some data cleaning and integration
Example: Troo.ly
• 2005: Transactions
• 2015: Experiences
• Need for online trust has grown dramatically!
• Would you rent your house to this stranger?
Troo.ly Problem Statement
What we are given: labels of Known Bad, Known Good, and Not Known
Can you trust the ground truth?
• Bad users might have a good label if they haven’t engaged in bad activity yet
• Labels may be incorrect if they are coming from bad internal models
• Labels may be incorrect because of wrong attributions in bad transactions
Rocketship.vc: company data
• How to trade off data sources based on Coverage, Accuracy, Depth, Freshness, and Cost?
• Which subset of data sources yields the best model?
• Which subset of data sources will identify promising companies most quickly?
• Promising start
• Dong et al, VLDB 2012
• Rekatsinas et al, SIGMOD 2014
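The source-selection questions above can be illustrated with a simple greedy heuristic: repeatedly pick the affordable source that adds the most new coverage per unit cost. This is a sketch under strong simplifications (coverage only, no accuracy, depth, or freshness) and is not the method of the cited papers; the source names, costs, and budget are hypothetical.

```python
def select_sources(sources, budget):
    """Greedily choose data sources by marginal coverage gained per unit cost.

    `sources` maps a name to (cost, set_of_covered_items); costs must be positive.
    Returns the chosen names in pick order and the set of items they cover.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(sources)
    while remaining:
        best_name, best_ratio = None, 0.0
        for name, (cost, items) in remaining.items():
            gain = len(items - covered)
            if spent + cost <= budget and gain / cost > best_ratio:
                best_name, best_ratio = name, gain / cost
        if best_name is None:
            break  # nothing affordable adds new coverage
        cost, items = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= items
        spent += cost
    return chosen, covered


# Hypothetical sources: (cost, companies covered).
sources = {
    "registry": (1.0, {"a", "b", "c"}),
    "crawl": (2.0, {"c", "d", "e", "f"}),
    "premium": (5.0, {"a", "b", "c", "d", "e", "f", "g"}),
}
chosen, covered = select_sources(sources, budget=4.0)
print(chosen, sorted(covered))
```

With a budget of 4.0, the two cheap sources together cover almost everything the premium source does, which is the kind of quality/cost tradeoff the slide asks about.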
Algorithmic Law Enforcement
Source: The Economist, August 20, 2016
But what about perpetuating
bias against minorities?
Summary
• Cannot trust the given data completely
• Ground truth is often neither true nor grounded
• Data may have bias
• Look for additional data that can improve
model
• Quality/cost tradeoff?
• Generate your own training data!
• E.g., Polarr photo-editing app
• Data Programming (Ratner et al, 2016)
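As a rough illustration of generating your own training data, the sketch below writes a few heuristic labeling functions and combines their votes. It is a simplification: Data Programming as described by Ratner et al. models the accuracy of each labeling function rather than taking a plain majority vote, and the user fields and thresholds here are invented for the example.

```python
ABSTAIN, GOOD, BAD = 0, 1, -1


def lf_verified_email(user):
    """Heuristic: a verified email is weak evidence of a good user."""
    return GOOD if user.get("email_verified") else ABSTAIN


def lf_many_chargebacks(user):
    """Heuristic: repeated chargebacks are evidence of a bad user."""
    return BAD if user.get("chargebacks", 0) >= 2 else ABSTAIN


def lf_long_tenure(user):
    """Heuristic: accounts older than a year are weak evidence of a good user."""
    return GOOD if user.get("account_age_days", 0) > 365 else ABSTAIN


LABELING_FUNCTIONS = [lf_verified_email, lf_many_chargebacks, lf_long_tenure]


def weak_label(user, labeling_functions=LABELING_FUNCTIONS):
    """Combine labeling-function votes by simple majority; ties stay unlabeled."""
    total = sum(lf(user) for lf in labeling_functions)
    if total > 0:
        return GOOD
    if total < 0:
        return BAD
    return ABSTAIN


# Hypothetical users without any trusted ground-truth label.
users = [
    {"email_verified": True, "account_age_days": 900},
    {"chargebacks": 3},
    {"email_verified": True, "chargebacks": 2},
]
labels = [weak_label(u) for u in users]
print(labels)
```

The third user shows why the combination step matters: conflicting heuristics cancel out, and the user stays unlabeled rather than receiving a label that cannot be trusted.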
CONCLUSION
Summary
• 5 generations of data-driven applications
• Lessons and Opportunities
1. The Age of the Intelligent App
2. Disruption vs Optimization
3. Human-Machine Collaboration
4. Rise of the Cyborg
5. The Data is not a Given
Identity Crisis?
[Diagram: Data Management, Semantic Web, Machine Learning, Data Mining, Information Retrieval, AI, Systems]
Panel at NorCal DB Day, 2016
Marketing Myopia
Source: Marketing Myopia, Theodore Levitt, Harvard Business Review, 1960
Data impacts every human endeavor
[Diagram: Data at the center, connected to Entertainment, Transportation, Government, Manufacturing, Sciences, Education, Security, Commerce]
Data + X
• Core identity of the field is to create value
from data
• Never a better time for it!
• Data is now a key part of every field of
human endeavor
• Stanford CS+X
• The value of being an outsider
Go Forth And Disrupt!
[Diagram: Entertainment, Transportation, Government, Manufacturing, Sciences, Education, Security, Commerce]
ANNOUNCEMENT
IIT Madras CS Visiting Chair Program
• Focus area: data-driven
approaches to tackle important
problems
• Leading faculty/researchers
from around the world welcome!
• Flexible time commitment
• Minimum 2 weeks
• Endowed by Venky Harinarayan
and Anand Rajaraman
Confirmed Visiting Chairs so far…
• Jeff Ullman, Professor Emeritus, CS, Stanford
• Randy Katz, Distinguished Professor, EECS, UC Berkeley
• Hari Balakrishnan, Professor, EECS, MIT
For more information: deaniar@iitm.ac.in (Prof. Nagarajan)
Thanks!
Anand Rajaraman
datawocky@gmail.com
@anand_raj

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

A Futurist Looking Back
A Futurist Looking BackA Futurist Looking Back
A Futurist Looking Back
 
AI - Artificial Intelligence - Implications for Libraries
AI - Artificial Intelligence - Implications for LibrariesAI - Artificial Intelligence - Implications for Libraries
AI - Artificial Intelligence - Implications for Libraries
 
Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016
 
20210908 jim spohrer naples forum_2021 v1
20210908 jim spohrer naples forum_2021 v120210908 jim spohrer naples forum_2021 v1
20210908 jim spohrer naples forum_2021 v1
 
How Organizations can gain Strategic Advantage when Everyone is applying AI
How Organizations can gain Strategic Advantage when Everyone is applying AIHow Organizations can gain Strategic Advantage when Everyone is applying AI
How Organizations can gain Strategic Advantage when Everyone is applying AI
 
BDVe Webinar Series - QROWD: The Human Factor in Big Data
BDVe Webinar Series - QROWD: The Human Factor in Big DataBDVe Webinar Series - QROWD: The Human Factor in Big Data
BDVe Webinar Series - QROWD: The Human Factor in Big Data
 
BDVe Webinar Series - QROWD: The Human Factor in Big Data
BDVe Webinar Series - QROWD: The Human Factor in Big DataBDVe Webinar Series - QROWD: The Human Factor in Big Data
BDVe Webinar Series - QROWD: The Human Factor in Big Data
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in BostonDesigning AI for Humanity at dmi:Design Leadership Conference in Boston
Designing AI for Humanity at dmi:Design Leadership Conference in Boston
 
Next Generation Digital Enterprise (Workplace) Technology | Enterprise Digita...
Next Generation Digital Enterprise (Workplace) Technology | Enterprise Digita...Next Generation Digital Enterprise (Workplace) Technology | Enterprise Digita...
Next Generation Digital Enterprise (Workplace) Technology | Enterprise Digita...
 
21st Century Strategy
21st Century Strategy21st Century Strategy
21st Century Strategy
 
Usama Fayyad talk in South Africa: From BigData to Data Science
Usama Fayyad talk in South Africa:  From BigData to Data ScienceUsama Fayyad talk in South Africa:  From BigData to Data Science
Usama Fayyad talk in South Africa: From BigData to Data Science
 
Speaker Slides: Bringing Agile Management to International Development
Speaker Slides: Bringing Agile Management to International DevelopmentSpeaker Slides: Bringing Agile Management to International Development
Speaker Slides: Bringing Agile Management to International Development
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AI
 
AtlasCamp 2015: Builders advancing humanity: Past to future
AtlasCamp 2015: Builders advancing humanity: Past to futureAtlasCamp 2015: Builders advancing humanity: Past to future
AtlasCamp 2015: Builders advancing humanity: Past to future
 
Women On The Leading Edge
Women On The Leading Edge Women On The Leading Edge
Women On The Leading Edge
 
Let's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational designLet's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational design
 
The Future of Work
The Future of Work The Future of Work
The Future of Work
 
Ibm
IbmIbm
Ibm
 
Innovations in HR
Innovations in HRInnovations in HR
Innovations in HR
 

Similar a Disrupting with Data: Lessons from Silicon Valley

Deck from Cap Gemini Conference
Deck from Cap Gemini ConferenceDeck from Cap Gemini Conference
Deck from Cap Gemini Conference
Geoffrey Moore
 

Similar a Disrupting with Data: Lessons from Silicon Valley (20)

MIT ICIQ 2017 Keynote: Data Governance and Data Capitalization in the Big Dat...
MIT ICIQ 2017 Keynote: Data Governance and Data Capitalization in the Big Dat...MIT ICIQ 2017 Keynote: Data Governance and Data Capitalization in the Big Dat...
MIT ICIQ 2017 Keynote: Data Governance and Data Capitalization in the Big Dat...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
Lecture4 Social Web
Lecture4 Social Web Lecture4 Social Web
Lecture4 Social Web
 
Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data era
 
Digitalization and Innovation - Today and Tomorrow
Digitalization and Innovation - Today and TomorrowDigitalization and Innovation - Today and Tomorrow
Digitalization and Innovation - Today and Tomorrow
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
 
Deck from Cap Gemini Conference
Deck from Cap Gemini ConferenceDeck from Cap Gemini Conference
Deck from Cap Gemini Conference
 
Tech essentials for Product managers
Tech essentials for Product managersTech essentials for Product managers
Tech essentials for Product managers
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"
 
Decentralized AI Draper
Decentralized AI   DraperDecentralized AI   Draper
Decentralized AI Draper
 
CIO 360 grados: empoderamiento total
CIO 360 grados: empoderamiento totalCIO 360 grados: empoderamiento total
CIO 360 grados: empoderamiento total
 
The Network Multiplier - A One Day Program
The Network Multiplier - A One Day ProgramThe Network Multiplier - A One Day Program
The Network Multiplier - A One Day Program
 
Digital Transformation.pdf
Digital Transformation.pdfDigital Transformation.pdf
Digital Transformation.pdf
 
Digital Transformation - A POV
 Digital Transformation - A POV Digital Transformation - A POV
Digital Transformation - A POV
 
Future is private intel dev fest
Future is private   intel dev festFuture is private   intel dev fest
Future is private intel dev fest
 
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili SaghafiBusiness Intelligence & Predictive Analytic by Prof. Lili Saghafi
Business Intelligence & Predictive Analytic by Prof. Lili Saghafi
 
Blockchain, IoT and AI are foundational to the Fourth Industrial Revolution -...
Blockchain, IoT and AI are foundational to the Fourth Industrial Revolution -...Blockchain, IoT and AI are foundational to the Fourth Industrial Revolution -...
Blockchain, IoT and AI are foundational to the Fourth Industrial Revolution -...
 
Samsung Business Summit
Samsung Business SummitSamsung Business Summit
Samsung Business Summit
 
Our Digital Futures
Our Digital FuturesOur Digital Futures
Our Digital Futures
 
"Developments in Accessibility of Information" - Access Israel 's 6th Annual ...
"Developments in Accessibility of Information" - Access Israel 's 6th Annual ..."Developments in Accessibility of Information" - Access Israel 's 6th Annual ...
"Developments in Accessibility of Information" - Access Israel 's 6th Annual ...
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Disrupting with Data: Lessons from Silicon Valley

  • 1. Data-Driven Disruption: Lessons from Silicon Valley Anand Rajaraman
  • 2. The Rise of Data Driven Disruption 2
  • 3. 50-fold Growth from 2010 to 2020 3 2014: More bits in the digital universe than stars in the physical universe
  • 4. Sources of Data • The world creates 1.7MB of data per minute per person 4The Digital Universe -- IDC Report, 2014
  • 6. Talk outline • The evolution of data-driven applications • 5 generations • Lessons and Opportunities • From the intersection of startups, venture capital, and research • Key theme: Disruption vs Optimization • Conclusion 6
  • 8. Follow the Data! • Value-creation has followed the most valuable data sources available! • 5 overlapping generations 8
  • 9. Data driven apps: The First Generation • All about leveraging private, structured data assets for competitive advantage • E.g., Sales, inventory, payroll, … 9
  • 10. Data-driven apps: The Second Generation • Harnessing the power of public data 10
  • 11. Data-Driven Apps: The Third Generation • Leveraging the power of “semi-public” Social + Mobile Data • Personal data shared in a frictionless manner with user’s consent 11
  • 13. Data-driven apps: The Fourth Generation • Combining public, semi-public, and private data 13 +
  • 14. 4G Example: Paysa 14 • Am I being compensated fairly? • 2012 Stanford CS grad • Java, C++, Ruby, and Machine Learning • Software Eng II at Google
  • 15. 4G Example: Paysa 15 Salaries 35M+ salary datapoints Companies 500k+ companies People Professional DNA of 15M tech employees Jobs Millions of job postings updated daily Local/National Government Databases Partnerships (e.g., Udacity) Recruiters Companies Web Crawl Social Media Private Public
  • 16. The Fifth Generation: Just add AI! 16 • Companies generate massive amounts of training data • New class of proprietary data
  • 21. Lessons and Opportunities 1. Startup and Investment Landscape 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given 21
  • 22. Lessons and Opportunities 1. Startup and Investment Landscape 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given 22
  • 25. Analytics • Data exploration and modeling for data scientists and business people 25
  • 27. The “Why?” Question • Why are signups down this week? • Why did this marketing campaign do so well? • Why did this A/B test not perform? 27
  • 28. Consumer Behavior Analytics: Cuberon 28 Build data cube Identify anomalous subcubes
  • 29. Intelligent Applications 29Matt Turck, Jim Hao & FirstMark Capital
  • 30. More Intelligent Applications… 30Matt Turck, Jim Hao & FirstMark Capital
  • 31. Intelligent App Example: Descartes Labs 31Another example: Zillow
  • 32. Trends and Takeaways • Infrastructure is available and solid • Major transition from Hadoop to Spark • Investment focus on “Vertical” analytics plays • e.g., Cuberon, Ayasdi • The Age of the Intelligent App has dawned • Major opportunities and investment dollars flowing here! • e.g., Troo.ly, Descartes Labs, DocsApp 32
  • 33. Lessons and Opportunities 1. Startup and Investment Landscape 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given 33
  • 36. Beware the Hippo HiPPO = Highest Paid Person’s Opinion 36
  • 37. Why does disruption happen? • Data scientist as advisor not decision maker • Domain expertise and experience often win out over data • Data-driven approach enables a completely different business model • E.g., A la carte streaming vs fixed number of channels • Cannibalization concerns • Fear of making mistakes • Algorithms can make mistakes • But algorithms can learn and improve much faster with data! 37
  • 38. Why does disruption happen? • Classic Innovator’s Dilemma with a turbo- boost: data network effects • Accelerates the pace of disruption 38
  • 39. Disruption Example: Venture Capital • Venture Capital has been an established industry for several decades • Process has not changed much since early days • VC firms expect entrepreneurs to approach them with pitches • Some VC firms have tried using data • Data scientists in advisory role • Not partners who make investment decisions • High concentration in Silicon Valley • And a few other places… 39
  • 40. Sets the stage for… 40 rocketship.vc Venture Investing through Data Science
  • 41. More Global Startups 41 Reduced costs to launch a startup Large consolidating markets; smartphone ubiquity Emerging Market Opportunities Untapped talent pools
  • 42. Beyond Human Scale 42 2.1 Million “Startups” 115K need funding at any time 90% outside Silicon Valley 12.8 Million Companies
  • 43. Why Data-Driven? Geography 43 0 10 20 30 40 50 60 70 80 90 100 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Count Number of $B companies by year Silicon Valley Outside Silicon Valley
  • 45. Business Model Innovation • Proactively identify interesting companies and reach out to them at the appropriate moment 45 South America 9% East Europe 11% China 13% India 7%Other East Asia 11% Other Europe 5% Other North America 7% US SF 11% US Other 22% Unknown 4%
  • 46. Optimize or Disrupt? • Key question for every entrepreneur (and researcher too!) • Often difference between success and failure • Hard to answer in general, but look out for disruption cues • Established, fragmented industry • Slow to adopt latest technology trend • Asset-heavy models • Risk/reward tradeoff • Disruption is much riskier but the rewards compensate 46
  • 47. Lessons and Opportunities 1. Startup and Investment Landscape 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given 47
  • 48. Current view of Human-Machine Collaboration 4810Clouds Blog
  • 50. Peripheral Vision • To make optimal decisions, humans must provide “peripheral vision” to model • Is this data point an outlier or does it fit the model? • e.g., Geo or category in VC • Is there bias in the model? • e.g., historical racial gap in sentencing and parole decisions • Has the world changed in a way that invalidates the assumption of the model? • e.g., flash crash on Wall Street 50
  • 51. The Problem •Must judges, policemen, doctors, bureaucrats understand the nuances of the data and the model? •Even trickier when we consider complex workflows involving multiple decision makers • e.g., a drug trial 51
  • 52. The Opportunity • Systems that include humans and models as peers • Can also be complex workflows that involve many humans and models • How best to structure such systems to produce optimal decisions? • Model might need to be tuned to work with specific human • Model Invalidation • Can models know when they are no longer valid? 52
  • 53. Is it time to disrupt Mechanical Turk? • The world has changed a lot since Mechanical Turk was introduced in 2005 • Can we move closer to true hybrid human-machine computing? • Harness both human initiative and computing power • Harness sensors in phones • Reimagine problems, tasks and incentives 53
  • 54. Lessons and Opportunities 1. Startup and Investment Landscape 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given
  • 55. Data-driven software all around us…
  • 56. The Agency Problem • Each model is optimized for the good of the company that owns it • Often our goals and the company’s goals are in alignment, but not always!
  • 57. Problems • Privacy • Everyone has your data and is modeling your actions • Pricing and Discovery disadvantage • You discover only what they choose to show you • You are not a population • Each service models its population of users • And is optimizing for its own ends • Would you rather be explored or exploited?
  • 58. We have helped create this situation vs
  • 59. Wooden weapons against guns and steel (Image: Conquistadors and Incas, painting by John Everett Millais)
  • 60. Or if you prefer… (Image: South Park)
  • 62. Cyborg Layer mediates interactions
  • 63. Cyborg Layer Services • Privacy protection • e.g., using Differential Privacy techniques • Or by strategically spreading interactions across services • e.g., watch some movies on Netflix and some on Amazon • Discovery and Pricing • Looks at a larger selection and picks items for you • Acts strictly as your agent; no conflict • Combine personal and population models • Cyborg has complete access to all my data • External services have population data, but only a limited window
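As a rough illustration of the "Differential Privacy techniques" bullet, the sketch below applies the standard Laplace mechanism to a single count before it leaves the user's side. The talk does not say which mechanism a cyborg layer would use; the function names `laplace_noise` and `private_count`, and the example of a per-genre viewing count, are assumptions made here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    Assumes the count has sensitivity 1, i.e. adding or removing one
    event changes it by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical use: report how many thrillers were watched this month.
print(private_count(42, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and reveal less; the service still gets a usable aggregate signal while any single viewing event stays deniable.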
  • 64. Combining Personal and Population Models
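The slide itself is a diagram, so the following is only one plausible reading of "combining personal and population models": a shrinkage estimate that leans on the population average when the cyborg has little personal history and on the personal average as history accumulates. The name `blended_estimate` and the `prior_strength` parameter are assumptions for illustration.

```python
def blended_estimate(personal_ratings, population_mean, prior_strength=5.0):
    """Blend a user's own average with the population average.

    `prior_strength` acts like a number of pseudo-observations at the
    population mean: with n personal ratings, the personal average gets
    weight n / (n + prior_strength).
    """
    n = len(personal_ratings)
    if n == 0:
        return population_mean
    personal_mean = sum(personal_ratings) / n
    weight = n / (n + prior_strength)
    return weight * personal_mean + (1.0 - weight) * population_mean


# With no history the estimate is the population mean; with five ratings
# and prior_strength=5 it sits halfway between the two averages.
print(blended_estimate([], population_mean=3.0))
print(blended_estimate([5.0] * 5, population_mean=3.0))
```

In this picture the external service supplies `population_mean`, while the personal ratings never need to leave the cyborg layer.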
  • 65. Lessons and Opportunities 1. The Age of the App 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. The Rise of the Cyborg 5. The Data is not a Given
  • 66. How to build a Model: Conventional View • Use ground truth to build the best model possible • Feature engineering + model selection • Maybe some data cleaning and integration
  • 67. Example: Troo.ly • 2005: Transactions • 2015: Experiences • Need for online trust has grown dramatically! • Would you rent your house to this stranger?
  • 68. Troo.ly Problem Statement • What we are given: Known Bad, Known Good, Not Known
  • 69. Can you trust the ground truth? • Bad users might have a good label if they haven’t engaged in bad activity yet • Labels may be incorrect if they are coming from bad internal models • Labels may be incorrect because of wrong attributions in bad transactions
  • 70. Rocketship.vc: company data • How to trade off data sources based on Coverage, Accuracy, Depth, Freshness, and Cost? • Which subset of data sources yields the best model? • Which subset of data sources will identify promising companies most quickly? • Promising start • Dong et al, VLDB 2012 • Rekatsinas et al, SIGMOD 2014
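To make the source-selection question concrete, here is a minimal sketch that considers only two of the listed criteria, coverage and cost, and picks sources greedily under a budget. It is not the method of the cited papers and not Rocketship.vc's system; the function `select_sources`, the source names, and the costs are all hypothetical.

```python
def select_sources(sources, budget):
    """Greedily pick data sources by new companies covered per unit cost.

    `sources` maps a source name to (cost, set of company ids it covers).
    Costs are assumed to be positive. Returns the chosen names in order,
    the set of covered companies, and the total amount spent.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(sources)
    while remaining:
        best_name, best_ratio = None, 0.0
        for name, (cost, companies) in remaining.items():
            if spent + cost > budget:
                continue  # cannot afford this source
            ratio = len(companies - covered) / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # nothing affordable adds new coverage
        cost, companies = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= companies
        spent += cost
    return chosen, covered, spent


# Hypothetical sources: (cost, company ids covered)
sources = {
    "web_crawl": (1.0, {1, 2, 3, 4}),
    "registry": (2.0, {3, 4, 5, 6}),
    "premium_feed": (5.0, {1, 2, 3, 4, 5, 6, 7}),
}
print(select_sources(sources, budget=3.0))
```

Accuracy, depth, and freshness would change the gain function rather than the overall loop, for example by scoring each newly covered company by how reliable or recent its record is.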
  • 71. Algorithmic Law Enforcement • But what about perpetuating bias against minorities? (Source: The Economist, August 20, 2016)
  • 72. Summary • Cannot trust the given data completely • Ground truth is often neither true nor grounded • Data may have bias • Look for additional data that can improve the model • Quality/cost tradeoff? • Generate your own training data! • E.g., Polarr photo-editing app • Data Programming (Ratner et al, 2016)
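The "generate your own training data" idea can be sketched with labeling functions in the spirit of data programming: small heuristics that each vote on a label or abstain. Note the simplification: Ratner et al. learn how much to trust each labeling function, whereas this sketch just takes an unweighted majority vote. The user fields (`chargebacks`, `verified_id`, `account_age_days`) and thresholds are hypothetical, loosely modeled on the Troo.ly trust example.

```python
ABSTAIN, BAD, GOOD = None, 0, 1


def lf_many_chargebacks(user):
    """Heuristic: repeated chargebacks suggest a bad user."""
    return BAD if user.get("chargebacks", 0) >= 3 else ABSTAIN


def lf_verified_id(user):
    """Heuristic: a verified identity suggests a good user."""
    return GOOD if user.get("verified_id") else ABSTAIN


def lf_long_history(user):
    """Heuristic: an account older than a year suggests a good user."""
    return GOOD if user.get("account_age_days", 0) > 365 else ABSTAIN


def weak_label(user, labeling_functions):
    """Combine labeling-function votes by unweighted majority.

    Returns GOOD, BAD, or ABSTAIN when there are no votes or a tie.
    """
    votes = []
    for lf in labeling_functions:
        vote = lf(user)
        if vote is not ABSTAIN:
            votes.append(vote)
    good, bad = votes.count(GOOD), votes.count(BAD)
    if good > bad:
        return GOOD
    if bad > good:
        return BAD
    return ABSTAIN


LFS = [lf_many_chargebacks, lf_verified_id, lf_long_history]
print(weak_label({"verified_id": True, "account_age_days": 800}, LFS))
print(weak_label({"chargebacks": 5}, LFS))
```

Users in the "not known" bucket who receive a non-abstain label become extra, noisy training examples; ties and abstentions are left unlabeled rather than guessed.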
  • 74. Summary • 5 generations of data-driven applications • Lessons and Opportunities 1. The Age of the Intelligent App 2. Disruption vs Optimization 3. Human-Machine Collaboration 4. Rise of the Cyborg 5. The Data is not a Given
  • 75. Identity Crisis? • Data Management • Semantic Web • Machine Learning • Data Mining • Information Retrieval • AI Systems (Source: Panel at NorCal DB Day, 2016)
  • 76. Marketing Myopia (Source: “Marketing Myopia”, Theodore Levitt, Harvard Business Review, 1960)
  • 77. Data impacts every human endeavor • [Diagram: Data at the center of Entertainment, Transportation, Government, Manufacturing, Sciences, Education, Security, Commerce]
  • 78. Data + X • Core identity of the field is to create value from data • Never a better time for it! • Data is now a key part of every field of human endeavor • Stanford CS+X • The value of being an outsider
  • 79. Go Forth And Disrupt! • [Diagram: Entertainment, Transportation, Government, Manufacturing, Sciences, Education, Security, Commerce]
  • 81. IIT Madras CS Visiting Chair Program • Focus area: data-driven approaches to tackle important problems • Leading faculty/researchers from around the world welcome! • Flexible time commitment • Minimum 2 weeks • Endowed by Venky Harinarayan and Anand Rajaraman
  • 82. Confirmed Visiting Chairs so far… • Jeff Ullman, Professor Emeritus, CS, Stanford • Randy Katz, Distinguished Professor, EECS, UC Berkeley • Hari Balakrishnan, Professor, EECS, MIT