Data Science in the Real
World: Making a
Difference
Srinath Perera
Director Research WSO2, Apache Member
(@srinath_perera)
srinath@wso2.com
StatDay 2015 @ University of Colombo
Outline
• Making sense of World’s Data
• Building Data Systems
• Changing Dynamics of Data Analysis with Big Data (Sensor Data)
• Challenges and Open Problems
Michael Stonebraker
“But then, out of nowhere, some
marketing guys started talking
about ‘big data.’ That’s when I
realized that I’d been studying
this thing for the better part of
my academic life.”
ACM Turing Award,
A Day in Your Life
Think about a day in your life.
- What is the best road to take?
- Would there be any bad weather?
- How to invest my money?
- How is my health?
There are many decisions that you can make
better if only you can access the data and
process it.
http://www.flickr.com/photos/kcolwell/5512461652/ CC licence
What can We do with Data?
Optimize (World is inefficient)
- 30% food wasted farm to plate
- GE Save 1% initiative (http://goo.gl/eYC0QE )
- Trains => 2B/ year
- US healthcare => 20B/ year
Save lives
- Weather, Disease identification, Personalized treatment
Technology advancement
- Most high-tech research is done via simulations
Building Data
Processing Systems
Data Science Architecture
Data Processing Technologies Landscape
Batch Processing
Store and process
Slow (> 5 minutes for results for
a reasonable use case)
Programming model is
MapReduce
- Apache Hadoop
- Spark
Lots of tools built on top
- Hive and Shark (SQL-style queries), Mahout (ML), Giraph (graph processing)
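The MapReduce programming model named above can be sketched in a few lines. This is a single-process illustration only, not how Hadoop or Spark implement it; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the word-count job are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}

def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)

# Hypothetical job: count words across lines of text.
def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]

def count_reducer(key, values):
    return sum(values)

lines = ["big data is big", "data beats algorithms"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data over the network, which is a large part of why batch results take minutes rather than milliseconds.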
Use case: Big Data for Development
Done using call detail record (CDR) data
People density, noon vs. midnight
(red => increased, blue =>
decreased)
Urban Planning
- People distribution
- Mobility
- Waste Management
- E.g. see http://goo.gl/jPujmM
From: http://lirneasia.net/2014/08/what-does-big-data-say-about-sri-lanka/
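A noon-versus-midnight density comparison like the one described above could be computed roughly as follows. The record layout `(hour, region)` and the region names are assumptions for illustration; real CDR data carries timestamps and cell tower identifiers that would first need to be mapped to regions.

```python
from collections import Counter

def density_change(cdr_records, hour_a=12, hour_b=0):
    """Per-region change in activity: count at hour_a minus count at hour_b."""
    at_a = Counter(region for hour, region in cdr_records if hour == hour_a)
    at_b = Counter(region for hour, region in cdr_records if hour == hour_b)
    regions = set(at_a) | set(at_b)
    # Positive values mean more people at hour_a (red on the map), negative fewer (blue).
    return {region: at_a[region] - at_b[region] for region in regions}

# Hypothetical records: (hour of day, region where the phone was seen)
records = [
    (12, "business_district"), (12, "business_district"), (0, "business_district"),
    (12, "suburb"), (0, "suburb"), (0, "suburb"), (0, "suburb"),
]
change = density_change(records)
```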
Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient
monitoring) the value of insights degrades very quickly with time.
- E.g. stock markets and the speed of light
We need technology that can produce
outputs fast
- Static queries, but need very fast output
(alerts, real-time control)
- Dynamic and interactive queries (data
exploration)
Complex Event Processing
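A static query over a moving window of events is the basic building block of complex event processing. The sketch below is a minimal illustration in plain Python, not the query language of any particular CEP engine; the class name, window size, and threshold are assumptions.

```python
from collections import deque

class SlidingWindowAlert:
    """Signal when the average of the last `size` readings exceeds `threshold`."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # old readings fall out automatically
        self.threshold = threshold

    def on_event(self, value):
        """Process one event as it arrives; return True if an alert should fire."""
        self.window.append(value)
        window_full = len(self.window) == self.window.maxlen
        average = sum(self.window) / len(self.window)
        return window_full and average > self.threshold

detector = SlidingWindowAlert(size=3, threshold=100.0)
readings = [90, 95, 110, 120, 130, 80]
alerts = [detector.on_event(r) for r in readings]
# alerts == [False, False, False, True, True, True]
```

Each event is handled once, in memory, as it arrives, which is what lets this style of processing answer in milliseconds where store-and-process takes minutes.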
Predictive Analytics
• If we know how to solve a problem, that is, if we know
a finite set of rules, then we can program it.
• For some problems (e.g. driving a car, character
recognition), we do not know a fixed, finite rule set.
• Instead of programming, we give lots of examples and
ask the computer to learn (often called Machine
Learning)
• Lots of tools
- R (statistical language)
- scikit-learn (Python)
- Apache Spark’s MLBase and Apache Mahout (Java)
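To make "give examples instead of rules" concrete, here is a minimal sketch of a one-nearest-neighbour classifier in plain Python. It is one of the simplest learning methods and is chosen only for brevity; the training points and labels are made up, and in practice one of the libraries listed above would be used.

```python
import math

def nearest_neighbour_classifier(examples):
    """Return a predict function that labels a point like its closest training example."""
    def predict(point):
        closest = min(examples, key=lambda example: math.dist(example[0], point))
        return closest[1]
    return predict

# Hypothetical labelled examples: (features, label). No rules are written by hand.
training = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"), ((9.0, 8.5), "large"),
]
predict = nearest_neighbour_classifier(training)
```

Calling `predict((2.0, 2.0))` returns `"small"` and `predict((7.0, 8.0))` returns `"large"`, although neither point appears in the training data.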
Use case: Predictive Maintenance
The idea is to fix the problem before the equipment
breaks, avoiding expensive downtime
- Airplanes, turbines, windmills
- Construction equipment
- Cars, golf carts
How
- Build a model for normal operation and
compare deviation
- Match against known error patterns
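The first approach, modelling normal operation and flagging deviations, can be sketched with a mean and standard deviation. This is a deliberately simple stand-in for a real model; the vibration readings and the three-sigma limit are assumptions for illustration.

```python
from statistics import mean, stdev

def fit_normal_model(readings):
    """Summarise healthy operation as (mean, standard deviation)."""
    return mean(readings), stdev(readings)

def is_anomalous(value, model, max_sigma=3.0):
    """Flag a reading that deviates more than max_sigma standard deviations from normal."""
    mu, sigma = model
    return abs(value - mu) > max_sigma * sigma

# Hypothetical vibration levels recorded while the machine was known to be healthy.
normal_vibration = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47]
model = fit_normal_model(normal_vibration)
```

With these readings, `is_anomalous(0.51, model)` is `False` and `is_anomalous(0.80, model)` is `True`, so the second reading would trigger an inspection before a failure.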
Communicate:
Dashboards
• The idea is to give the overall picture at a glance
(e.g. a car dashboard)
• Support for personalization: you can build
your own dashboard
• Also the entry point for drill-down
• How to build?
- Expose data via JSON
- Build the dashboard via Google Gadgets and the
content via HTML5 + JavaScript (use
charting libraries like Vega or D3)
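The "expose data via JSON" step could look like the sketch below. The payload shape (`title`, `series`, `points`, `x`, `y`) is a hypothetical layout, not a format required by Vega, D3, or any gadget container.

```python
import json

def dashboard_payload(title, series):
    """Serialise named series of (label, value) points for a charting front end."""
    return json.dumps({
        "title": title,
        "series": [
            {"name": name, "points": [{"x": x, "y": y} for x, y in points]}
            for name, points in series.items()
        ],
    })

# Hypothetical data for one chart on the dashboard.
payload = dashboard_payload("Daily orders", {"orders": [("Mon", 120), ("Tue", 135)]})
```

A charting library in the browser can then fetch this document and bind the `points` arrays to marks on a chart.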
Communicate: Alerts and Triggers
Detecting conditions can be done
via an event processing system (e.g.
CEP)
Key is the “Last Mile”
- Email
- SMS
- Push notifications to a UI
- Pager
- Trigger physical Alarm
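One way to handle the last mile is to route each detected condition to delivery channels by severity. The sketch below uses stand-in delivery functions that only record messages; the severity names and channel mapping are assumptions, and real email or SMS delivery would replace the fakes.

```python
def make_dispatcher(channels):
    """channels maps a severity name to a list of delivery functions."""
    def dispatch(severity, message):
        delivered = 0
        for send in channels.get(severity, []):
            send(message)
            delivered += 1
        return delivered  # number of channels the alert went out on
    return dispatch

# Stand-ins for real delivery: they only record what would have been sent.
outbox = []

def fake_email(message):
    outbox.append(("email", message))

def fake_sms(message):
    outbox.append(("sms", message))

dispatch = make_dispatcher({
    "warning": [fake_email],
    "critical": [fake_email, fake_sms],
})
sent = dispatch("critical", "Turbine 7 vibration out of range")
```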
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
Changing Dynamics
Large Observational Datasets
Stats are easy with designed experiments
- You get to select a representative set
- You have a control group
Now you have lots and lots of data and lots and
lots of computing power (compared to
what you had before)
Two reactions!!
“It is better to be roughly
right than precisely
wrong.”
- John Maynard Keynes
In the long run, we
are all Dead!!
Challenges: Causality
• Correlation does not imply causality! (See the “send a book home”
example [1])
• To establish causality
- Repeat the experiment under identical conditions
- If you can't, do a randomized test (A/B test)
- With big data we often cannot do either
• Option 1: We can act on a correlation if we can verify the
guess or if correctness is not critical (start an investigation,
check for a disease, marketing)
• Option 2: We verify correlations using A/B testing or
propensity analysis
[1] http://www.freakonomics.com/2008/12/10/the-blagojevich-upside/
[2] https://hbr.org/2014/03/when-to-act-on-a-correlation-and-when-not-to/
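Verifying a correlation with an A/B test usually ends in a significance check on two conversion rates. The sketch below uses a pooled two-proportion z-test, one common choice among several; the counts are invented for illustration.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Hypothetical experiment: group A is the control, group B received the change.
z, p_value = two_proportion_z_test(success_a=100, total_a=1000,
                                   success_b=150, total_b=1000)
```

Here the rate rises from 10% to 15%, giving z of about 3.4 and a p-value below 0.01, so the difference is unlikely to be chance. Because users were assigned at random, the result supports a causal reading in a way the raw correlation could not.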
Curious Case of Missing Data
http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-facebook-today, Pic from
http://www.phibetaiota.net/2011/09/defdog-the-importance-of-selection-bias-in-statistics/
• WW II: returned aircraft and
data on where they were hit
• How would you add armour?
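The trap in this story is that the data only covers aircraft that came back. A small sketch with made-up sorties shows how hits to a fatal section vanish from the observed data; the section names and the assumption that any engine hit is fatal are simplifications for illustration.

```python
from collections import Counter

def observed_hits(sorties, fatal_sections):
    """Count hits per section, but only on aircraft that made it back."""
    counts = Counter()
    for hits in sorties:
        if any(section in fatal_sections for section in hits):
            continue  # aircraft lost: its hits never reach the data set
        counts.update(hits)
    return counts

# Hypothetical sorties: each entry lists the sections hit on one aircraft.
sorties = [
    ["wing"], ["engine"], ["fuselage"],
    ["engine", "wing"], ["wing", "fuselage"], ["engine"],
]
seen = observed_hits(sorties, fatal_sections={"engine"})
```

Half of the sorties took an engine hit, yet `seen["engine"]` is 0. Reading the observed counts naively suggests armouring wings and fuselage, while the missing data points to the engine.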
More Data Beats a Clever Algorithm
Observed by large internet
companies
Also seen in Kaggle
competitions
E.g. SVM vs. logistic regression
Read “A Few Useful Things to Know
about Machine Learning” (Pedro
Domingos)
Challenges: Feature Engineering
In ML, feature engineering is the key [1].
You need features to form a kernel. Then you can solve with
less data.
Deep learning can learn the best features (and combinations) via semi-supervised
or unsupervised learning [2]
1. Bekkerman’s talk https://www.youtube.com/watch?v=wjTJVhmu1JM
2. Deep Learning, http://cl.naist.jp/~kevinduh/a/deep2014/
Challenges: Taking Decisions (Context)
Challenges: Updating Models
● Incorporate more data
o We get more data over time
o We get feedback about the effectiveness
of decisions (e.g. accuracy of fraud detection)
o Trends change
● Track and update model
o Generate models in batch mode and
update
o Streaming (Online) ML, which is an
active research topic
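The streaming option can be illustrated with a model that is updated one observation at a time instead of being rebuilt in batch. The sketch below is a one-variable linear model trained by stochastic gradient descent; the learning rate and the synthetic stream (points on y = 2x + 1) are assumptions for illustration.

```python
class OnlineLinearModel:
    """y is approximated by w * x + b, updated per observation with SGD."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take one gradient step on the squared error of a single observation."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error

model = OnlineLinearModel()
# Hypothetical stream: the same four noise-free points arriving repeatedly.
stream = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)] * 500
for x, y in stream:
    model.update(x, y)
```

After the stream is consumed, `model.w` is close to 2 and `model.b` close to 1. The model never stores past data, so it can keep adapting as trends change.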
Challenges: Lack of Labeled Data
• Most data is not labeled
• Idea of semi-supervised learning
• Provide data + examples +
ontology, and the algorithm finds new
patterns
– Lots of data
– Few example sentences
• Often uses the Expectation
Maximization (EM) algorithm
Watch Tom Mitchell’s Lecture https://www.youtube.com/watch?v=psFnHkIjHA0
Ontology: People, Cities
Relationships: like,
dislike, live in
Examples: Bob (People)
lives in Colombo (City)
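The idea of growing a few labelled examples into many can be sketched with self-training, a simpler relative of the EM approach mentioned above (this is not EM, and not the system from the lecture). The one-dimensional points, labels, and the nearest-centroid classifier are assumptions chosen to keep the example short.

```python
def self_train(labeled, unlabeled):
    """Self-training: repeatedly label the most confident unlabeled point.

    labeled: dict mapping a 1-D point to its label.
    unlabeled: list of 1-D points with no label.
    """
    labeled = dict(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Recompute one centroid per label from everything labelled so far.
        centroids = {}
        for label in set(labeled.values()):
            points = [p for p, l in labeled.items() if l == label]
            centroids[label] = sum(points) / len(points)
        # The most confident point is the one closest to any centroid.
        best = min(remaining,
                   key=lambda p: min(abs(p - c) for c in centroids.values()))
        labeled[best] = min(centroids, key=lambda l: abs(best - centroids[l]))
        remaining.remove(best)
    return labeled

# Two labelled examples and four unlabeled points (all hypothetical).
result = self_train({1.0: "low", 10.0: "high"}, [2.0, 9.0, 4.0, 7.0])
```

Starting from two labels, the loop ends with all six points labelled: 2.0 and 4.0 as `"low"`, 9.0 and 7.0 as `"high"`. Each newly labelled point shifts the centroids used for the next decision, which is also how mistakes can compound if an early guess is wrong.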
Two Takeaways
Do your data processing as part of a bigger system
- Think systems, automate, make a difference
- Real time vs. batch
- Use tools (do not reinvent the wheel)
Think about how the dynamics are changing (uncontrolled experiments,
lots of data)
- Do not be a data pessimist
- However, do not do stupid things either
Questions?

Más contenido relacionado

La actualidad más candente

Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptxSadhanaParameswaran
 
Data Science
Data ScienceData Science
Data ScienceRabin BK
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data sciencebhavesh lande
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy dataVivek Gandhi
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycleManoj Mishra
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Edureka!
 
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...SlideTeam
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)SwatiTripathi44
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceNiko Vuokko
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 

La actualidad más candente (20)

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
 
Data Science
Data ScienceData Science
Data Science
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy data
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learning Machine learning
Machine learning
 
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...
Artificial Intelligence Machine Learning Deep Learning Ppt Powerpoint Present...
 
Data analytics
Data analyticsData analytics
Data analytics
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science
Data ScienceData Science
Data Science
 
Data science
Data scienceData science
Data science
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 

Destacado

Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Kristin Wolff
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016iECARUS
 
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Paolo Nesi
 
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...myRide: A Real-Time Information System for the Carnegie Mellon University Shu...
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...Karen Mesko
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceCS, NcState
 
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892WSO2
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
 
Top Data Science Trends for 2015
Top Data Science Trends for 2015Top Data Science Trends for 2015
Top Data Science Trends for 2015VMware Tanzu
 
Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Han Woo PARK
 
Real time data services
Real time data servicesReal time data services
Real time data servicesRelevate
 
Data Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceData Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceMichael Roytman
 
Banking & Smart City Ecosystem
Banking & Smart City EcosystemBanking & Smart City Ecosystem
Banking & Smart City EcosystemArki Rifazka
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big DataInfoFarm
 
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...Cubic Corporation
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemIvo Vachkov
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...VMware Tanzu
 
[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScienceNAVER D2
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterpriseVMware Tanzu
 

Destacado (20)

Data Science Applications
Data Science ApplicationsData Science Applications
Data Science Applications
 
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
Share Information, Change the World: Big Data, Small Apps, Smart Dashboards &...
 
Data science fin_tech_2016
Data science fin_tech_2016Data science fin_tech_2016
Data science fin_tech_2016
 
Big Data + Social Graph
Big Data + Social GraphBig Data + Social Graph
Big Data + Social Graph
 
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
Smart City Ecosystem, fram data to value for the citizens, Km4City solution, ...
 
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...myRide: A Real-Time Information System for the Carnegie Mellon University Shu...
myRide: A Real-Time Information System for the Carnegie Mellon University Shu...
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data Science
 
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892Wso2datasciencesummerschool20151 150714180825-lva1-app6892
Wso2datasciencesummerschool20151 150714180825-lva1-app6892
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
Top Data Science Trends for 2015
Top Data Science Trends for 2015Top Data Science Trends for 2015
Top Data Science Trends for 2015
 
Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생Mapping (big) data science (15 dec2014)대학(원)생
Mapping (big) data science (15 dec2014)대학(원)생
 
Real time data services
Real time data servicesReal time data services
Real time data services
 
Data Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data ScienceData Science ATL Meetup - Risk I/O Security Data Science
Data Science ATL Meetup - Risk I/O Security Data Science
 
Banking & Smart City Ecosystem
Banking & Smart City EcosystemBanking & Smart City Ecosystem
Banking & Smart City Ecosystem
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big Data
 
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...
SmartCity StreamApp Platform: Real-time Information for Smart Cities and Tran...
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
 
[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience[2A7]Linkedin'sDataScienceWhyIsItScience
[2A7]Linkedin'sDataScienceWhyIsItScience
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
 

Similar a Data Science in the Real World: Making a Difference

ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...Srinath Perera
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer universityLászló Kovács
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introductionamiyadash
 
Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Boris Adryan
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into valueNAVER D2
 
Machine Learning in the Real World
Machine Learning in the Real WorldMachine Learning in the Real World
Machine Learning in the Real WorldSrinath Perera
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data TutorialESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorialeswcsummerschool
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...Daniel Katz
 
Integrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesIntegrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesAlvaro Graves
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Data Con LA
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 

Similar a Data Science in the Real World: Making a Difference (20)

ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16
 
On Big Data
On Big DataOn Big Data
On Big Data
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into value
 
Machine Learning in the Real World
Machine Learning in the Real WorldMachine Learning in the Real World
Machine Learning in the Real World
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data TutorialESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
ESWC SS 2012 - Friday Keynote Marko Grobelnik: Big Data Tutorial
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
 
Integrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologiesIntegrating and publishing public safety data using semantic technologies
Integrating and publishing public safety data using semantic technologies
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 

Más de Srinath Perera

Book: Software Architecture and Decision-Making
Book: Software Architecture and Decision-MakingBook: Software Architecture and Decision-Making
Book: Software Architecture and Decision-MakingSrinath Perera
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the EnterpriseSrinath Perera
 
An Introduction to APIs
An Introduction to APIs An Introduction to APIs
An Introduction to APIs Srinath Perera
 
An Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance ProfessionalsAn Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance ProfessionalsSrinath Perera
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?Srinath Perera
 
Healthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & ChallengesHealthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & ChallengesSrinath Perera
 
How would AI shape Future Integrations?
How would AI shape Future Integrations?How would AI shape Future Integrations?
How would AI shape Future Integrations?Srinath Perera
 
The Role of Blockchain in Future Integrations
The Role of Blockchain in Future IntegrationsThe Role of Blockchain in Future Integrations
The Role of Blockchain in Future IntegrationsSrinath Perera
 
Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going? Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going? Srinath Perera
 
Few thoughts about Future of Blockchain
Few thoughts about Future of BlockchainFew thoughts about Future of Blockchain
Few thoughts about Future of BlockchainSrinath Perera
 
A Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesA Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesSrinath Perera
 
Privacy in Bigdata Era
Privacy in Bigdata  EraPrivacy in Bigdata  Era
Privacy in Bigdata EraSrinath Perera
 
Blockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksBlockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksSrinath Perera
 
Today's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeSrinath Perera
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies TimelineSrinath Perera
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsSrinath Perera
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglySrinath Perera
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through AnalyticsSrinath Perera
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySrinath Perera
 

Data Science in the Real World: Making a Difference

  • 1. Data Science in the Real World: Making a Difference Srinath Perera Director Research WSO2, Apache Member (@srinath_perera) srinath@wso2.com StatDay 2015 @ University of Colombo
  • 2. Outline: Making sense of the World’s Data; Building Data Systems; Changing Dynamics of Data Analysis with Big Data (Sensor Data); Challenges and Open Problems
  • 3. Michael Stonebraker “But then, out of nowhere, some marketing guys started talking about ‘big data.’ That’s when I realized that I’d been studying this thing for the better part of my academic life.”
  • 4. Michael Stonebraker “But then, out of nowhere, some marketing guys started talking about ‘big data.’ That’s when I realized that I’d been studying this thing for the better part of my academic life.” ACM Turing Award,
  • 5. A Day in Your Life Think about a day in your life. - What is the best road to take? - Will there be any bad weather? - How should I invest my money? - How is my health? There are many decisions you could make better if only you could access the data and process it. http://www.flickr.com/photos/kcolwell/5512461652/ CC licence
  • 7. What Can We Do with Data? Optimize (the world is inefficient) - 30% of food is wasted from farm to plate - GE Save 1% initiative (http://goo.gl/eYC0QE ) - Trains => 2B/year - US healthcare => 20B/year Save lives - Weather, disease identification, personalized treatment Technology advancement - Most high-tech research is done via simulations
  • 11. Batch Processing Store and process Slow (> 5 minutes for results for a reasonable use case) Programming model is MapReduce - Apache Hadoop - Spark Many tools are built on top - Hive and Shark (SQL-style queries), Mahout (ML), Giraph (graph processing)
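The MapReduce programming model named on the batch processing slide can be sketched in a few lines. This is a minimal single-process illustration in plain Python, not how Hadoop or Spark are actually invoked; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data systems"])
```

In a real batch system the map and reduce calls would run in parallel on many machines over stored data; only the shape of the computation is the same here.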
  • 12. Use case: Big Data for development Done using CDR (call detail record) data People density noon vs. midnight (red => increased, blue => decreased) Urban Planning - People distribution - Mobility - Waste Management - E.g. see http://goo.gl/jPujmM From: http://lirneasia.net/2014/08/what-does-big-data-say-about-sri-lanka/
  • 13. The Value of Some Insights Degrades Fast! For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time. - E.g. stock markets and the speed of light We need technology that can produce outputs fast - Static queries that need very fast output (alerts, realtime control) - Dynamic and interactive queries (data exploration)
  • 15. Predictive Analytics - If we know how to solve a problem, that is, if we know a finite set of rules, then we can program it. - For some problems (e.g. driving a car, character recognition), we do not know a fixed, finite rule set. - Instead of programming, we give a lot of examples and ask the computer to learn (often called Machine Learning) - Many tools - R (statistical language) - scikit-learn (Python) - Apache Spark’s MLBase and Apache Mahout (Java)
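To illustrate "give examples and ask the computer to learn" without any fixed rule set, here is a minimal sketch of a 1-nearest-neighbour classifier in plain Python. It is not taken from the talk; the data, the labels, and the function name `nearest_neighbor_predict` are made up for illustration, and a real project would more likely use one of the tools listed on the slide.

```python
import math


def nearest_neighbor_predict(examples, query):
    """Return the label of the labelled example closest to `query`.

    `examples` is a list of (features, label) pairs; no rule is coded,
    the answer comes entirely from the examples supplied.
    """
    _, closest_label = min(
        examples, key=lambda example: math.dist(example[0], query)
    )
    return closest_label


# Hypothetical labelled examples: (feature vector, label).
examples = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

prediction = nearest_neighbor_predict(examples, (7.5, 8.0))
```

Adding more labelled examples changes the predictions without touching the code, which is the point the slide makes.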
  • 16. Use case: Predictive Maintenance The idea is to fix the problem before it breaks, avoiding expensive downtime - Airplanes, turbines, windmills - Construction equipment - Cars, golf carts How - Build a model of normal operation and measure deviation from it - Match against known error patterns
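The "model normal operation and measure deviation" approach can be sketched with a very simple statistical model: the mean and standard deviation of readings taken during healthy operation. This is an assumption made for illustration (real systems typically use richer models); the sensor values, the 3-sigma threshold, and the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_normal_model(readings):
    """Summarize healthy operation as (mean, standard deviation)."""
    return mean(readings), stdev(readings)


def is_anomalous(value, model, threshold=3.0):
    """Flag a reading that lies more than `threshold` sigmas from the mean."""
    mu, sigma = model
    return abs(value - mu) > threshold * sigma


# Hypothetical temperature readings recorded while the machine was healthy.
normal_temps = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3, 69.7]
model = fit_normal_model(normal_temps)

ok_reading = is_anomalous(70.3, model)    # within normal variation
bad_reading = is_anomalous(75.0, model)   # far outside normal variation
```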
  • 17. Communicate: Dashboards - The idea is to give the “overall idea” at a glance (e.g. a car dashboard) - Support for personalization: you can build your own dashboard. - Also the entry point for drill-down - How to build? - Expose data via JSON - Build the dashboard via Google Gadgets and the content via HTML5 + JavaScript (use charting libraries like Vega or D3)
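As a rough sketch of the "expose data via JSON" step, the snippet below turns a list of readings into a JSON summary that a charting front end could fetch. The field names and the function `dashboard_payload` are hypothetical; the talk does not specify a payload format or a serving mechanism.

```python
import json


def dashboard_payload(readings):
    """Serialize summary statistics of `readings` as a JSON string."""
    summary = {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "average": sum(readings) / len(readings),
    }
    # sort_keys gives a stable key order, which makes the output easy to diff.
    return json.dumps(summary, sort_keys=True)


payload = dashboard_payload([3, 5, 10])
```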
  • 18. Communicate: Alerts and Triggers Detecting conditions can be done via an event processing system (e.g. CEP) The key is the “last mile” - Email - SMS - Push notifications to a UI - Pager - Trigger a physical alarm
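The condition-detection part can be sketched as a sliding-window rule over an event stream, with the "last mile" represented by a callback. This is a toy stand-in written in plain Python, not the API of any CEP engine; the class name `ThresholdAlert`, the window size, and the threshold are assumptions for illustration.

```python
from collections import deque


class ThresholdAlert:
    """Fire `notify` when the average of the last N events exceeds a threshold."""

    def __init__(self, window_size, threshold, notify):
        self.window = deque(maxlen=window_size)  # keeps only the last N values
        self.threshold = threshold
        self.notify = notify  # the "last mile": email, SMS, pager, ...

    def on_event(self, value):
        self.window.append(value)
        # Only evaluate the rule once the window is full.
        if len(self.window) == self.window.maxlen:
            average = sum(self.window) / len(self.window)
            if average > self.threshold:
                self.notify(f"average {average:.1f} exceeded {self.threshold}")


sent = []  # stands in for a real notification channel
alert = ThresholdAlert(window_size=3, threshold=100.0, notify=sent.append)
for reading in [90, 95, 99, 120, 130]:
    alert.on_event(reading)
```

Swapping `sent.append` for a function that sends an email or SMS changes the delivery channel without touching the detection rule.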
  • 19. Case Study: Realtime Soccer Analysis Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
  • 21. Large Observational Datasets Stats are easy with designed experiments - You get to select a representative set - You have a control group You have lots and lots of data and lots and lots of computing power (compared to what you had) Two reactions!!
  • 22. “It is better to be roughly right than precisely wrong.” - John Keynes. In the long run, we are all dead!!
  • 23. Challenges: Causality - Correlation does not imply causality!! (see the “send a book home” example [1]) - To establish causality - Repeat the experiment under identical conditions - If you can’t, do a randomized test (A/B test) - With Big Data we often can do neither - Option 1: We can act on a correlation if we can verify the guess or if correctness is not critical (start an investigation, check for a disease, marketing) - Option 2: We verify correlations using A/B testing or propensity analysis [1] http://www.freakonomics.com/2008/12/10/the-blagojevich-upside/ [2] https://hbr.org/2014/03/when-to-act-on-a-correlation-and-when-not-to/
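One common way to evaluate an A/B test is a two-proportion z-test, sketched below with the standard library only. The talk does not say which test to use, so this choice, the function name `two_proportion_z_test`, and the conversion counts are all assumptions for illustration.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions in group A, 130/1000 in group B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
significant = p_value < 0.05
```

Because users are assigned to A or B at random, a significant difference here supports a causal reading in a way that a correlation mined from observational data does not.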
  • 24. Curious Case of Missing Data http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-facebook-today, Pic from http://www.phibetaiota.net/2011/09/defdog-the-importance-of-selection-bias-in-statistics/ • WW II: returned aircraft and data on where they were hit • How would you add armour?
  • 25. More Data Beats a Clever Algorithm Observed by large internet companies Also seen in Kaggle competitions E.g. SVM vs. logistic regression Read “A Few Useful Things to Know about Machine Learning” (Pedro Domingos)
  • 26. Challenges: Feature Engineering In ML, feature engineering is the key [1]. You need features to form a kernel. Then you can solve with less data. Deep learning can learn the best features (or feature combinations) via semi-supervised or unsupervised learning [2] 1. Bekkerman’s talk https://www.youtube.com/watch?v=wjTJVhmu1JM 2. Deep Learning, http://cl.naist.jp/~kevinduh/a/deep2014/
  • 28. Challenges: Updating Models ● Incorporate more data o We get more data over time o We get feedback about the effectiveness of decisions (e.g. accuracy of fraud detection) o Trends change ● Track and update the model o Generate models in batch mode and update o Streaming (online) ML, which is an active research topic
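The streaming (online) option can be illustrated with a model that is updated one observation at a time instead of being rebuilt in batch. The sketch below uses stochastic gradient descent on a one-parameter linear model; the class `OnlineLinearModel`, the learning rate, and the synthetic stream (where y = 2x) are assumptions made for this example only.

```python
class OnlineLinearModel:
    """Fits y ≈ weight * x, updating the weight after every observation."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def update(self, x, y):
        """One gradient step on the squared error of a single (x, y) pair."""
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x


model = OnlineLinearModel()
# Hypothetical stream in which the true relationship is y = 2x.
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)] * 10
for x, y in stream:
    model.update(x, y)
```

If the underlying trend changes, later observations pull the weight toward the new relationship without any retraining job.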
  • 29. Challenges: Lack of Labeled Data • Most data is not labeled • Idea of semi-supervised learning • Provide data + examples + ontology, and the algorithm finds new patterns - Lots of data - A few example sentences • Often uses the Expectation-Maximization (EM) algorithm Watch Tom Mitchell’s lecture https://www.youtube.com/watch?v=psFnHkIjHA0 Ontology: People, Cities Relationships: like, dislike, live in Examples: Bob (People) lives in Colombo (City)
  • 30. Two Takeaways Do your data processing as part of a bigger system - Think systems, automate, make a difference - Realtime vs. batch - Use tools (do not reinvent the wheel) Think about how the dynamics are changing (uncontrolled experiments, lots of data) - Do not be a data pessimist - However, do not do stupid things either