SlideShare a Scribd company logo
1 of 48
Download to read offline
The Unrealized Power of Data
Roger Barga
GPM, CloudML C+E
Big Data is Here Today
Disruptive Innovation…
http://bit.ly/5a9KL
Competing on Analytics
If you can predict it, you can own it…
• Forecasting
• Targeting
• Fraud detection, anti-fraud analytics
• Risk
• Customer churn, conversion
• Propensity
• Price Elasticity
By 2014, 30 percent of analytic applications will use predictive capabilities.
–GartnerBusinessIntelligenceSummit2012
Predictive Analytics in Action
$3 Million in Health Analytics Prize Money Available http://bit.ly/groYXc
Predictive Analytics Applied to the Hospital Billing Process http://bit.ly/gpX9ak
Educational Data Mining is becoming popular http://bit.ly/fM0M1a
New Model Predict Presence of Meth Labs Very Accurately http://bit.ly/aIgUGH
Monitoring Physician Performance for Predictive Profiling http://bit.ly/bYt4sg
Predicting Employee Success http://bit.ly/clox2R
Forecast Career Paths by Resume Analytics http://bit.ly/gvlFKJ
Thorotrends offers research/analytics for horse racing http://bit.ly/diinec
How Target Figured Out A Teen Girl Was Pregnant http://onforb.es/SokL3j
Data Science
65% of enterprises feel they have a strategic shortage of data scientists, a
role many did not even know existed 12 months ago…
Data Science is far too complex today
• Software Development
• Data Exploration
• Math & Statistics Knowledge
• Machine Learning
• Data Management
this must get simpler, it simply won’t scale!
Talk Outline
Predictive Analytics Today
Predictive analytics can be a potent weapon in your business analytics toolbox
With increasing commoditization it will truly be the next differentiator
Advancement of Data Science
Unique role and data science process;
Applying the principles of scientific experimentation to business;
Application of Predictive Analytics Within Microsoft
Demo our internal Azure service for machine learning over data to build predictive models;
Discuss three models built using Data Lab that we are using internally;
Predictive Analytics
“Trying to predict the future is like trying to drive down a country road
at night with no lights while looking out the back window.”
– Peter Drucker
• We must seek out patterns among seemingly
unconnected data and disciplines to gain
valuable insights.
• It took 50 years for organizations
to fully leverage transactions systems, so it
will time to realize significant value from
knowledge, information and data.
• What gets measured, gets done.
Predictive Analytics
Successful Predictions
Predictive Analytics
Successful Predictions
Predictive Analytics
Customer discussions, from the front lines
By applying predictive analytics to data they already have, organizations are
uncovering unexpected patterns and associations, and they are beginning to develop
predictive models to guide front-line interactions.
Predictive Analytics
Customer discussions, from the front lines
By applying predictive analytics to data they already have, organizations are
uncovering unexpected patterns and associations, and they are beginning to develop
predictive models to guide front-line interactions.
Business Scenario:
• Downtime due to equipment failure or unnecessary maintenance bears huge cost;
• Goal: Predict failures from collected data and defer maintenance;
Five years of data, time series data from
hundreds of sensors on machines throughout
the gas field, for dozens of gas fields across
different regions of the world…
Predictive Analytics
Customer discussions, from the front lines
By applying predictive analytics to data they already have, organizations are
uncovering unexpected patterns and associations, and they are beginning to develop
predictive models to guide front-line interactions.
Business Scenario:
• Downtime due to equipment failure or unnecessary maintenance bears huge cost;
• Goal: predict failures from collected data and defer maintenance;
Time series analysis to create
features and evaluate various
machine learning approaches…
Predictive Analytics
Customer discussions, from the front lines
By applying predictive analytics to data they already have, organizations are
uncovering unexpected patterns and associations, and they are beginning to develop
predictive models to guide front-line interactions.
Business Scenario:
• Downtime due to equipment failure or unnecessary maintenance bears huge cost;
• Goal: predict failures from collected data and defer maintenance;
One-Class SVM
Predictive Analytics
Customer discussions, from the front lines
By applying predictive analytics to data they already have, organizations are
uncovering unexpected patterns and associations, and they are beginning to develop
predictive models to guide front-line interactions.
• Oil & Gas customer, building models for predictive maintenance of equipment;
• Telecom customer, building customer churn predictive model to improve retention;
Enterprises are beginning to realize that data is a strategic asset and that they need
to start treating it like one...
• If you don’t have or can’t find the data you need, consider creating it;
Predictive Analytics
Bridging the Gap
Predictive analytics must be simpler, today it requires highly specialized skills;
• Tools that their current staff and/or new hires can use for predictive modelling;
• Ability to integrate predictive models in line of business applications;
Predictive Analytics
Bridging the Gap
Predictive analytics must be simpler, today it requires highly specialized skills;
• Tools that their current staff and/or new hires can use for predictive modelling;
• Ability to integrate predictive models in line of business applications;
Enterprises require an end-to-end solution for their predictive modelling
• Support for collaboration, lineage tracking;
• Archive for predictive models, model governance;
• Support for search & discovery, reuse;
Predictive Analytics
Bridging the Gap
Predictive analytics must be simpler, today it requires highly specialized skills;
• Tools that their current staff and/or new hires can use for predictive modelling;
• Ability to integrate predictive models in line of business applications;
Enterprises require an end-to-end solution for their predictive modelling
• Support for collaboration, lineage tracking;
• Archive for predictive models, model governance;
• Support for search & discovery, reuse;
Predictions need to be operationalized, not locked in the backroom
• Deployable predictions, minimize time to insight;
• Ability to frequently update predictive model given new data, conditions change;
Predictive Analytics
Bridging the Gap
Familiar algorithms;
• Classification and Regression Trees (CART);
• Linear Regression;
• Decision Forests, Boosted Decision Trees, see Kaggle models of choice;
• Logistic Regression or other discrete choice;
• Association Rules;
• K-nearest Neighbors;
• Box Jenkins (ARIMA);
• Support Vector Machines;
• Deep Neural Networks, watch this one carefully…
More data & better features, in the right hands even a simple algorithm can
produce a very effective predictive model – enter the data scientist…
Data Scientist
A Relatively New Job Title
DJ Patil and Jeff Hammerbacher
Alternative Job Titles
Signature Skills
Math and Statistics Knowledge
Machine Learning
Programming Skills
Computer Science
Data Visualization
Data Curiosity
Emergence of Data Science Teams
Data Science
Discipline with Deep Academic Roots
Bill Cleveland, Bell Labs
Data Science Action Plan, 2001
The Data Science Journal, 2002 (CODATA)
Journal of Data Science, 2002
National Science Foundation (NSF), 2005
Long-lived Digital Data Collections
JISC, 2008
The Skills, Role & Career Structure of Data Scientists & Curators
“We have found that creative data
scientists can solve problems in every
field better than experts in those fields
can…”
“The expertise you need is strategy in
answering these questions.”
Interview with Jeremy Howard, Chief Scientist, Kaggle:
http://slate/me/UM0wQ7
Data Scientist
A Critical Role
Data Science Process
Mind the Gap, the space between the data set and the algorithm
Conventional descriptive analytics goes straight from a data set to applying an
algorithm. But in Data Science there’s a huge space in between of very important
stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard
part. The hard part is data wrangling and doing it well.
Handling missing values
Data Science Process
Mind the Gap, the space between the data set and the algorithm
Conventional descriptive analytics goes straight from a data set to applying an
algorithm. But in Data Science there’s a huge space in between of very important
stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard
part. The hard part is data wrangling and doing it well.
Handling missing values
Feature selection
Data Science Process
Mind the Gap, the space between the data set and the algorithm
Conventional descriptive analytics goes straight from a data set to applying an
algorithm. But in Data Science there’s a huge space in between of very important
stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard
part. The hard part is data wrangling and doing it well.
Handling missing values
Feature selection
Feature creation and imputation
Data Science Process
Mind the Gap, the space between the data set and the algorithm
Conventional descriptive analytics goes straight from a data set to applying an
algorithm. But in Data Science there’s a huge space in between of very important
stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard
part. The hard part is data wrangling and doing it well.
Handling missing values
Feature selection
Feature creation and imputation
Joining with external data to enrich training set
Data Science Process
Requires Agile Experimentation
Define Objective
Access and
Understand the
Data
Pre-processing
Feature and/or
Target
construction
1. Define the objective and quantify it with a metric – optionally with constraints, if
any. This typically requires domain knowledge.
2. Collect and understand the data, deal with vagaries and biases in data acquisition
(missing data, outliers due to errors in data collection process, more sophisticated
biases due to the data collection procedure, etc.
3. Frame the problem in terms of a ML problem – classification, regression, ranking,
clustering, forecasting, outlier detection etc. – combination of domain knowledge
and ML knowledge is useful.
4. Transform raw data into a “modeling dataset”, with features, targets etc., which
can be used for modeling. Feature construction often improved with domain
knowledge. Target must be identical or proxy of metric in step 1.
Data Science Process
Requires Agile Experimentation
Feature
selection
Model
training
Model
scoring
Evaluation
Train/ Test
split 5. Train, test and evaluate, taking care to control
bias/variance and ensure the metrics are reported with
the right confidence intervals (cross-validation helps
here), be vigilant against target leaks (which typically
leads to unbelievably good test metrics) – this is the
machine learning heavy step.
Data Science Process
Requires Agile Experimentation
Define Objective
Access and
Understand the
data
Pre-processing
Feature and/or
Target
construction
Feature
selection
Model
training
Model scoring
Evaluation
Train/ Test
split
6. Iterate steps (2) – (5) until the test metrics are satisfactory
Data Science Process
Support Algorithmic Creativity and Exploration
• Neural Networks
• Logistic Regression
• Support Vector Machine
• Decision Trees
• Ensemble Methods
• adaBoost
• Bayesian Networks
• Genetic Algorithms
• Random Forest
• Monte Carlo methods
• Principal Component Analysis
• Kalman Filter
• Evolutionary Fuzzy Modelling
• Deep Neural Networks
Data Science Process
Ability to Determine if a Model Will Predict With Accuracy...
Overfitting has many faces, always measure performance on out-of-sample data.
• At least one test, train on some of your data and test on the rest;
• Do several, splitting the data differently each time;
“Remember that all models are wrong; the practical question is how wrong do they
have to be to not be useful.” George E.P. Box
CloudML Studio
Model Authoring Through Visual Composition
• Accessible through
a web browser
• End2End support
for data science
workflow
• Flexible and
extensible
CloudML Studio
Foundational ML & Analytics Algorithms
Associative
rule mining
Neural
networks
Regression
analysis
Clustering Random
Forrest
Support for
extensibility
Enable users
to add their
own
algorithms as
modules
Mathematical
programming
Online
analytical
processing
Bayesian
Machine
Learning
Text
analytics
Monte Carlo
methods
Support for
vector machines
Boosted
Decision tree
learning
Time series
processing
Rapid Experimentation to Create a Better Model
• Train and test several
models from existing
library
• Try a wide range of
strategies
• Extend existing models
Immutable Model Repository
Durable repository for
predictive models and
algorithms to build up the
knowledge base and make
CloudML Studio more
valuable
Data Lab
Analyze SQL Azure database traces, predict rack capacity for planning purposes
Data Lab
Analyze SQL Azure database characterization model: bully, resource profiling, etc…
Daimler AG – Predicting gap and flush defects during the automobile manufacturing process…
Top 3 Obstacles to Using Predictive Analytics.
1. Too hard to use, can’t find skilled talent.
2. Cost of software, access to good algorithms.
3. Difficulty integrating into business processes
☁
Intelligent Web
Service
consume publish
+
Machine Learning Services on Azure
enterprise
customer
data
scientist
☁
CloudML Studio
☁• Automatically scale in response to actual usage, eliminate upfront costs for hardware resources.
• Scored in batch-mode or request-response mode;
• Actively monitor models in production to detect changes;
• Telemetry and model management (rollback, retrain).
CloudML Studio
Publish as Azure Web Service(s)
☁CloudML Service
☁CloudML Service
Closing Comments
Data is a core strategic asset, it encodes your business’ collective experience
• It is imperative to learn from your data, learn about your customers, products,…
• As your org learns from its collective experience it will discover strategic insights
…and can put this knowledge into action.
Predictive Analytics capitalizes on all available data
• Adds smarter decisions to systems, drives better outcomes, delivers greater value…
• It requires specialized expertise, talent, and tools to execute well.
Rise of the Data Scientist
• Applying the principles of scientific experimentation to business processes;
• Adept at interpretation & use of numeric data, appreciation for quantitative methods
Better tooling for data science productivity and end-to-end management…
Emerging literacies and requirements….
Want to understand the what, why, and how of predictive analytics.
Here’s a short, ordered reading list designed to get you up to speed super fast…
The Signal And The Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver.
Nate Silver built an innovative system for predicting baseball performance and predicted the 2008 and 2012 elections within a hair’s breadth.
The New York Times publishes FiveThirtyEight.com, where Silver is one of the nation’s most influential political forecasters. Why read: This
book will inspire your “predictive” imagination and ground you in the realities of predictive analytics.
Predictive Analytics: The Power To Predict Who Will Click, Buy, Lie, or Die by Eric Siegel.
Eric Siegel has written a very accessible book on how predictive analytics works. It’s chock-full of dozens of real-world examples, such as how
Chase Bank predicted mortgage risk (before the recession), IBM Watson won Jeopardy!, and Hewlett-Packard predicted employee flight
risk. Why read: Predictive Analytics is a perfectly paced explanation of how predictive analytics works and a repository of dozens of real
examples across many use cases.
Uncontrolled: The Surprising Payoff Of of Trial-and-Error For Business, Politics, and Society by Jim Manzi.
Predictive analytics is not magic — it’s science. Jim Manzi reminds us of the power of the scientific method and reveals the shocking truth that
many huge societal and business decisions are made based on misinterpreted data and statistics. Why read: Controlled experimentation
amplifies the results of predictive analytics and helps avert the risk of inaccurate predictive models.
Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark Hall.
If you write code or have a computer science background, you’ll probably want to know the gory details of how predictive analytics works.
Why read: Data Mining provides a thorough grounding in machine learning concepts as well as practical advice on applying machine
learning tools and techniques in real-world data mining situations.
Thank You

More Related Content

What's hot

H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsMichał Łopuszyński
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMMichał Łopuszyński
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science ProcessVishal Patel
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologySergey Shelpuk
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Simplilearn
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —swethaT16
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsSrinath Perera
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
Predictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningPredictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningSAS Asia Pacific
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...Edureka!
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Edureka!
 

What's hot (20)

Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science Process
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodology
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
How to crack down big data?
How to crack down big data? How to crack down big data?
How to crack down big data?
 
Predictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningPredictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data mining
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
 

Similar to Data Driven Engineering 2014

Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesSlideTeam
 
Tools and techniques for predictive analytics
Tools and techniques for predictive analyticsTools and techniques for predictive analytics
Tools and techniques for predictive analyticsRohanKumarJumnani
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiProfessor Lili Saghafi
 
Turning Big Data Analytics To Knowledge PowerPoint Presentation Slides
Turning Big Data Analytics To Knowledge PowerPoint Presentation SlidesTurning Big Data Analytics To Knowledge PowerPoint Presentation Slides
Turning Big Data Analytics To Knowledge PowerPoint Presentation SlidesSlideTeam
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AIGary Allemann
 
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...Big Data Week
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Precisely
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsAbhishek Sood
 
In-Depth Data Analytics
In-Depth Data AnalyticsIn-Depth Data Analytics
In-Depth Data AnalyticsYASH GAIKWAD
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docxaudeleypearl
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docxroushhsiu
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Precisely
 
Business Intelligence, Data Analytics, and AI
Business Intelligence, Data Analytics, and AIBusiness Intelligence, Data Analytics, and AI
Business Intelligence, Data Analytics, and AIJohnny Jepp
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer AnalyticsCourse5i
 
¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?Fabricio Quintanilla
 
Data Science by Chappuis Halder & Co.
Data Science by Chappuis Halder & Co.Data Science by Chappuis Halder & Co.
Data Science by Chappuis Halder & Co.Genest Benoit
 
Making Advanced Analytics Work for You
Making Advanced Analytics Work for YouMaking Advanced Analytics Work for You
Making Advanced Analytics Work for YouSoumyadeep Sengupta
 

Similar to Data Driven Engineering 2014 (20)

Big Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation SlidesBig Data Tools PowerPoint Presentation Slides
Big Data Tools PowerPoint Presentation Slides
 
Tools and techniques for predictive analytics
Tools and techniques for predictive analyticsTools and techniques for predictive analytics
Tools and techniques for predictive analytics
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
Turning Big Data Analytics To Knowledge PowerPoint Presentation Slides
Turning Big Data Analytics To Knowledge PowerPoint Presentation SlidesTurning Big Data Analytics To Knowledge PowerPoint Presentation Slides
Turning Big Data Analytics To Knowledge PowerPoint Presentation Slides
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data Analytics
 
In-Depth Data Analytics
In-Depth Data AnalyticsIn-Depth Data Analytics
In-Depth Data Analytics
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
 
Module Overview Careers in Analytics In this module, we .docx
Module Overview  Careers in Analytics In this module, we .docxModule Overview  Careers in Analytics In this module, we .docx
Module Overview Careers in Analytics In this module, we .docx
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Predictive Analytics Overview
Predictive Analytics OverviewPredictive Analytics Overview
Predictive Analytics Overview
 
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
 
Business Intelligence, Data Analytics, and AI
Business Intelligence, Data Analytics, and AIBusiness Intelligence, Data Analytics, and AI
Business Intelligence, Data Analytics, and AI
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer Analytics
 
¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?
 
Data Science by Chappuis Halder & Co.
Data Science by Chappuis Halder & Co.Data Science by Chappuis Halder & Co.
Data Science by Chappuis Halder & Co.
 
Making Advanced Analytics Work for You
Making Advanced Analytics Work for YouMaking Advanced Analytics Work for You
Making Advanced Analytics Work for You
 

More from Roger Barga

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRoger Barga
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentationRoger Barga
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Barga Data Science lecture 8
Barga Data Science lecture 8Barga Data Science lecture 8
Barga Data Science lecture 8Roger Barga
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7Roger Barga
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5Roger Barga
 
Barga Data Science lecture 3
Barga Data Science lecture 3Barga Data Science lecture 3
Barga Data Science lecture 3Roger Barga
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 

More from Roger Barga (9)

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York City
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentation
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
Barga Data Science lecture 8
Barga Data Science lecture 8Barga Data Science lecture 8
Barga Data Science lecture 8
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
 
Barga Data Science lecture 3
Barga Data Science lecture 3Barga Data Science lecture 3
Barga Data Science lecture 3
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 

Recently uploaded

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Data Driven Engineering 2014

  • 1. The Unrealized Power of Data Roger Barga GPM, CloudML C+E
  • 2.
  • 3. Big Data is Here Today Disruptive Innovation… http://bit.ly/5a9KL
  • 4. Competing on Analytics If you can predict it, you can own it… • Forecasting • Targeting • Fraud detection, anti-fraud analytics • Risk • Customer churn, conversion • Propensity • Price Elasticity By 2014, 30 percent of analytic applications will use predictive capabilities. –GartnerBusinessIntelligenceSummit2012
  • 5. Predictive Analytics in Action $3 Million in Health Analytics Prize Money Available http://bit.ly/groYXc Predictive Analytics Applied to the Hospital Billing Process http://bit.ly/gpX9ak Educational Data Mining is becoming popular http://bit.ly/fM0M1a New Model Predict Presence of Meth Labs Very Accurately http://bit.ly/aIgUGH Monitoring Physician Performance for Predictive Profiling http://bit.ly/bYt4sg Predicting Employee Success http://bit.ly/clox2R Forecast Career Paths by Resume Analytics http://bit.ly/gvlFKJ Thorotrends offers research/analytics for horse racing http://bit.ly/diinec How Target Figured Out A Teen Girl Was Pregnant http://onforb.es/SokL3j
  • 6. Data Science 65% of enterprises feel they have a strategic shortage of data scientists, a role many did not even know existed 12 months ago… Data Science is far too complex today • Software Development • Data Exploration • Math & Statistics Knowledge • Machine Learning • Data Management this must get simpler, it simply won’t scale!
  • 7. Talk Outline Predictive Analytics Today Predictive analytics can be a potent weapon in your business analytics toolbox With increasing commoditization it will truly be the next differentiator Advancement of Data Science Unique role and data science process; Applying the principles of scientific experimentation to business; Application of Predictive Analytics Within Microsoft Demo our internal Azure service for machine learning over data to build predictive models; Discuss three models built using Data Lab that we are using internally;
  • 8. Predictive Analytics “Trying to predict the future is like trying to drive down a country road at night with no lights while looking out the back window.” – Peter Drucker • We must seek out patterns among seemingly unconnected data and disciplines to gain valuable insights. • It took 50 years for organizations to fully leverage transactions systems, so it will time to realize significant value from knowledge, information and data. • What gets measured, gets done.
  • 11. Predictive Analytics Customer discussions, from the front lines By applying predictive analytics to data they already have, organizations are uncovering unexpected patterns and associations, and they are beginning to develop predictive models to guide front-line interactions.
  • 12. Predictive Analytics Customer discussions, from the front lines By applying predictive analytics to data they already have, organizations are uncovering unexpected patterns and associations, and they are beginning to develop predictive models to guide front-line interactions. Business Scenario: • Downtime due to equipment failure or unnecessary maintenance bears huge cost; • Goal: Predict failures from collected data and defer maintenance; Five years of data, time series data from hundreds of sensors on machines throughout the gas field, for dozens of gas fields across different regions of the world…
  • 13. Predictive Analytics Customer discussions, from the front lines By applying predictive analytics to data they already have, organizations are uncovering unexpected patterns and associations, and they are beginning to develop predictive models to guide front-line interactions. Business Scenario: • Downtime due to equipment failure or unnecessary maintenance bears huge cost; • Goal: predict failures from collected data and defer maintenance; Time series analysis to create features and evaluate various machine learning approaches…
  • 14. Predictive Analytics Customer discussions, from the front lines By applying predictive analytics to data they already have, organizations are uncovering unexpected patterns and associations, and they are beginning to develop predictive models to guide front-line interactions. Business Scenario: • Downtime due to equipment failure or unnecessary maintenance bears huge cost; • Goal: predict failures from collected data and defer maintenance; One-Class SVM
  • 15. Predictive Analytics Customer discussions, from the front lines By applying predictive analytics to data they already have, organizations are uncovering unexpected patterns and associations, and they are beginning to develop predictive models to guide front-line interactions. • Oil & Gas customer, building models for predictive maintenance of equipment; • Telecom customer, building customer churn predictive model to improve retention; Enterprises are beginning to realize that data is a strategic asset and that they need to start treating it like one... • If you don’t have or can’t find the data you need, consider creating it;
  • 16. Predictive Analytics Bridging the Gap Predictive analytics must be simpler, today it requires highly specialized skills; • Tools that their current staff and/or new hires can use for predictive modelling; • Ability to integrate predictive models in line of business applications;
  • 17. Predictive Analytics Bridging the Gap Predictive analytics must be simpler, today it requires highly specialized skills; • Tools that their current staff and/or new hires can use for predictive modelling; • Ability to integrate predictive models in line of business applications; Enterprises require an end-to-end solution for their predictive modelling • Support for collaboration, lineage tracking; • Archive for predictive models, model governance; • Support for search & discovery, reuse;
  • 18. Predictive Analytics Bridging the Gap Predictive analytics must be simpler, today it requires highly specialized skills; • Tools that their current staff and/or new hires can use for predictive modelling; • Ability to integrate predictive models in line of business applications; Enterprises require an end-to-end solution for their predictive modelling • Support for collaboration, lineage tracking; • Archive for predictive models, model governance; • Support for search & discovery, reuse; Predictions need to be operationalized, not locked in the backroom • Deployable predictions, minimize time to insight; • Ability to frequently update predictive model given new data, conditions change;
  • 19. Predictive Analytics Bridging the Gap Familiar algorithms; • Classification and Regression Trees (CART); • Linear Regression; • Decision Forests, Boosted Decision Trees, see Kaggle models of choice; • Logistic Regression or other discrete choice; • Association Rules; • K-nearest Neighbors; • Box Jenkins (ARIMA); • Support Vector Machines; • Deep Neural Networks, watch this one carefully… More data & better features, in the right hands even a simple algorithm can produce a very effective predictive model – enter the data scientist…
  • 20. Data Scientist A Relatively New Job Title DJ Patil and Jeff Hammerbacher Alternative Job Titles Signature Skills Math and Statistics Knowledge Machine Learning Programming Skills Computer Science Data Visualization Data Curiosity Emergence of Data Science Teams
  • 21. Data Science Discipline with Deep Academic Roots Bill Cleveland, Bell Labs Data Science Action Plan, 2001 The Data Science Journal, 2002 (CODATA) Journal of Data Science, 2002 National Science Foundation (NSF), 2005 Long-lived Digital Data Collections JISC, 2008 The Skills, Role & Career Structure of Data Scientists & Curators
  • 22. “We have found that creative data scientists can solve problems in every field better than experts in those fields can…” “The expertise you need is strategy in answering these questions.” Interview with Jeremy Howard, Chief Scientist, Kaggle: http://slate/me/UM0wQ7 Data Scientist A Critical Role
  • 23. Data Science Process Mind the Gap, the space between the data set and the algorithm Conventional descriptive analytics goes straight from a data set to applying an algorithm. But in Data Science there’s a huge space in between of very important stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard part. The hard part is data wrangling and doing it well. Handling missing values
  • 24. Data Science Process Mind the Gap, the space between the data set and the algorithm Conventional descriptive analytics goes straight from a data set to applying an algorithm. But in Data Science there’s a huge space in between of very important stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard part. The hard part is data wrangling and doing it well. Handling missing values Feature selection
  • 25. Data Science Process Mind the Gap, the space between the data set and the algorithm Conventional descriptive analytics goes straight from a data set to applying an algorithm. But in Data Science there’s a huge space in between of very important stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard part. The hard part is data wrangling and doing it well. Handling missing values Feature selection Feature creation and imputation
  • 26. Data Science Process Mind the Gap, the space between the data set and the algorithm Conventional descriptive analytics goes straight from a data set to applying an algorithm. But in Data Science there’s a huge space in between of very important stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard part. The hard part is data wrangling and doing it well. Handling missing values Feature selection Feature creation and imputation Joining with external data to enrich training set
  • 27. Data Science Process Requires Agile Experimentation Define Objective Access and Understand the Data Pre-processing Feature and/or Target construction 1. Define the objective and quantify it with a metric – optionally with constraints, if any. This typically requires domain knowledge. 2. Collect and understand the data, deal with vagaries and biases in data acquisition (missing data, outliers due to errors in data collection process, more sophisticated biases due to the data collection procedure, etc. 3. Frame the problem in terms of a ML problem – classification, regression, ranking, clustering, forecasting, outlier detection etc. – combination of domain knowledge and ML knowledge is useful. 4. Transform raw data into a “modeling dataset”, with features, targets etc., which can be used for modeling. Feature construction often improved with domain knowledge. Target must be identical or proxy of metric in step 1.
  • 28. Data Science Process Requires Agile Experimentation Feature selection Model training Model scoring Evaluation Train/ Test split 5. Train, test and evaluate, taking care to control bias/variance and ensure the metrics are reported with the right confidence intervals (cross-validation helps here), be vigilant against target leaks (which typically leads to unbelievably good test metrics) – this is the machine learning heavy step.
  • 29. Data Science Process Requires Agile Experimentation Define Objective Access and Understand the data Pre-processing Feature and/or Target construction Feature selection Model training Model scoring Evaluation Train/ Test split 6. Iterate steps (2) – (5) until the test metrics are satisfactory
  • 30. Data Science Process Support Algorithmic Creativity and Exploration • Neural Networks • Logistic Regression • Support Vector Machine • Decision Trees • Ensemble Methods • adaBoost • Bayesian Networks • Genetic Algorithms • Random Forest • Monte Carlo methods • Principal Component Analysis • Kalman Filter • Evolutionary Fuzzy Modelling • Deep Neural Networks
  • 31. Data Science Process Ability to Determine if a Model Will Predict With Accuracy... Overfitting has many faces, always measure performance on out-of-sample data. • At least one test, train on some of your data and test on the rest; • Do several, splitting the data differently each time; “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” George E.P. Box
  • 32.
  • 33. CloudML Studio Model Authoring Through Visual Composition • Accessible through a web browser • End2End support for data science workflow • Flexible and extensible
  • 34. CloudML Studio Foundational ML & Analytics Algorithms Associative rule mining Neural networks Regression analysis Clustering Random Forrest Support for extensibility Enable users to add their own algorithms as modules Mathematical programming Online analytical processing Bayesian Machine Learning Text analytics Monte Carlo methods Support for vector machines Boosted Decision tree learning Time series processing
  • 35. Rapid Experimentation to Create a Better Model • Train and test several models from existing library • Try a wide range of strategies • Extend existing models
  • 36. Immutable Model Repository Durable repository for predictive models and algorithms to build up the knowledge base and make CloudML Studio more valuable
  • 37.
  • 38. Data Lab Analyze SQL Azure database traces, predict rack capacity for planning purposes
  • 39. Data Lab Analyze SQL Azure database characterization model: bully, resource profiling, etc…
  • 40. Daimler AG – Predicting gap and flush defects during the automobile manufacturing process…
  • 41. Top 3 Obstacles to Using Predictive Analytics. 1. Too hard to use, can’t find skilled talent. 2. Cost of software, access to good algorithms. 3. Difficulty integrating into business processes
  • 42. ☁ Intelligent Web Service consume publish + Machine Learning Services on Azure enterprise customer data scientist ☁ CloudML Studio ☁• Automatically scale in response to actual usage, eliminate upfront costs for hardware resources. • Scored in batch-mode or request-response mode; • Actively monitor models in production to detect changes; • Telemetry and model management (rollback, retrain). CloudML Studio Publish as Azure Web Service(s)
  • 45.
  • 46. Closing Comments Data is a core strategic asset, it encodes your business’ collective experience • It is imperative to learn from your data, learn about your customers, products,… • As your org learns from its collective experience it will discover strategic insights …and can put this knowledge into action. Predictive Analytics capitalizes on all available data • Adds smarter decisions to systems, drives better outcomes, delivers greater value… • It requires specialized expertise, talent, and tools to execute well. Rise of the Data Scientist • Applying the principles of scientific experimentation to business processes; • Adept at interpretation & use of numeric data, appreciation for quantitative methods Better tooling for data science productivity and end-to-end management… Emerging literacies and requirements….
  • 47. Want to understand the what, why, and how of predictive analytics. Here’s a short, ordered reading list designed to get you up to speed super fast… The Signal And The Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver. Nate Silver built an innovative system for predicting baseball performance and predicted the 2008 and 2012 elections within a hair’s breadth. The New York Times publishes FiveThirtyEight.com, where Silver is one of the nation’s most influential political forecasters. Why read: This book will inspire your “predictive” imagination and ground you in the realities of predictive analytics. Predictive Analytics: The Power To Predict Who Will Click, Buy, Lie, or Die by Eric Siegel. Eric Siegel has written a very accessible book on how predictive analytics works. It’s chock-full of dozens of real-world examples, such as how Chase Bank predicted mortgage risk (before the recession), IBM Watson won Jeopardy!, and Hewlett-Packard predicted employee flight risk. Why read: Predictive Analytics is a perfectly paced explanation of how predictive analytics works and a repository of dozens of real examples across many use cases. Uncontrolled: The Surprising Payoff Of of Trial-and-Error For Business, Politics, and Society by Jim Manzi. Predictive analytics is not magic — it’s science. Jim Manzi reminds us of the power of the scientific method and reveals the shocking truth that many huge societal and business decisions are made based on misinterpreted data and statistics. Why read: Controlled experimentation amplifies the results of predictive analytics and helps avert the risk of inaccurate predictive models. Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank, and Mark Hall. If you write code or have a computer science background, you’ll probably want to know the gory details of how predictive analytics works. Why read: Data Mining provides a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations.