Best Practices for Big Data
Analytics with Machine
Learning

© 2013 Datameer, Inc. All rights reserved.
About our Speakers

Dr. Alex Guazzelli
Zementis Vice President, Analytics (@DrAlexGuazzelli)

Dr. Alex Guazzelli has co-authored the first book on PMML, the
Predictive Model Markup Language. At Zementis, Dr. Guazzelli is
responsible for developing core technology and analytical
solutions for Big Data and real-time scoring. Most recently, Dr.
Guazzelli started teaching a class on standards for predictive
analytics at UC San Diego Extension.
About our Speakers

Karen Hsu
Datameer Senior Director, Product Marketing (@Karenhsumar)

•  Over 15 years of enterprise software experience
•  Co-authored 4 patents
•  Worked in a variety of engineering, marketing and sales roles
•  Bachelor of Science degree in Management Science and Engineering from Stanford University
•  Came from Informatica
•  Worked with start-ups
•  Informatica purchased to bring data solutions to market
   •  Data quality
   •  Master data management
   •  B2B
   •  Data security solutions
Agenda
•  Considerations
•  Best Practices
•  Demonstration
•  Q&A
Considerations

Considerations
Target Users: Business, IT, Data Scientist
Questions: Descriptive, Predictive, Prescriptive
Target Users
Business Professional
▪  Visual
▪  Dependencies, Clustering, Decision Trees + More!
Target Users
IT
▪  Flexible, powerful
Target Users
Data Scientist
▪  Algorithms
▪  SAS, SPSS, R
Questions
Descriptive, Predictive, Prescriptive
▪  Descriptive machine learning…
–  Tells you what has happened
Questions
Descriptive, Predictive, Prescriptive
▪  Predictive machine learning…
–  Answers the question of what will happen
Questions
Descriptive, Predictive, Prescriptive
▪  Prescriptive machine learning…
–  What will happen, when it will happen, why it will happen
–  Predict what will happen and prescribe how to take advantage of this future
Best Practices

Lean Analytics
Identify Use Case
1. Integrate
2. Prepare
3. Analyze
4. Visualize
Deploy
Data Preparation
▪  Union
▪  Join
▪  Cleanse
▪  Bin
▪  Normalize
▪  Profile
▪  Transform
▪  Enrich
▪  Outliers
▪  Missing values
▪  Invalid values
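
A few of the preparation steps named on this slide (missing values, invalid values, outliers, normalize, bin) could be sketched as follows. This is a minimal illustration in plain Python, not Datameer functionality. The function names, the 0 to 120 validity range for ages, and the bin edges are all assumptions made for the example.

```python
from statistics import median

def clean_ages(raw_ages, low=0, high=120):
    """Replace missing (None) and out-of-range ages with the median of the valid ones."""
    valid = [a for a in raw_ages if a is not None and low <= a <= high]
    fill = median(valid)
    return [a if (a is not None and low <= a <= high) else fill for a in raw_ages]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Hypothetical raw column: one missing value and one implausible outlier (230).
raw = [34, 54, None, 230, 41]
ages = clean_ages(raw)
scaled = min_max_normalize(ages)
groups = [bin_value(a, edges=[30, 50], labels=["young", "middle", "senior"]) for a in ages]
```

Median imputation is only one option here. Dropping the rows, or flagging them in a separate indicator column, are equally reasonable choices depending on the use case.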
Descriptive Analytics

Drag & Drop Smart Analytics
Predictive Analytics
Predictive analytics can discover hidden patterns in historical data that a human expert may not see. It is the result of mathematics applied to data, so it benefits from clever mathematical techniques as well as good data.

Predictive analytics helps you discover patterns in the past, which can signal what is ahead.

Descriptive vs. Predictive Analytics
▪  Descriptive Analytics answers “What happened?”
▪  Predictive Analytics answers “What will happen next?”
Example: Predicting Churn
Matt - Churned 2 days ago

Scott - “Liked” our company last week

John - ??
Churn-related features
Matt
3 complaints in last 6 months
Opened 2 support tickets in last 4 weeks
Spent a total of $1,234 buying merchandise
Spent a total of $123 in services
Purchased 2 items in last 4 weeks
Is 34 years old
Is male
Lives in Los Angeles
...

Scott
No complaints in last 6 months
Opened 1 support ticket in last 4 weeks
Spent a total of $9,876 buying merchandise
Spent a total of $987 in services
Purchased 12 items in last 4 weeks
Is 54 years old
Is male
Lives in Chicago
...
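
Before a model can use features like these, they have to be turned into numbers. The sketch below shows one way that might look, using the values listed for Matt and Scott. The `Customer` class, its field names, and the two-city vocabulary used for one-hot encoding are assumptions made for illustration, not part of the presentation.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent on merchandise, in dollars
    services_spend: float     # total spent on services, in dollars
    items_4w: int             # items purchased in last 4 weeks
    age: int
    is_male: bool
    city: str

# Assumed vocabulary for one-hot encoding the city (hypothetical; a real list would be longer).
CITIES = ["Los Angeles", "Chicago"]

def to_vector(c):
    """Flatten a Customer into a list of floats that a model can consume."""
    numeric = [c.complaints_6m, c.tickets_4w, c.merchandise_spend,
               c.services_spend, c.items_4w, c.age, 1.0 if c.is_male else 0.0]
    one_hot = [1.0 if c.city == name else 0.0 for name in CITIES]
    return [float(x) for x in numeric] + one_hot

matt = Customer(3, 2, 1234.0, 123.0, 2, 34, True, "Los Angeles")
scott = Customer(0, 1, 9876.0, 987.0, 12, 54, True, "Chicago")
matt_vector = to_vector(matt)
scott_vector = to_vector(scott)
```

One-hot encoding keeps the city from being read as an ordered quantity, which a plain integer code would wrongly imply.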
Big Data
An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
▪  Transaction records
▪  Social media
▪  Climate information
▪  Mobile GPS signals
▪  Healthcare
▪  Smart Grid
▪  Digital Breadcrumbs

Breadth and Depth

90% of today's data was created in the last 2 years
Churn-related “Big Data” features
Matt
12 friends listed as customers
2 complaints from friends in last 6 months
Average age of friends is 41 years old
2 friends churned in last 30 days
No purchases for same items as friends
1 website visit in last 7 days
2 website pages opened during last visit
Opened 3 newsletters in last 6 months
...

Scott
34 friends listed as customers
1 complaint from friends in last 6 months
Average age of friends is 62 years old
No friends churned in last 30 days
Purchased same 2 items as friends in last 2 months
3 website visits in last 7 days
5 website pages opened during last visit
Opened 12 newsletters in last 6 months
...
Building a predictive model ...
Model Training: Churn-related features (Churned, Not-churned) → Predictive Model
▪  Neural Networks
▪  Linear/Logistic Regression
▪  Support Vector Machines
▪  Scorecards
▪  Decision Trees
▪  Clustering
▪  Association Rules
▪  K-Nearest Neighbors
▪  Naive Bayes Classifiers
▪  ...
Neural network: Data → Input Layer → Hidden Layer → Output Layer → Prediction
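
As a rough illustration of model training, the sketch below fits a logistic regression (one of the techniques listed) by batch gradient descent in plain Python. The training rows, the two chosen features, the learning rate, and the number of epochs are all hypothetical. A real project would more likely use one of the model-building tools named in this presentation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.05, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row: predicted probability minus true label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def churn_risk(x, weights, bias):
    """Return the model's estimated churn probability for one feature row."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical rows: [complaints in last 6 months, items purchased in last 4 weeks]
rows = [[3, 2], [4, 1], [2, 0], [0, 12], [1, 9], [0, 7]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = not churned

weights, bias = train_logistic(rows, labels)
risk_matt_like = churn_risk([3, 2], weights, bias)    # profile resembling Matt
risk_scott_like = churn_risk([0, 12], weights, bias)  # profile resembling Scott
```

On this toy data the model learns that complaints raise the churn risk and recent purchases lower it, so the Matt-like profile scores above 0.5 and the Scott-like profile below it.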
Why not several models?
Model Ensemble
Raw Inputs → Data Pre-Processing → Model 1, Model 2, ..., Model n → Voting → Prediction
Scores from all models are computed
Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
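
The three combination schemes named on this slide can be sketched in a few lines. The function names, the model outputs, and the weights below are hypothetical and serve only to show how the same three models can lead to different ensemble decisions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight behind it."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def weighted_average(scores, weights):
    """Combine numeric scores into a single weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical outputs of three models for the same customer.
labels = ["churn", "no-churn", "churn"]
scores = [0.8, 0.3, 0.6]
model_weights = [1.0, 3.0, 1.0]  # assumed: the second model is trusted most

by_majority = majority_vote(labels)
by_weight = weighted_vote(labels, model_weights)
combined_score = weighted_average(scores, model_weights)
```

Here majority voting picks "churn" (two models against one), while weighted voting picks "no-churn" because the single dissenting model carries more weight than the other two combined.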
End Goal: Predicting churn ...
Model Deployment and Execution in Big Data
Churn-related Features → Predictive Churn Model → Churn Risk Score
From Model Building to Model Deployment (Traditionally ...)
Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python
Lost in Translation
Production Environment: Java, .NET, C, SQL
SAS, R, IBM SPSS …: Great for model building but not for scoring, even more so when it comes to Hadoop
From Model Building to Model Deployment (with PMML)
Model Building
▪  Angoss
▪  BigML
▪  FICO Model Builder
▪  IBM SPSS
▪  KNIME
▪  KXEN
▪  MicroStrategy
▪  Open Data
▪  Pervasive DataRush
▪  RapidMiner
▪  R / Rattle
▪  SAS
▪  SAP Business Objects
▪  Salford Systems
▪  StatSoft STATISTICA
▪  SQL Server
▪  TIBCO Spotfire
▪  Custom Code, etc.
PMML (models)
Model Deployment and Execution
Datameer Server
Universal PMML Plug-in (UPPI)
Deploy in minutes ...
Predictive Model Markup Language
▪  PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
▪  It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
▪  PMML eliminates the need for custom model deployment and ensures reliability.

Models and Data Transformations

PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing)
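
To make the idea concrete, the sketch below embeds a small PMML document describing a logistic-style regression model and scores two records against it with Python's standard XML parser. The model, its field names, and its coefficients are hypothetical. The `score` function is a hand-written illustration that reads only the intercept, the numeric predictors, and the `logit` normalization. It is not the UPPI or any other real scoring engine, and it ignores most of what the PMML specification allows.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical churn model; coefficients are made up for illustration.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical churn model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints_6m" optype="continuous" dataType="double"/>
    <DataField name="items_4w" optype="continuous" dataType="double"/>
    <DataField name="churn_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints_6m"/>
      <MiningField name="items_4w"/>
      <MiningField name="churn_risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-0.5">
      <NumericPredictor name="complaints_6m" coefficient="0.9"/>
      <NumericPredictor name="items_4w" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

def score(pmml_text, record):
    """Score one record against the first RegressionTable in the document."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    z = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        z += float(predictor.get("coefficient")) * record[predictor.get("name")]
    # The logit normalization maps the linear score to a value between 0 and 1.
    if model.get("normalizationMethod") == "logit":
        return 1.0 / (1.0 + math.exp(-z))
    return z

matt_risk = score(PMML_DOC, {"complaints_6m": 3, "items_4w": 2})
scott_risk = score(PMML_DOC, {"complaints_6m": 0, "items_4w": 12})
```

Because the model travels as data rather than code, the same file could in principle be produced by one tool and consumed by another, which is the portability argument made on this slide.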
UPPI: Supported Techniques
▪  Neural Networks (neural gas, radial-basis and backpropagation)
▪  Support Vector Machines (for classification and regression)
▪  Naive Bayes Classifier (for continuous and categorical inputs)
▪  Rule Set Models
▪  Clustering Models (2-step clustering, distribution and center-based)
▪  Decision Trees (for classification and regression)
▪  General Regression Models (Cox, General and Generalized Linear Models)
▪  Regression Models (Linear, Logistic and Polynomial Regression Models)
▪  Scorecards (with support for Reason Codes)
▪  Restricted Boltzmann Machines
▪  Association Rules
▪  Multiple Models (with the possibility of having models spread over multiple PMML files)
▪  Model Ensemble (including Random Forest Models and Boosted Trees)
▪  Model Segmentation
▪  Model Chaining
▪  Model Composition
▪  Model Cascade

© Zementis, Inc. - Confidential
Demonstration Flow
▪  Descriptive (Karen)
▪  Predictive Modeling (Alex)
▪  Predictive Production (Karen)
▪  Prescriptive (Karen)
Descriptive Analytics

Descriptive Analytics
▪  Answers: What caused people to churn?
▪  Clustering
▪  Column Dependencies
▪  Decision Tree
Predictive Analytics

Predictive Analytics
▪  Who will churn?
Prescriptive Analytics

Prescriptive Analytics
▪  Who will churn? Why will they churn?
▪  What can we do to support that outcome?
Q&A
Next Steps:
More about Datameer and Big Data
www.datameer.com

More about Zementis
www.zementis.com

Contact us:
Alex Guazzelli aguazzeli@zementis.com

Karen Hsu khsu@datameer.com 

Best Practices for Big Data Analytics with Machine Learning by Datameer

  • 1. Best Practices for Big Data Analytics with Machine Learning © 2013 Datameer, Inc. All rights reserved.
  • 2. About our Speakers Dr. Alex Guazzelli Zementis Vice President, Analytics (@DrAlexGuazzelli) Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.
  • 3. About our Speakers Karen Hsu Datameer Senior Director, Product Marketing (@Karenhsumar) •  Over 15 years of enterprise software experience •  Co-authored 4 patents •  Worked in a variety of engineering, marketing and sales roles •  Bachelors of Science degree in Management Science and Engineering from Stanford University •  •  •  Came from Infomatica Worked with start-ups Infomatica purchased to bring data solutions to market •  Data quality •  Master data management •  B2B •  Data security solutions
  • 4. Agenda •  Considerations •  Best Practices •  Demonstration •  Q&A
  • 5. Considerations © 2013 Datameer, Inc. All rights reserved.
  • 6. Considerations Target Users Business IT Data Scientist Questions Descriptive! Predictive! Prescriptive!
  • 8. Target Users IT ▪  Flexible, powerful
  • 9. Target Users Data Scientist ▪  Algorithms ▪  SAS, SPSS, R
  • 10. Questions Descriptive! Predictive! Prescriptive! ▪  Descriptive machine learning… –  Tells you what has happened
  • 11. Questions Descriptive! Predictive! Prescriptive! ▪  Predictive machine learning… –  Answers the question what will happen
  • 12. Questions Descriptive! Predictive! Prescriptive! ▪  Prescriptive machine learning… –  What will happen, when it will happen, why it will happen –  Predict what will happen and prescribe how to take advantage of this future
  • 13. Best Practices © 2013 Datameer, Inc. All rights reserved.
  • 14. Lean Analytics 1. Integrate Identify Use Case 4. Visualize 2. Prepare 3. Analyze Deploy
  • 16. Descriptive Analytics Drag & Drop Smart Analytics
  • 17. Predictive Analytics Predictive analytics is able to discover hidden patterns in historical data that the human expert may not see. It is in fact the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as good data. Predictive Analytics helps you discover patterns in the past, which can signal what is ahead. Descriptive vs. Predictive Analytics "  "  Descriptive Analytics answers “What happened?” Predictive Analytics answers “What will happen next?” ? ?
  • 18. Example: Predicting Churn Matt - Churned 2 days ago Scott - “Liked” our company last week John - ??
  • 19. Churn-related features Matt 3 complaints in last 6 months Opened 2 support tickets in last 4 weeks Spent a total of $1,234 buying merchandise Spent a total of $123 in services Purchased 2 items in last 4 weeks Is 34 years old Is a male Lives in Los Angeles ... Scott No complaints in last 6 months Opened 1 support ticket in last 4 weeks Spent a total of $9,876 buying merchandise Spent a total of $987 in services Purchased 12 items in last 4 weeks Is 54 years old Is a male Lives in Chicago ...
  • 20. Big Data An ever-expanding ocean of data containing people and sensor data (lots and lots of it): ▪  Transaction records ▪  Social media ▪  Climate information ▪  Mobile GPS signals ▪  Healthcare ▪  Smart Grid ▪  Digital Breadcrumbs Breadth and Depth: 90% of the data today was created in the last 2 years
  • 21. Churn-related “Big Data” features ▪  Matt –  12 friends listed as customers –  2 complaints from friends in last 6 months –  Average age of friends is 41 years old –  2 friends churned in last 30 days –  No purchases for same items as friends –  1 website visit in last 7 days –  2 website pages opened during last visit –  Opened 3 newsletters in last 6 months –  ... ▪  Scott –  34 friends listed as customers –  1 complaint from friends in last 6 months –  Average age of friends is 62 years old –  No friends churned in last 30 days –  Purchased same 2 items as friends in last 2 months –  3 website visits in last 7 days –  5 website pages opened during last visit –  Opened 12 newsletters in last 6 months –  ...
  • 22. Building a predictive model ... ▪  Diagram: Churn-related features (Churned, Not-churned), Model Training, Predictive Model ▪  Techniques: Neural Networks, Linear/Logistic Regression, Support Vector Machines, Scorecards, Decision Trees, Clustering, Association Rules, K-Nearest Neighbors, Naive Bayes Classifiers ... ▪  Neural network diagram: Input Layer (Data), Hidden Layer, Output Layer (Prediction)
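The model-training step on this slide can be illustrated with a minimal sketch. It assumes logistic regression (one of the listed techniques) trained by plain gradient descent; the training rows, the choice of two features (complaints, items purchased), the learning rate, and the epoch count are all made up for illustration and do not come from the presentation.

```python
import math

# Hypothetical training rows: (complaints in last 6 months, items purchased in last 4 weeks).
# Labels: 1 = churned, 0 = not churned. All values are invented for illustration.
X = [(3, 2), (4, 1), (5, 0), (0, 12), (1, 9), (0, 7)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Return the estimated churn probability for one feature row."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


def train(X, y, lr=0.05, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            err = predict(weights, bias, row) - label  # gradient of the log loss
            for j, v in enumerate(row):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train(X, y)
matt_score = predict(weights, bias, (3, 2))    # a customer resembling Matt
scott_score = predict(weights, bias, (0, 12))  # a customer resembling Scott
print(f"Matt-like churn risk:  {matt_score:.2f}")
print(f"Scott-like churn risk: {scott_score:.2f}")
```

With this toy data the Matt-like row scores above 0.5 and the Scott-like row below it; a real model would use many more features and rows, and would be evaluated on held-out data.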
  • 23. Why not several models? Model Ensemble ▪  Diagram: Raw Inputs (Data), Pre-Processing, Model 1, Model 2, . . ., Model n, Prediction ▪  Scores from all models are computed ▪  Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
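The voting schemes named on this slide can be sketched in a few lines. The function names, the three model outputs, and the weights below are hypothetical; the sketch only shows how majority voting, weighted voting, and weighted averaging combine per-model results once every model has been scored.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label whose supporting models carry the largest total weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores (e.g. churn probabilities) into one weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs of three models for a single customer.
labels = ["churn", "no churn", "churn"]
scores = [0.8, 0.3, 0.6]
weights = [0.5, 0.3, 0.2]  # assumed per-model weights, e.g. from validation accuracy

print(majority_vote(labels))
print(weighted_vote(labels, weights))
print(round(weighted_average(scores, weights), 2))
```

Majority and weighted voting apply to class labels, while the weighted average applies to numeric scores, so the choice depends on what the individual models output.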
  • 24. End Goal: Predicting churn ... ▪  Model Deployment and Execution in Big Data ▪  Diagram: Churn-related Features, Predictive Churn Model, Churn Risk Score
  • 25. From Model Building to Model Deployment (Traditionally ...) ▪  Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python ▪  Production Environment: Java, .NET, C, SQL ▪  Lost in Translation ▪  SAS, R, IBM SPSS …: great for model building but not for scoring, even more so when it comes to Hadoop
  • 26. From Model Building to Model Deployment (with PMML) ▪  Model Building: Angoss, BigML, FICO Model Builder, IBM SPSS, KNIME, KXEN, MicroStrategy, Open Data, Pervasive DataRush, RapidMiner, R / Rattle, SAS, SAP Business Objects, Salford Systems, StatSoft STATISTICA, SQL Server, TIBCO Spotfire, Custom Code, etc. ▪  Model Deployment and Execution: Datameer Server, PMML (models), Universal PMML Plug-in (UPPI) ▪  Deploy in minutes ...
  • 27. Predictive Model Markup Language ▪  PMML is an XML-based language used to define statistical and data mining models and to share them between compliant applications. ▪  It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models. ▪  PMML eliminates the need for custom model deployment and ensures reliability. ▪  Models and Data Transformations: PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing)
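To make the idea of an XML-defined model concrete, here is a small sketch. The PMML fragment is hand-written and simplified for illustration (field names and coefficients are invented, and it has not been validated against the DMG schema), and the tiny Python reader is a hypothetical stand-in for a compliant scoring engine, not how the Universal PMML Plug-in works. It assumes a `RegressionModel` with `normalizationMethod="logit"`, numeric predictors only, and default exponents.

```python
import math
import xml.etree.ElementTree as ET

# Simplified, hand-written PMML fragment. Field names and coefficients are
# invented for illustration and do not come from the presentation.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="purchases" optype="continuous" dataType="double"/>
    <DataField name="churn_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="churn" functionName="regression"
                   normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="purchases"/>
      <MiningField name="churn_risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-0.5">
      <NumericPredictor name="complaints" coefficient="0.9"/>
      <NumericPredictor name="purchases" coefficient="-0.4"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_model(pmml_text):
    """Extract the intercept and coefficients from the regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    # Only NumericPredictor with the default exponent of 1 is handled here.
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear predictor, then the logit normalization 1 / (1 + exp(-y))."""
    y = intercept + sum(c * record[name] for name, c in coefficients.items())
    return 1.0 / (1.0 + math.exp(-y))


intercept, coefficients = load_model(PMML_DOC)
risk = score({"complaints": 3, "purchases": 2}, intercept, coefficients)
print(f"Churn risk score: {risk:.3f}")
```

The point of the sketch is that the model lives entirely in the XML document: any application that understands the format can score with it, without translating the model into custom code.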
  • 28. UPPI: Supported Techniques ▪  Neural Networks (neural gas, radial-basis and backpropagation) ▪  Support Vector Machines (for classification and regression) ▪  Naive Bayes Classifier (for continuous and categorical inputs) ▪  Rule Set Models ▪  Clustering Models (2-step clustering, distribution and center-based) ▪  Decision Trees (for classification and regression) ▪  General Regression Models (Cox, General and Generalized Linear Models) ▪  Regression Models (Linear, Logistic and Polynomial Regression Models) ▪  Scorecards (with support for Reason Codes) ▪  Restricted Boltzmann Machines ▪  Association Rules ▪  Multiple Models (with the possibility of having models spread over multiple PMML files) ▪  Model Ensemble (including Random Forest Models and Boosted Trees) ▪  Model Segmentation ▪  Model Chaining ▪  Model Composition ▪  Model Cascade © Zementis, Inc. - Confidential
  • 30. Descriptive Analytics
  • 31. Descriptive Analytics ▪  Answers: What caused people to churn? ▪  Clustering ▪  Column Dependencies ▪  Decision Tree
  • 33. Predictive Analytics
  • 36. Prescriptive Analytics
  • 37. Prescriptive Analytics ▪  Who will churn? Why will they churn? ▪  What can we do to support that outcome?
  • 39. Q&A
  • 40. Next Steps ▪  More about Datameer and Big Data: www.datameer.com ▪  More about Zementis: www.zementis.com ▪  Contact us: Alex Guazzelli, aguazzeli@zementis.com; Karen Hsu, khsu@datameer.com