Don't forget! You can watch the full Datameer recording here:
http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html
Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.
2. About our Speakers
Dr. Alex Guazzelli
Zementis Vice President, Analytics (@DrAlexGuazzelli)
Dr. Alex Guazzelli has co-authored the first book on PMML, the
Predictive Model Markup Language. At Zementis, Dr. Guazzelli is
responsible for developing core technology and analytical
solutions for Big Data and real-time scoring. Most recently, Dr.
Guazzelli started teaching a class on standards for predictive
analytics at UC San Diego Extension.
3. About our Speakers
Karen Hsu
Datameer Senior Director, Product Marketing (@Karenhsumar)
• Over 15 years of enterprise software experience
• Co-authored 4 patents
• Worked in a variety of engineering, marketing and sales roles
• Bachelor of Science degree in Management Science and Engineering from Stanford University
• Came from Informatica
• Worked with start-ups
• Informatica purchased to bring data solutions to market
  • Data quality
  • Master data management
  • B2B
  • Data security solutions
12. Questions
Descriptive! Predictive! Prescriptive!
▪ Prescriptive machine learning…
– What will happen, when it will happen, why it will happen
– Predict what will happen and prescribe how to take advantage of this future
17. Predictive Analytics
Predictive analytics is able to discover hidden patterns in historical data that the
human expert may not see. It is in fact the result of mathematics applied to data.
As such, it benefits from clever mathematical techniques as well as good data.
Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.
Descriptive vs. Predictive Analytics
• Descriptive Analytics answers “What happened?”
• Predictive Analytics answers “What will happen next?”
19. Churn-related features
Matt
3 complaints in last 6 months
Opened 2 support tickets in last 4 weeks
Spent a total of $1,234 buying merchandise
Spent a total of $123 in services
Purchased 2 items in last 4 weeks
Is 34 years old
Is a male
Lives in Los Angeles
...
Scott
No complaints in last 6 months
Opened 1 support ticket in last 4 weeks
Spent a total of $9,876 buying merchandise
Spent a total of $987 in services
Purchased 12 items in last 4 weeks
Is 54 years old
Is a male
Lives in Chicago
...
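A model cannot consume sentences like the ones above directly; each customer first has to become a row of numbers. The sketch below is one possible encoding, not the presenters' method: the field names, the `Customer` class, and the one-hot treatment of the city are all assumptions made for illustration, and only the values shown on the slide are used.

```python
from dataclasses import dataclass

# Hypothetical field names; the values come from the slide above.
@dataclass
class Customer:
    name: str
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent on merchandise ($)
    services_spend: float     # total spent on services ($)
    items_4w: int             # items purchased in last 4 weeks
    age: int
    gender: str
    city: str

# Only the cities that appear on the slide; a real list would be longer.
CITIES = ["Los Angeles", "Chicago"]

def to_vector(c):
    """Turn one customer into a flat list of floats (one-hot city, binary gender)."""
    one_hot_city = [1.0 if c.city == city else 0.0 for city in CITIES]
    return [
        float(c.complaints_6m),
        float(c.tickets_4w),
        c.merchandise_spend,
        c.services_spend,
        float(c.items_4w),
        float(c.age),
        1.0 if c.gender == "male" else 0.0,
    ] + one_hot_city

matt = Customer("Matt", 3, 2, 1234.0, 123.0, 2, 34, "male", "Los Angeles")
scott = Customer("Scott", 0, 1, 9876.0, 987.0, 12, 54, "male", "Chicago")

matt_vector = to_vector(matt)
scott_vector = to_vector(scott)
```

Each vector has nine entries: seven numeric features followed by the two city indicators.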
20. Big Data
An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
• Transaction records
• Social media
• Climate information
• Mobile GPS signals
• Healthcare
• Smart Grid
• Digital Breadcrumbs
Breadth and Depth
90% of the data today was created in the last 2 years
21. Churn-related “Big Data” features
Matt
12 friends listed as customers
2 complaints from friends in last 6 months
Average age of friends is 41 years old
2 friends churned in last 30 days
No purchases for same items as friends
1 website visit in last 7 days
2 website pages opened during last visit
Opened 3 newsletters in last 6 months
...
Scott
34 friends listed as customers
1 complaint from friends in last 6 months
Average age of friends is 62 years old
No friends churned in last 30 days
Purchased same 2 items as friends in last 2 months
3 website visits in last 7 days
5 website pages opened during last visit
Opened 12 newsletters in last 6 months
...
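Features such as "friends churned in last 30 days" are not stored anywhere; they have to be derived by aggregating over a customer's social connections. The sketch below shows one way this might be done. The record layout (`age`, `churn_date`), the function name, and the three sample friends are made up for illustration and are not data from the presentation.

```python
from datetime import date, timedelta

def friend_features(friends, today):
    """Aggregate a list of friend records into churn-related features."""
    cutoff = today - timedelta(days=30)
    churned_recently = sum(
        1 for f in friends
        if f["churn_date"] is not None and cutoff <= f["churn_date"] <= today
    )
    ages = [f["age"] for f in friends]
    return {
        "friends_listed_as_customers": len(friends),
        "average_age_of_friends": sum(ages) / len(ages) if ages else None,
        "friends_churned_last_30_days": churned_recently,
    }

# Hypothetical sample: three friends, one of whom churned within the window.
today = date(2013, 6, 30)  # arbitrary reference date
friends = [
    {"age": 38, "churn_date": date(2013, 6, 20)},
    {"age": 44, "churn_date": None},
    {"age": 41, "churn_date": date(2013, 3, 1)},
]

features = friend_features(friends, today)
```

At big data scale the same aggregation would run as a distributed job over the whole friend graph rather than over an in-memory list.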
22. Building a predictive model ...
[Figure: Model Training. Churn-related features (Churned / Not-churned) → Predictive Model]
Neural Networks
Linear/Logistic Regression
Support Vector Machines
Scorecards
Decision Trees
Clustering
Association Rules
K-Nearest Neighbors
Naive Bayes Classifiers
...
[Figure: neural network with Input Layer (Data), Hidden Layer, and Output Layer (Prediction)]
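To make "model training" concrete, here is a minimal sketch of one technique from the list, logistic regression, fitted by plain gradient descent. It is an illustration under stated assumptions, not the tooling used in the presentation: the six training rows, the two chosen features, the learning rate, and the epoch count are all invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def churn_probability(weights, bias, x):
    """Score one feature row with a fitted logistic regression model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic_regression(rows, labels, learning_rate=0.05, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = churn_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Made-up training set: [complaints in last 6 months, items purchased in last 4 weeks]
rows = [[3, 2], [4, 1], [5, 0], [0, 12], [1, 9], [0, 7]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = not churned

weights, bias = train_logistic_regression(rows, labels)
matt_like = churn_probability(weights, bias, [3, 2])    # many complaints, few purchases
scott_like = churn_probability(weights, bias, [0, 12])  # no complaints, many purchases
```

On this toy data the fitted model assigns a higher churn probability to the Matt-like row than to the Scott-like one. Real training would also need feature scaling, a held-out validation set, and far more rows.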
23. Why not several models?
Model Ensemble
[Figure: Raw Inputs → Data Pre-Processing → Model 1, Model 2, ..., Model n → Voting → Prediction]
Scores from all models are computed
Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
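The three combination schemes named on this slide can be sketched in a few lines. The function names, the example labels, weights, and scores below are hypothetical; they only illustrate how the outputs of several models might be merged into one prediction.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the most models (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Each model's vote counts in proportion to its weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def weighted_average(scores, weights):
    """Combine numeric scores (e.g. churn risk) into one weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical outputs of three models for one customer.
labels = ["churn", "no-churn", "churn"]
weights = [0.2, 0.7, 0.1]   # e.g. derived from each model's validation accuracy
scores = [0.9, 0.2, 0.6]

by_majority = majority_vote(labels)
by_weight = weighted_vote(labels, weights)
blended = weighted_average(scores, weights)
```

Note that the two voting schemes disagree here: two of three models say "churn", but the single heavily weighted model says "no-churn", so the choice of scheme matters.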
24. End Goal: Predicting churn ...
Model Deployment and Execution in Big Data
[Figure: Churn-related Features → Predictive Churn Model → Churn Risk Score]
25. From Model Building to Model Deployment (Traditionally ...)
Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python
Lost in Translation
Production Environment: Java, .NET, C, SQL
SAS, R, IBM SPSS …: Great for model building but not for scoring, even more so when it comes to Hadoop
26. From Model Building to Model Deployment (with PMML)
Model Building
• Angoss
• BigML
• FICO Model Builder
• IBM SPSS
• KNIME
• KXEN
• MicroStrategy
• Open Data
• Pervasive DataRush
• RapidMiner
• R / Rattle
• SAS
• SAP Business Objects
• Salford Systems
• StatSoft STATISTICA
• SQL Server
• TIBCO Spotfire
• Custom Code, etc.
Model Deployment and Execution
• Datameer Server
• Universal PMML Plug-in (UPPI)
[Figure: Model Building → PMML (models) → Model Deployment and Execution]
Deploy in minutes ...
27. Predictive Model Markup Language
• PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
• It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
• PMML eliminates the need for custom model deployment and ensures reliability.
Models
Data
Transformations
PMML defines a standard not only to represent data-mining
models, but also data handling and data transformations
(pre- and post-processing)
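To give a feel for what a PMML file looks like, the sketch below embeds a very small regression model and scores it with Python's standard XML parser. The element and attribute names (`RegressionModel`, `RegressionTable`, `NumericPredictor`, `normalizationMethod`) come from the PMML standard, but the field names and coefficients are invented, and this hand-rolled reader is only an illustration. It is not a compliant PMML engine and not the Universal PMML Plug-in.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical model: field names and coefficients are made up for illustration.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical churn model (illustration only)"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints_6m" optype="continuous" dataType="double"/>
    <DataField name="items_4w" optype="continuous" dataType="double"/>
    <DataField name="churn_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints_6m"/>
      <MiningField name="items_4w"/>
      <MiningField name="churn_risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0">
      <NumericPredictor name="complaints_6m" coefficient="0.8"/>
      <NumericPredictor name="items_4w" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

def load_regression(pmml_text):
    """Read intercept, coefficients, and normalization from the regression table."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return float(table.get("intercept")), coefficients, model.get("normalizationMethod")

def score(pmml_text, record):
    """Linear combination of the inputs, squashed to (0, 1) when the method is logit."""
    intercept, coefficients, normalization = load_regression(pmml_text)
    z = intercept + sum(coef * record[name] for name, coef in coefficients.items())
    if normalization == "logit":
        return 1.0 / (1.0 + math.exp(-z))
    return z

matt_score = score(PMML_DOC, {"complaints_6m": 3, "items_4w": 2})
scott_score = score(PMML_DOC, {"complaints_6m": 0, "items_4w": 12})
```

Because the model lives entirely in the XML, the same file could be produced by one tool and executed by another, which is the portability argument the slide makes.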
40. Next Steps:
More about Datameer and Big Data
www.datameer.com
More about Zementis
www.zementis.com
Contact us:
Alex Guazzelli aguazzeli@zementis.com
Karen Hsu khsu@datameer.com