2. About our Speakers
Dr. Alex Guazzelli
Zementis Vice President, Analytics (@DrAlexGuazzelli)
Dr. Alex Guazzelli has co-authored the first book on PMML, the
Predictive Model Markup Language. At Zementis, Dr. Guazzelli is
responsible for developing core technology and analytical
solutions for Big Data and real-time scoring. Most recently, Dr.
Guazzelli started teaching a class on standards for predictive
analytics at UC San Diego Extension.
3. About our Speakers
Karen Hsu
Datameer Senior Director, Product Marketing (@Karenhsumar)
•  Over 15 years of enterprise software experience
•  Worked in a variety of engineering, marketing and sales roles
•  Co-authored 4 patents
•  Bachelor of Science degree in Management Science and Engineering from Stanford University
•  Came from Informatica
   •  Worked with start-ups Informatica purchased to bring data solutions to market:
      •  Data quality
      •  Master data management
      •  B2B
      •  Data security solutions
17. Predictive Analytics
Predictive analytics can discover hidden patterns in historical data that a human expert may not see. It is, in essence, mathematics applied to data, so it benefits from clever mathematical techniques as well as from good data.
Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.
Descriptive vs. Predictive Analytics
Descriptive Analytics answers “What happened?”
Predictive Analytics answers “What will happen next?”
19. Churn-related features
Matt
3 complaints in last 6 months
Opened 2 support tickets in last 4 weeks
Spent a total of $1,234 buying merchandise
Spent a total of $123 in services
Purchased 2 items in last 4 weeks
Is 34 years old
Is male
Lives in Los Angeles
...
Scott
No complaints in last 6 months
Opened 1 support ticket in last 4 weeks
Spent a total of $9,876 buying merchandise
Spent a total of $987 in services
Purchased 12 items in last 4 weeks
Is 54 years old
Is male
Lives in Chicago
...
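Before any model can use Matt's and Scott's profiles, each one has to become a numeric feature vector. The sketch below is only an illustration under stated assumptions: the `CustomerFeatures` record, its field names, and the fixed `CITIES` list used for one-hot encoding are hypothetical; only the values come from the slide.

```python
from dataclasses import dataclass


# Hypothetical record type holding the churn-related features listed above.
@dataclass
class CustomerFeatures:
    name: str
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent buying merchandise ($)
    services_spend: float     # total spent in services ($)
    items_4w: int             # items purchased in last 4 weeks
    age: int
    gender: str
    city: str


# Assumed set of known cities for one-hot encoding (illustrative only).
CITIES = ["Chicago", "Los Angeles"]


def to_vector(c: CustomerFeatures) -> list[float]:
    """Turn a customer record into a numeric vector a model can consume."""
    numeric = [
        float(c.complaints_6m),
        float(c.tickets_4w),
        c.merchandise_spend,
        c.services_spend,
        float(c.items_4w),
        float(c.age),
        1.0 if c.gender == "male" else 0.0,
    ]
    # One-hot encode the categorical city field: one slot per known city.
    one_hot = [1.0 if c.city == city else 0.0 for city in CITIES]
    return numeric + one_hot


matt = CustomerFeatures("Matt", 3, 2, 1234.0, 123.0, 2, 34, "male", "Los Angeles")
scott = CustomerFeatures("Scott", 0, 1, 9876.0, 987.0, 12, 54, "male", "Chicago")

print(to_vector(matt))
print(to_vector(scott))
```

Categorical fields such as city are expanded into indicator columns here because most of the techniques listed later expect numbers only.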
20. Big Data
An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
Transaction records
Social media
Climate information
Mobile GPS signals
Healthcare
Smart Grid
Digital Breadcrumbs
Breadth and Depth
90% of the data today was created in the last 2 years
21. Churn-related “Big Data” features
Matt
12 friends listed as customers
2 complaints from friends in last 6 months
Average age of friends is 41 years old
2 friends churned in last 30 days
No purchases for same items as friends
1 website visit in last 7 days
2 website pages opened during last visit
Opened 3 newsletters in last 6 months
...
Scott
34 friends listed as customers
1 complaint from friends in last 6 months
Average age of friends is 62 years old
No friends churned in last 30 days
Purchased same 2 items as friends in last 2 months
3 website visits in last 7 days
5 website pages opened during last visit
Opened 12 newsletters in last 6 months
...
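"Big Data" features like these are aggregates computed over raw records about a customer's social network. As a minimal sketch, assuming a hypothetical `Friend` record (age, customer flag, churn date, complaint dates) and approximating "6 months" as 182 days, a few of the slide's features could be derived like this:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical raw record for one friend of a customer (not from the slides).
@dataclass
class Friend:
    age: int
    is_customer: bool
    churn_date: Optional[date]   # None if the friend has not churned
    complaint_dates: list[date]


def social_features(friends: list[Friend], today: date) -> dict:
    """Aggregate raw friend records into churn-related 'Big Data' features."""
    customers = [f for f in friends if f.is_customer]
    six_months_ago = today - timedelta(days=182)  # approximation of 6 months
    thirty_days_ago = today - timedelta(days=30)
    return {
        "friends_listed_as_customers": len(customers),
        "friend_complaints_6m": sum(
            1 for f in customers for d in f.complaint_dates if d >= six_months_ago
        ),
        "avg_friend_age": (
            sum(f.age for f in customers) / len(customers) if customers else None
        ),
        "friends_churned_30d": sum(
            1 for f in customers
            if f.churn_date is not None and f.churn_date >= thirty_days_ago
        ),
    }


# Hypothetical friends list; only friends who are customers are counted.
today = date(2013, 6, 1)
friends = [
    Friend(40, True, None, [date(2013, 5, 1), date(2012, 10, 1)]),
    Friend(42, True, date(2013, 5, 20), []),
    Friend(30, False, None, [date(2013, 5, 15)]),
]
print(social_features(friends, today))
```

On a real cluster the same aggregation would run over billions of records, which is where a platform such as Datameer on Hadoop comes in.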
22. Building a predictive model ...
Model Training
Churn-related features from customers labeled Churned or Not-churned are used to train a Predictive Model with techniques such as:
Neural Networks
Linear/Logistic Regression
Support Vector Machines
Scorecards
Decision Trees
Clustering
Association Rules
K-Nearest Neighbors
Naive Bayes Classifiers
...
Figure: neural network with an Input Layer (Data), a Hidden Layer, and an Output Layer (Prediction)
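The neural network in the figure turns input data into a prediction by passing it through a hidden layer. The sketch below shows only that forward pass, assuming a sigmoid activation and two inputs (complaints and support tickets); all weights are hypothetical, since in practice training on churned and not-churned examples would set them.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> hidden layer -> output layer."""
    # Each hidden neuron: weighted sum of the inputs plus bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # Single output neuron: a churn probability between 0 and 1.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights; a training algorithm (e.g. backpropagation) would learn these.
HIDDEN_W = [[0.9, 0.4], [-0.3, 0.8]]
HIDDEN_B = [0.0, -0.2]
OUTPUT_W = [1.2, -0.7]
OUTPUT_B = -0.1

# Inputs: (complaints in last 6 months, support tickets in last 4 weeks) for Matt
churn_probability = forward([3.0, 2.0], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
print(f"{churn_probability:.3f}")
```

Real inputs would usually be scaled first, which is exactly the kind of pre-processing PMML can describe alongside the model.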
23. Why not several models?
Model Ensemble
Raw Inputs → Data Preprocessing → Model 1, Model 2, ..., Model n → Voting → Prediction
Scores from all models are computed and then combined by Majority Voting, Weighted Voting, Weighted Average, etc.
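The three combination rules named on the slide can each be written in a few lines. This is a minimal sketch: the label strings and weights are illustrative assumptions, and ties in majority voting go to the label seen first.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models (ties: first seen wins)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Sum each model's weight behind its label; the heaviest label wins."""
    totals = {}
    for label, w in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores (e.g. churn probabilities) into one score."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


print(majority_vote(["churn", "not-churned", "churn"]))
print(weighted_vote(["churn", "not-churned", "not-churned"], [3.0, 1.0, 1.0]))
print(weighted_average([0.2, 0.8], [1, 3]))
```

Voting works on class labels, while weighted averaging needs each model to output a score, so the choice depends on what the member models produce.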
24. End Goal: Predicting churn ...
Model Deployment and Execution in Big Data
Churn-related Features → Predictive Churn Model → Churn Risk Score
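Deployment in Big Data means applying the trained model to every customer record in parallel, which fits a map-only job. The sketch below assumes a hypothetical logistic-regression churn model with made-up coefficients and a hypothetical input format `customer_id,complaints_6m,visits_7d`; wired to `sys.stdin`, `main` would behave like a Hadoop Streaming mapper that emits tab-separated key/value lines.

```python
import math
import sys

# Hypothetical churn model: logistic regression with made-up coefficients.
INTERCEPT = -1.0
COEFFICIENTS = {"complaints_6m": 0.8, "visits_7d": -0.5}


def churn_risk_score(features: dict) -> float:
    """Map churn-related features to a churn risk score in (0, 1)."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def score_lines(lines):
    """Map step: each line 'customer_id,complaints_6m,visits_7d' -> (id, score)."""
    for line in lines:
        customer_id, complaints, visits = line.strip().split(",")
        features = {"complaints_6m": float(complaints), "visits_7d": float(visits)}
        yield customer_id, churn_risk_score(features)


def main(stream=sys.stdin):
    # Streaming-style output: one "key<TAB>value" line per record.
    for customer_id, score in score_lines(stream):
        print(f"{customer_id}\t{score:.4f}")
```

For example, `dict(score_lines(["matt,3,1", "scott,0,3"]))` gives Matt (3 complaints, 1 visit) a much higher score than Scott (no complaints, 3 visits). Hard-coding the model this way is exactly the re-implementation step the next slide warns about.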
25. From Model Building to Model Deployment (Traditionally ...)
Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python
Production Environment: Java, .NET, C, SQL
Between the two, models get “Lost in Translation”.
SAS, R, IBM SPSS …: great for model building but not for scoring, even more so when it comes to Hadoop
26. From Model Building to Model Deployment (with PMML)
Model Building → PMML (models) → Model Deployment and Execution
Universal PMML Plug-in (UPPI)
Deploy in minutes ...
Angoss
BigML
FICO Model Builder
Datameer Server
IBM SPSS
KNIME
KXEN
MicroStrategy
Open Data
Pervasive DataRush
RapidMiner
R / Rattle
SAS
SAP Business Objects
Salford Systems
StatSoft STATISTICA
SQL Server
TIBCO Spotfire
Custom Code, etc.
27. Predictive Model Markup Language
PMML is an XML-based language used to define statistical and data mining
models and to share these between compliant applications.
It is a mature standard, developed by the DMG (Data Mining Group), that avoids proprietary formats and incompatibilities and makes models easy to deploy.
PMML eliminates the need for custom model deployment and ensures reliability.
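To make "XML-based" concrete, here is a small PMML 4.1 document for a two-class logistic regression and a scorer that reads it with Python's standard `xml.etree.ElementTree`. This is a sketch, not a conforming PMML consumer: the field names and coefficients are hypothetical, and the scorer handles only `NumericPredictor` terms and the `logit` normalization for this two-category layout.

```python
import math
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical PMML 4.1 logistic regression for churn (coefficients are made up).
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="illustrative churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints_6m" optype="continuous" dataType="double"/>
    <DataField name="visits_7d" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints_6m"/>
      <MiningField name="visits_7d"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0" targetCategory="yes">
      <NumericPredictor name="complaints_6m" coefficient="0.8"/>
      <NumericPredictor name="visits_7d" coefficient="-0.5"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="no"/>
  </RegressionModel>
</PMML>
"""


def score(pmml_text: str, record: dict) -> float:
    """Return P(churn = yes). Handles only NumericPredictor terms with exponent 1."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//p:RegressionTable[@targetCategory='yes']", NS)
    y = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        y += float(pred.get("coefficient")) * record[pred.get("name")]
    # logit normalization for a two-category model: p(yes) = 1 / (1 + exp(-y))
    return 1.0 / (1.0 + math.exp(-y))


print(round(score(PMML_DOC, {"complaints_6m": 3, "visits_7d": 1}), 3))  # prints 0.711
```

Because the model lives in the document rather than in the scoring code, a different PMML file can be dropped in without recoding, which is the point of the standard.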
Figure: PMML covers Data, Transformations, and Models
PMML defines a standard not only for representing data-mining models but also for data handling and data transformations (pre- and post-processing).
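As an example of a pre-processing step PMML can carry, a `DerivedField` with `NormContinuous` rescales a numeric field by piecewise-linear interpolation between `LinearNorm` points. The sketch below assumes a hypothetical `merchandise_spend` field mapped from [0, 10000] to [0, 1] and, as we understand the default `outliers="asIs"` behaviour, extrapolates values outside the range from the nearest segment.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical TransformationDictionary: rescale merchandise spend to [0, 1].
TRANSFORMS = """
<TransformationDictionary xmlns="http://www.dmg.org/PMML-4_1">
  <DerivedField name="merchandise_spend_norm" optype="continuous" dataType="double">
    <NormContinuous field="merchandise_spend">
      <LinearNorm orig="0" norm="0"/>
      <LinearNorm orig="10000" norm="1"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>
"""


def apply_norm_continuous(derived: ET.Element, record: dict) -> float:
    """Piecewise-linear interpolation between sorted LinearNorm points.

    Values outside the range are extrapolated from the first or last segment.
    """
    norm = derived.find("p:NormContinuous", NS)
    points = [(float(ln.get("orig")), float(ln.get("norm")))
              for ln in norm.findall("p:LinearNorm", NS)]
    x = record[norm.get("field")]
    # Pick the segment containing x, or the first/last segment for outliers.
    for i in range(len(points) - 1):
        (a1, b1), (a2, b2) = points[i], points[i + 1]
        if x <= a2 or i == len(points) - 2:
            return b1 + (x - a1) / (a2 - a1) * (b2 - b1)
    raise ValueError("NormContinuous needs at least two LinearNorm points")


root = ET.fromstring(TRANSFORMS.strip())
field = root.find("p:DerivedField", NS)
print(apply_norm_continuous(field, {"merchandise_spend": 1234.0}))  # about 0.1234
```

Keeping the transformation in the same PMML file as the model means the deployment side applies exactly the pre-processing the data scientist used during training.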
40. Next Steps:
More about Datameer and Big Data
www.datameer.com
More about Zementis
www.zementis.com
Contact us:
Alex Guazzelli aguazzeli@zementis.com
Karen Hsu khsu@datameer.com
Speaker Notes
Before I go into the demonstrations, I want to orient you to the environment in which we’ll run them: the Hortonworks Sandbox, with Datameer on top. See Datameer (Administration > Hadoop Cluster) running on the Hadoop cluster. See the administration in Hortonworks (Pig, …). Go to the Job Browser (take “hue” out of the username) to see the jobs, including the running Datameer jobs (point out the maps and reduces). You can get all of this from the Hortonworks and Datameer sites.
Neural networks are known for good prediction quality, but they are hard to interpret: it is not obvious why they make the predictions they do. Now we can see what the neural network did and understand it better.