Agile deployment predictive analytics on hadoop
- 1. Agile Deployment of Predictive Analytics on Hadoop
Faster Insights through Open Standards
Hadoop Summit 2012
© 2012 Datameer, Inc. All rights reserved.
- 2. Today's Session
Ulrich Rueckert, Data Scientist, Datameer
Michael Zeller, CEO, Zementis
After this session, you will be able to…
1. Effectively deliver predictive solutions combining:
a. R, KNIME & Others [Model Development]
b. Zementis Universal PMML Plug-in [Model Deployment & Execution]
c. Datameer [Scalable Hadoop Infrastructure]
2. Identify PMML as a vendor-neutral & open standard to:
a. Incorporate predictive models from virtually any commercial vendor or open source tool
b. Apply such models on Big Data
3. Leverage a lightweight, agile deployment process for predictive analytics to:
a. Accelerate time-to-market
b. Lower cost and complexity
c. Reuse existing predictive assets
- 3. Who is Datameer?
§ “Business Intelligence on top of Hadoop”
§ Established 2009 by Hadoop and enterprise software veterans
§ Offices in Silicon Valley, New York and Germany
§ Some customers:
- 4. Who is Zementis?
§ Focus on Operational Predictive Analytics
§ Offices in San Diego and Hong Kong
§ Predictive Analytics Software Technology:
• ADAPA® Decision Engine (Predictive Models and Rules)
• ADAPA Add-in for Excel
• PMML Converter
• Universal PMML Plug-in (UPPI)
§ Global Partner Network
- 5. Big Data and Analytics
§ People and Sensor Data
• Transaction records
• Social media
• Climate information
• Mobile GPS signals
• Healthcare
• Smart Grid
[Callout: 90% of the data today was created in the last 2 years]
§ Benefits from Analytics
• Descriptive Analytics answers “What happened?”
• Predictive Analytics answers “What will happen next?”
- 6. Operational Predictive Analytics
[Figure: Score Distribution, 1st Lien Stand-Alone Loans. Y axis: % Within Class (0% to 14%). X axis: Score (50 to 1000). Series: Goods and Bads, each with a polynomial trend line.]
[Figure: % of Delinquent Loans per Month. Y axis: % of Delinquent Loans (0 to 90). X axis: Months (Jan to Nov). Legend: Score 700, 750, 800, 850, 900, 950.]
- 7. From Model Building to Deployment
[Diagram labels: Model Building; PMML (models); Model Deployment (Integration / Execution); Datameer Server; UPPI]
Simple Deployment & Execution
1. Upload PMML file(s) in DAS
2. PMML turns into custom function
3. Seamlessly score data in Datameer
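The three steps above can be sketched in plain Python. This only illustrates the idea that a PMML file becomes a callable scoring function; it is not how UPPI or Datameer implement it. The sample document, the field names `x1` and `x2`, and the helper `load_regression_function` are hypothetical, and the sketch handles only a linear `RegressionModel` with numeric predictors (no transformations, exponents, or missing-value handling).

```python
import xml.etree.ElementTree as ET

# PMML 4.1 namespace; other PMML versions use a different namespace URI.
PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hypothetical PMML file standing in for step 1 ("upload").
EXAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="hypothetical linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_regression_function(pmml_text):
    """Step 2: turn a PMML linear regression into a Python function."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", PMML_NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", PMML_NS)
    }

    def score(row):
        # intercept + sum of coefficient * field value
        return intercept + sum(
            coef * float(row[name]) for name, coef in coefficients.items()
        )

    return score


# Step 3: apply the function to every record.
score = load_regression_function(EXAMPLE_PMML)
rows = [{"x1": 3.0, "x2": 4.0}, {"x1": 0.0, "x2": 2.0}]
scores = [score(row) for row in rows]
print(scores)
```

Because the function is built once and then applied row by row, the same pattern maps naturally onto a per-record custom function in a parallel job.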
- 8. PMML
Predictive Model Markup Language
• PMML is an XML-based language used to define
statistical and data mining models and to share these
between compliant applications.
• Mature standard developed by the DMG (Data Mining
Group) to avoid proprietary issues and incompatibilities
and to deploy models.
• Supported by all leading data mining tools, commercial
and open-source.
• Allows for the clear separation of tasks: model
development vs. model deployment.
• Eliminates the need for custom code and proprietary
model deployment solutions.
• Uniform deployment platform ensures scalability and
reliability of model execution.
[Image labels: Transformations; PMML book available on Amazon.com]
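Because PMML is plain XML, a consumer can inspect a model without knowing which tool produced it. The following is a minimal sketch using only the Python standard library; the sample document and the helper `summarize_pmml` are hypothetical, and the code reads only the version, the data dictionary, and the top-level model elements.

```python
import xml.etree.ElementTree as ET

# Top-level PMML elements that are not models.
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}

# Hypothetical document; a real one would be exported by a modeling tool.
EXAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="responded" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="responded" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="age" operator="greaterThan" value="40"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so any PMML version can be read."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(pmml_text):
    root = ET.fromstring(pmml_text)
    fields, models = [], []
    for child in root:
        name = local_name(child.tag)
        if name == "DataDictionary":
            fields = [
                (f.get("name"), f.get("optype"), f.get("dataType"))
                for f in child
                if local_name(f.tag) == "DataField"
            ]
        elif name not in NON_MODEL_ELEMENTS:
            models.append((name, child.get("functionName")))
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize_pmml(EXAMPLE_PMML)
print(summary)
```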
- 9. PMML: Predictive Model Management
Integrating across all systems and processes
[Diagram labels: Business Process; PMML; IBM SmartCloud; Amazon EC2; Applications (CRM, ERP, Excel, etc.)]
- 10. PMML: One Standard, One Process
[Diagram labels: Divisions; Service Providers; External Vendors; PMML; Applications]
- 11. Demo Setup
§ End-to-end Model Development Lifecycle
§ PMML Standard as the Glue
[Diagram labels: Data Analysis; Understand Client's Data; Development; Build Model(s) to Unlock Hidden Value; Model Design and Test; Demonstrate Model Performance; Model Deployment; Real-time Process Improvement and ROI; Universal PMML Plug-In]
- 12. Demo: Annual Marketing Campaign
§ Which customers should we target?
§ Split the 2011 results into a training set and a test set
§ Learn model on training set
§ Apply model on test set
§ Fine-tune model until evaluation shows success
§ Apply final model on 2012 customer list
[Diagram labels: 2011 Campaign Results; Subset for Training; Subset for Testing; Prediction Model; Model Evaluation; Fine-Tuned Prediction Model; 2012 Customer List; Campaign Candidates]
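The workflow on this slide can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the records, the `segment` field, and the toy "model" (response rate per segment) are invented for illustration and are not the model or data used in the demo.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split off a test subset."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def learn_response_rates(training):
    """Toy model: observed response rate per customer segment."""
    counts = {}
    for rec in training:
        hits, total = counts.get(rec["segment"], (0, 0))
        counts[rec["segment"]] = (hits + rec["responded"], total + 1)
    return {seg: hits / total for seg, (hits, total) in counts.items()}


def predict(model, customer, threshold=0.5):
    """Target a customer if their segment's response rate meets the threshold."""
    return model.get(customer["segment"], 0.0) >= threshold


def accuracy(model, test):
    correct = sum(predict(model, r) == bool(r["responded"]) for r in test)
    return correct / len(test)


# Hypothetical 2011 campaign results: segment A responded, segment B did not.
results_2011 = (
    [{"segment": "A", "responded": 1} for _ in range(40)]
    + [{"segment": "B", "responded": 0} for _ in range(40)]
)

training, test = split_train_test(results_2011)
model = learn_response_rates(training)   # learn on the training subset
test_accuracy = accuracy(model, test)    # evaluate on the held-out subset

# Hypothetical 2012 customer list: apply the final model.
customers_2012 = [
    {"id": "c1", "segment": "A"},
    {"id": "c2", "segment": "B"},
    {"id": "c3", "segment": "A"},
]
candidates = [c["id"] for c in customers_2012 if predict(model, c)]
print(test_accuracy, candidates)
```

In practice the fine-tuning step would repeat the learn and evaluate calls with different settings (here, for example, the `threshold`) until the test-set evaluation is acceptable.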
- 13. Summary
Avoid Vendor Lock-in
• Open Standards vs. Proprietary Code
• Best-of-Breed Tool Set
Hadoop-based Scoring Paradigm
• Minimize Data Movement
• Massively Parallel Execution
• Scale with Business Demand
Ease of Use, Fast ROI
• Leverage Datameer UI
• Deploy in Minutes vs. Months
• No Coding Skills Required
- 14. Online Resources
§ Learn More About PMML
§ Data Mining Group website http://www.dmg.org
§ Join LinkedIn PMML Discussion Group http://www.linkedin.com/groupRegistration?gid=2328634
§ Articles, on-line videos, blogs http://www.zementis.com/community.htm
§ Product Info
§ On Demand Webinar http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
§ UPPI for Datameer http://www.zementis.com/DAS-plugin.htm