Agile Deployment of Predictive Analytics on Hadoop

Faster Insights through Open Standards

Hadoop Summit 2012



     © 2012 Datameer, Inc. All rights reserved.

Today's Session

    Ulrich Rueckert, Data Scientist, Datameer
    Michael Zeller, CEO, Zementis



    After this session, you will be able to…

    1.  Effectively deliver predictive solutions combining:
             a.  R, KNIME & Others               [Model Development]
             b.  Zementis Universal PMML Plug-in [Model Deployment & Execution]
             c.  Datameer                        [Scalable Hadoop Infrastructure]

    2.  Identify PMML as a vendor-neutral & open standard to:
             a.  Incorporate predictive models from virtually any commercial vendor or open source tool
             b.  Apply such models on Big Data

    3.  Leverage a lightweight, agile deployment process for predictive analytics to:
             a.  Accelerate time-to-market
             b.  Lower cost and complexity
             c.  Reuse existing predictive assets

Who is Datameer?

     •  “Business Intelligence on top of Hadoop”
     •  Established 2009 by Hadoop and enterprise software veterans
     •  Offices in Silicon Valley, New York and Germany
     •  Some customers: [customer logos]




Who is Zementis?

     •  Focus on Operational Predictive Analytics
     •  Offices in San Diego and Hong Kong
     •  Predictive Analytics Software Technology:
          •  ADAPA® Decision Engine (Predictive Models and Rules)
          •  ADAPA Add-in for Excel
          •  PMML Converter
          •  Universal PMML Plug-in (UPPI)
     •  Global Partner Network




Big Data and Analytics


        •  People and Sensor Data
             •  Transaction records
             •  Social media
             •  Climate information
             •  Mobile GPS signals
             •  Healthcare
             •  Smart Grid

        90% of today's data was created in the last 2 years.

        •  Benefits from Analytics
             •  Descriptive Analytics answers "What happened?"
             •  Predictive Analytics answers "What will happen next?"


Operational Predictive Analytics

[Figure: Score Distribution, 1st Lien Stand-Alone Loans. % Within Class (0% to 14%) versus Score (50 to 1000) for Goods and Bads, each with a polynomial trend line.]

[Figure: % of Delinquent Loans per Month. % of Delinquent Loans (0 to 90) by month (Jan to Nov), one series per score from 700 to 950.]
From Model Building to Deployment

[Diagram: Model Building produces PMML models; Model Deployment (Integration / Execution) runs the PMML models on the Datameer Server through UPPI.]


                                                          Simple Deployment & Execution
                                                          1.  Upload PMML file(s) in DAS
                                                          2.  PMML turns into custom function
                                                          3.  Seamlessly score data in Datameer

PMML
Predictive Model Markup Language



•  PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.

•  Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.

•  Supported by all leading data mining tools, commercial and open-source.

•  Allows for the clear separation of tasks: Model development vs. model deployment.

•  Eliminates the need for custom code and proprietary model deployment solutions.

•  Uniform deployment platform ensures scalability and reliability of model execution.

PMML book available on Amazon.com
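
To make the idea of an XML-based, tool-independent model concrete, here is a minimal sketch of what a PMML file can look like and how a consumer might turn it into a scoring function. It is an illustration only, not the UPPI or Datameer implementation: the field names (`age`, `visits`, `response`), the coefficients, and the helper `load_scorer` are hypothetical, the document is assumed to use the PMML 4.1 namespace, and only a plain linear `RegressionModel` is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical model for illustration: response = 0.5 + 0.02*age + 0.1*visits.
# A real PMML file would be exported by a modeling tool, not written by hand.
PMML_TEXT = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="visits"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="visits" coefficient="0.1"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Assumed namespace; it must match the xmlns of the PMML version in use.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def load_scorer(pmml_text):
    """Parse a linear RegressionModel and return a function that scores one record."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }

    def score(record):
        # Linear model: intercept plus the weighted sum of the input fields.
        return intercept + sum(c * record[name] for name, c in coefficients.items())

    return score


score = load_scorer(PMML_TEXT)
rows = [{"age": 30.0, "visits": 2.0}, {"age": 45.0, "visits": 0.0}]
scores = [score(row) for row in rows]
```

The scoring code never needs to know which tool built the model; it reads only the PMML document, which is the separation of model development from model deployment described above.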
PMML: Predictive Model Management
  Integrating across all systems and processes



[Diagram: PMML at the center, connecting Business Process; Applications (CRM, ERP, Excel, etc.); IBM SmartCloud; and Amazon EC2.]


PMML: One Standard, One Process


[Diagram: PMML at the center, shared by Divisions, Service Providers, External Vendors, and Applications.]
Demo Setup

    •  End-to-end Model Development Lifecycle
    •  PMML Standard as the Glue

[Diagram: lifecycle loop around the Universal PMML Plug-In]
    •  Data Analysis: Understand Client's Data
    •  Model Design: Build Model(s) to Unlock Hidden Value
    •  Development and Test: Demonstrate Model Performance
    •  Model Deployment: Real-time Process Improvement and ROI


Demo: Annual Marketing Campaign

   •  Which customers should we target?
   •  Split the 2011 results into a training set and a test set
   •  Learn model on training set
   •  Apply model on test set
   •  Fine-tune model until evaluation shows success
   •  Apply final model on 2012 customer list

[Diagram: 2011 Campaign Results are split into a Subset for Training and a Subset for Testing. Training yields a Prediction Model; Model Evaluation on the test subset leads to a Fine-Tuned Prediction Model, which is applied to the 2012 Customer List to select Campaign Candidates.]
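
The demo workflow can be sketched in a few lines of code. This is a simplified illustration, not the model used in the demo: the data is made up, the single feature `purchases` and the threshold rule stand in for a real predictive model, and all function names are hypothetical. The fine-tuning loop is reduced to a single fit for brevity.

```python
import random


def split(records, test_fraction, seed):
    """Shuffle a copy of the records and return (training set, test set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, records):
    """Fraction of records where 'purchases >= threshold' matches the outcome."""
    hits = sum((r["purchases"] >= threshold) == r["responded"] for r in records)
    return hits / len(records)


def fit_threshold(train):
    """Learn the model: pick the threshold with the best training accuracy."""
    candidates = sorted({r["purchases"] for r in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Made-up 2011 campaign results: customers with 4 or more purchases responded.
campaign_2011 = [
    {"purchases": p, "responded": p >= 4} for p in range(10) for _ in range(10)
]

# 1. Split the 2011 results into a training set and a test set.
train, test = split(campaign_2011, test_fraction=0.3, seed=0)

# 2. Learn the model on the training set.
threshold = fit_threshold(train)

# 3. Apply the model to the held-out test set to evaluate it.
test_accuracy = accuracy(threshold, test)

# 4. Apply the final model to the (made-up) 2012 customer list.
customers_2012 = [{"id": i, "purchases": i % 10} for i in range(20)]
campaign_candidates = [
    c["id"] for c in customers_2012 if c["purchases"] >= threshold
]
```

Evaluating on records the model never saw during training is what makes the test-set score a reasonable estimate of how the model will behave on the 2012 list.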




Summary


Avoid Vendor Lock-in
•      Open Standards vs. Proprietary Code
•      Best-of-Breed Tool Set

Hadoop-based Scoring Paradigm
•      Minimize Data Movement
•      Massively Parallel Execution
•      Scale with Business Demand

Ease of Use, Fast ROI
•      Leverage Datameer UI
•      Deploy in Minutes vs. Months
•      No Coding Skills Required
Online Resources




 •  Learn More About PMML
      •  Data Mining Group website: http://www.dmg.org
      •  Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
      •  Articles, on-line videos, blogs: http://www.zementis.com/community.htm

 •  Product Info
      •  On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
      •  UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm



© 2012 Datameer, Inc. All rights reserved.                  Page 14

Más contenido relacionado

Destacado

Константин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
Константин Швачко, Yahoo!, - Scaling Storage and Computation with HadoopКонстантин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
Константин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
Media Gorod
 
Big Data Analytics in Healthcare and Life Sciences
Big Data Analytics in Healthcare and Life SciencesBig Data Analytics in Healthcare and Life Sciences
Big Data Analytics in Healthcare and Life Sciences
Ali Sanousi, MD, MBA, PhD
 

Destacado (20)

Pattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and HadoopPattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and Hadoop
 
Deploying Data Science with Docker and AWS
Deploying Data Science with Docker and AWSDeploying Data Science with Docker and AWS
Deploying Data Science with Docker and AWS
 
A Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentViewA Short PMML Tutorial by LatentView
A Short PMML Tutorial by LatentView
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Language
 
PMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive SolutionsPMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive Solutions
 
Geospatial Toolkit Enhancements for IBM InfoSphere Streams V4.0
Geospatial Toolkit Enhancements for IBM InfoSphere Streams V4.0Geospatial Toolkit Enhancements for IBM InfoSphere Streams V4.0
Geospatial Toolkit Enhancements for IBM InfoSphere Streams V4.0
 
InfoSphere Streams toolkits :Real-Time Analytics on Data in Motion
InfoSphere Streams toolkits :Real-Time Analytics on Data in MotionInfoSphere Streams toolkits :Real-Time Analytics on Data in Motion
InfoSphere Streams toolkits :Real-Time Analytics on Data in Motion
 
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
 
Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science Meetup
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data science
 
Docker for data science
Docker for data scienceDocker for data science
Docker for data science
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 
Geber Consulting - Big Data in Healthcare
Geber Consulting - Big Data in Healthcare Geber Consulting - Big Data in Healthcare
Geber Consulting - Big Data in Healthcare
 
IBM Insight 2014 session (4152 )- Accelerating Insights in Healthcare with “B...
IBM Insight 2014 session (4152 )- Accelerating Insights in Healthcare with “B...IBM Insight 2014 session (4152 )- Accelerating Insights in Healthcare with “B...
IBM Insight 2014 session (4152 )- Accelerating Insights in Healthcare with “B...
 
Константин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
Константин Швачко, Yahoo!, - Scaling Storage and Computation with HadoopКонстантин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
Константин Швачко, Yahoo!, - Scaling Storage and Computation with Hadoop
 
Hospital Readmission Reduction: How Important are Follow Up Calls? (Hint: Very)
Hospital Readmission Reduction: How Important are Follow Up Calls? (Hint: Very)Hospital Readmission Reduction: How Important are Follow Up Calls? (Hint: Very)
Hospital Readmission Reduction: How Important are Follow Up Calls? (Hint: Very)
 
Healthcare Analytics Maturity Model
Healthcare Analytics Maturity ModelHealthcare Analytics Maturity Model
Healthcare Analytics Maturity Model
 
Healthcare Predictive Analytics with the OR-(Denny Lee and Ayad Shammout, Dat...
Healthcare Predictive Analytics with the OR-(Denny Lee and Ayad Shammout, Dat...Healthcare Predictive Analytics with the OR-(Denny Lee and Ayad Shammout, Dat...
Healthcare Predictive Analytics with the OR-(Denny Lee and Ayad Shammout, Dat...
 
Big Data Analytics in Healthcare and Life Sciences
Big Data Analytics in Healthcare and Life SciencesBig Data Analytics in Healthcare and Life Sciences
Big Data Analytics in Healthcare and Life Sciences
 
Predicting Hospital Readmission Using Cascading
Predicting Hospital Readmission Using CascadingPredicting Hospital Readmission Using Cascading
Predicting Hospital Readmission Using Cascading
 

Similar a Agile deployment predictive analytics on hadoop

Managing a Website Performance Optimization (WPO) Project
Managing a Website Performance Optimization (WPO) ProjectManaging a Website Performance Optimization (WPO) Project
Managing a Website Performance Optimization (WPO) Project
Yottaa
 
E bootcamp right track manufacturing solutions 3 min pitch 04172013
E bootcamp right track manufacturing solutions 3 min pitch 04172013E bootcamp right track manufacturing solutions 3 min pitch 04172013
E bootcamp right track manufacturing solutions 3 min pitch 04172013
Sola Lawal
 
Business Services as a Resource to Business - Kristina Harrell
Business Services as a Resource to Business - Kristina HarrellBusiness Services as a Resource to Business - Kristina Harrell
Business Services as a Resource to Business - Kristina Harrell
PAPartners
 
Embraer day 2011_ny_ds(1)
Embraer day 2011_ny_ds(1)Embraer day 2011_ny_ds(1)
Embraer day 2011_ny_ds(1)
Embraer RI
 
Embraer Day NY 2011 - Defense and Security
Embraer Day NY 2011 - Defense and SecurityEmbraer Day NY 2011 - Defense and Security
Embraer Day NY 2011 - Defense and Security
Embraer RI
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetup
Norm Leitman
 
Sapphire Online 2009 Or1005
Sapphire Online 2009 Or1005Sapphire Online 2009 Or1005
Sapphire Online 2009 Or1005
Shereen Zubair
 
metlife Investor Day 2008 Investments
metlife Investor Day 2008 Investmentsmetlife Investor Day 2008 Investments
metlife Investor Day 2008 Investments
finance5
 
Measuring interaction in digital publications
Measuring interaction in digital publicationsMeasuring interaction in digital publications
Measuring interaction in digital publications
WAN-IFRA
 

Similar a Agile deployment predictive analytics on hadoop (20)

Managing a Website Performance Optimization (WPO) Project
Managing a Website Performance Optimization (WPO) ProjectManaging a Website Performance Optimization (WPO) Project
Managing a Website Performance Optimization (WPO) Project
 
Bango WiFi Market Data 1 Q09
Bango WiFi Market Data 1 Q09Bango WiFi Market Data 1 Q09
Bango WiFi Market Data 1 Q09
 
E bootcamp right track manufacturing solutions 3 min pitch 04172013
E bootcamp right track manufacturing solutions 3 min pitch 04172013E bootcamp right track manufacturing solutions 3 min pitch 04172013
E bootcamp right track manufacturing solutions 3 min pitch 04172013
 
"So – are we getting better?”
"So – are we getting better?”"So – are we getting better?”
"So – are we getting better?”
 
Business Services as a Resource to Business - Kristina Harrell
Business Services as a Resource to Business - Kristina HarrellBusiness Services as a Resource to Business - Kristina Harrell
Business Services as a Resource to Business - Kristina Harrell
 
Embraer day 2011_ny_ds(1)
Embraer day 2011_ny_ds(1)Embraer day 2011_ny_ds(1)
Embraer day 2011_ny_ds(1)
 
Embraer Day NY 2011 - Defense and Security
Embraer Day NY 2011 - Defense and SecurityEmbraer Day NY 2011 - Defense and Security
Embraer Day NY 2011 - Defense and Security
 
Smaato - NOAH12 San Francisco
Smaato - NOAH12 San FranciscoSmaato - NOAH12 San Francisco
Smaato - NOAH12 San Francisco
 
3 things to start this afternoon to improve your paid search
3 things to start this afternoon to improve your paid search3 things to start this afternoon to improve your paid search
3 things to start this afternoon to improve your paid search
 
The DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetupThe DevOps PaaS Infusion - May meetup
The DevOps PaaS Infusion - May meetup
 
Cloudify summit2012 pub
Cloudify summit2012 pubCloudify summit2012 pub
Cloudify summit2012 pub
 
Driving a High Performance Culture
Driving a High Performance CultureDriving a High Performance Culture
Driving a High Performance Culture
 
Ashnik corporate presentation Dec 2012
Ashnik corporate presentation Dec 2012Ashnik corporate presentation Dec 2012
Ashnik corporate presentation Dec 2012
 
Sapphire Online 2009 Or1005
Sapphire Online 2009 Or1005Sapphire Online 2009 Or1005
Sapphire Online 2009 Or1005
 
How To Convert Your SAP BusinessObjects Unused Licenses To SAP Analytics Cloud
How To Convert Your SAP BusinessObjects Unused Licenses To SAP Analytics CloudHow To Convert Your SAP BusinessObjects Unused Licenses To SAP Analytics Cloud
How To Convert Your SAP BusinessObjects Unused Licenses To SAP Analytics Cloud
 
metlife Investor Day 2008 Investments
metlife Investor Day 2008 Investmentsmetlife Investor Day 2008 Investments
metlife Investor Day 2008 Investments
 
New IDC Research on Software Analysis & Measurement
New IDC Research on Software Analysis & MeasurementNew IDC Research on Software Analysis & Measurement
New IDC Research on Software Analysis & Measurement
 
Measuring interaction in digital publications
Measuring interaction in digital publicationsMeasuring interaction in digital publications
Measuring interaction in digital publications
 
Solarmer Energy Profile
Solarmer Energy ProfileSolarmer Energy Profile
Solarmer Energy Profile
 
The dark side of IoT
The dark side of IoT The dark side of IoT
The dark side of IoT
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Agile Deployment of Predictive Analytics on Hadoop

  • 1. Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards. Hadoop Summit 2012. © 2012 Datameer, Inc. All rights reserved.
  • 2. Today's Session
      Speakers: Ulrich Rueckert, Data Scientist, Datameer; Michael Zeller, CEO, Zementis
      After this session, you will be able to:
      1. Effectively deliver predictive solutions combining:
         a. R, KNIME & others [Model Development]
         b. Zementis Universal PMML Plug-in [Model Deployment & Execution]
         c. Datameer [Scalable Hadoop Infrastructure]
      2. Identify PMML as a vendor-neutral & open standard to:
         a. Incorporate predictive models from virtually any commercial vendor or open source tool
         b. Apply such models on Big Data
      3. Leverage a lightweight, agile deployment process for predictive analytics to:
         a. Accelerate time-to-market
         b. Lower cost and complexity
         c. Reuse existing predictive assets
  • 3. Who is Datameer?
      • "Business Intelligence on top of Hadoop"
      • Established 2009 by Hadoop and enterprise software veterans
      • Offices in Silicon Valley, New York and Germany
      • Some customers: [customer logos, not captured in the transcript]
  • 4. Who is Zementis?
      • Focus on Operational Predictive Analytics
      • Offices in San Diego and Hong Kong
      • Predictive Analytics Software Technology:
          • ADAPA® Decision Engine (Predictive Models and Rules)
          • ADAPA Add-in for Excel
          • PMML Converter
          • Universal PMML Plug-in (UPPI)
      • Global Partner Network
  • 5. Big Data and Analytics
      • People and Sensor Data
          • Transaction records
          • Social media
          • Climate information
          • Mobile GPS signals
          • Healthcare
          • Smart Grid
      • Callout: 90% of the data today created in the last 2 years
      • Benefits from Analytics
          • Descriptive Analytics answers "What happened?"
          • Predictive Analytics answers "What will happen next?"
  • 6. Operational Predictive Analytics
      [Chart 1: Score Distribution, 1st Lien Stand-Alone Loans; % within class vs. score (50 to 1000) for Goods and Bads, with polynomial trend lines]
      [Chart 2: % of Delinquent Loans per Month, Jan to Nov, by score band (700 to 950)]
  • 7. From Model Building to Deployment
      [Diagram stages: Model Building; Model Deployment; Integration / Execution. Diagram labels: PMML (models), UPPI, Datameer Server]
      Simple Deployment & Execution
      1. Upload PMML file(s) in DAS
      2. PMML turns into custom function
      3. Seamlessly score data in Datameer
  • 8. PMML: Predictive Model Markup Language
      • PMML is an XML-based language used to define statistical and data mining models and to share them between compliant applications.
      • Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
      • Supported by all leading data mining tools, commercial and open-source.
      • Allows for the clear separation of tasks: model development vs. model deployment.
      • Eliminates the need for custom code and proprietary model deployment solutions.
      • Uniform deployment platform ensures scalability and reliability of model execution.
      [Side notes on slide: "Transformations" (diagram label); PMML book available on Amazon.com]
  • 9. PMML: Predictive Model Management
      Integrating across all systems and processes
      [Diagram labels: Business Process; PMML; IBM SmartCloud; Amazon EC2; Applications; CRM, ERP, Excel, etc.]
  • 10. PMML: One Standard, One Process
      [Diagram labels: Divisions; Service Providers; External Vendors; PMML; Applications]
  • 11. Demo Setup
      • End-to-end Model Development Lifecycle
      • PMML Standard as the Glue
      [Lifecycle diagram labels: Understand Client's Data; Data Analysis; Development; Design and Test Model; Build Model(s) to Unlock Hidden Value; Demonstrate Model Performance; Model Deployment; Universal PMML Plug-In; Real-time Process Improvement and ROI]
  • 12. Demo: Annual Marketing Campaign
      • Which customers should we target?
      • Split 2011 results into a training set and a test set
      • Learn model on training set
      • Apply model on test set
      • Fine-tune model until evaluation shows success
      • Apply final model on 2012 customer list
      [Diagram labels: 2011 Campaign Results; 2012 Customer List; Subset for Training; Subset for Testing; Prediction Model; Model Evaluation; Fine-Tuned Prediction Model; Campaign Candidates]
  • 13. Summary
      • Open Standards vs. Proprietary Code
      • Best-of-Breed Tool Set
      • Minimize Data Movement
      • Massively Parallel Execution
      • Scale with Business Demand
      • Leverage Datameer UI
      • Deploy in Minutes vs. Months
      • No Coding Skills Required
      [Theme labels: Avoid Vendor Lock-in; Hadoop-based Scoring Paradigm; Ease of Use; Fast ROI]
  • 14. Online Resources
      • Learn More About PMML
          • Data Mining Group website: http://www.dmg.org
          • Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
          • Articles, on-line videos, blogs: http://www.zementis.com/community.htm
      • Product Info
          • On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
          • UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm
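
The idea on slides 7, 8 and 12 (a PMML file becomes a scoring function, which is then applied to a customer list) can be sketched roughly in Python. This is only an illustration under stated assumptions, not the Zementis UPPI or any Datameer API: the PMML document, the field names (age, purchases), the coefficients, the 0.5 cut-off and the customer records are all hypothetical, and the loader handles only a single-table RegressionModel with optional logit normalization.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 document: a tiny logistic model for campaign response.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical campaign response model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="purchases" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="campaign" functionName="regression"
                   normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="purchases"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-2.0">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="purchases" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_scorer(pmml_text):
    """Turn a single-table PMML RegressionModel into a Python scoring function."""
    root = ET.fromstring(pmml_text.strip())
    # The root tag looks like "{namespace}PMML"; reuse that namespace for lookups.
    ns = {"p": root.tag[1:root.tag.index("}")]}
    model = root.find("p:RegressionModel", ns)
    table = model.find("p:RegressionTable", ns)
    intercept = float(table.get("intercept"))
    coefficients = {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("p:NumericPredictor", ns)
    }
    use_logit = model.get("normalizationMethod") == "logit"

    def score(record):
        # Linear combination of the inputs, then the optional logistic link.
        y = intercept + sum(c * record[name] for name, c in coefficients.items())
        return 1.0 / (1.0 + math.exp(-y)) if use_logit else y

    return score


# Hypothetical stand-in for the 2012 customer list.
customers = [
    {"id": "c1", "age": 30, "purchases": 1},
    {"id": "c2", "age": 50, "purchases": 4},
    {"id": "c3", "age": 40, "purchases": 0},
]

score = load_scorer(PMML_DOC)
# Customers whose predicted response reaches the (assumed) 0.5 cut-off.
candidates = [c["id"] for c in customers if score(c) >= 0.5]
print(candidates)
```

Because `score` depends only on the record it is given, the same function could in principle be applied row by row on each node of a cluster, which is the sense in which the summary slide speaks of minimizing data movement and massively parallel execution.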