
A “Real-Time” Architecture for Machine Learning Execution with MLeap

This talk describes a production environment that hosts a large random forest model on a cluster of MLeap runtimes. A microservice architecture with a Postgres database backend manages configuration. The architecture provides full traceability and model governance through the entire lifecycle while cutting execution time by nearly two-thirds.

Kount provides certainty in digital interactions like online credit card transactions. Our production environment has extreme availability requirements: we process hundreds of transactions per second, have no scheduled downtime, and achieve 99.99% annual uptime. One of our scores uses a random forest classifier with 250 trees and 100,000 nodes per tree. Our original implementation serialized a scikit-learn model, which by itself takes 1 GB in memory. It required exactly identical environments in training, where the model was serialized, and in production, where it was deserialized and evaluated. This is risky when maintaining high uptime with no planned downtime.

The improved solution load balances across a cluster of API servers hosting MLeap runtimes. These model execution runtimes scale separately from the data pre-processing pipeline, which is the more expensive step in our application. Each pre-processing application is connected to multiple MLeap runtimes to provide complete redundancy and independent scaling.

We extend model governance into the production environment using a set of services wrapped around a Postgres backend. These services manage model promotion and role across several production, QA, and integration environments. Finally, we describe a "shadow" pipeline in production that can replace any or all portions of transaction evaluation with alternative models and software. A Kafka message bus provides copies of live production transactions to the shadow servers, where results are logged for analysis. Since this shadow environment is managed through the same services, code and models can be directly promoted or retired after a test run on live data streams.
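The promotion and retirement flow described above can be pictured as a small state machine. The following is a minimal sketch, not Kount's implementation: the stage names, the `ALLOWED` transition table, the `ModelRecord` class, and the model name are all hypothetical, and a real governance service would presumably persist this history in its Postgres backend rather than in memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages; the talk does not name these."""
    REGISTERED = "registered"  # recorded in the governance data store
    SHADOW = "shadow"          # evaluated on copies of live traffic
    STANDBY = "standby"        # loaded in production, not yet serving
    LIVE = "live"              # serving production traffic
    RETIRED = "retired"        # unloaded from the runtimes


# Assumed transition rules: a model must pass through the shadow
# environment before production, and any deployed model can be retired.
ALLOWED = {
    Stage.REGISTERED: {Stage.SHADOW},
    Stage.SHADOW: {Stage.STANDBY, Stage.RETIRED},
    Stage.STANDBY: {Stage.LIVE, Stage.RETIRED},
    Stage.LIVE: {Stage.RETIRED},
    Stage.RETIRED: set(),
}


@dataclass
class ModelRecord:
    """In-memory stand-in for one row of a governance data store."""
    name: str
    stage: Stage = Stage.REGISTERED
    history: list = field(default_factory=list)

    def move_to(self, target: Stage) -> None:
        """Apply a transition, keeping an audit trail for traceability."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.name}: {self.stage.value} -> {target.value} is not allowed"
            )
        self.history.append((self.stage, target))
        self.stage = target


# Example: a model that passes its shadow test and is promoted.
record = ModelRecord("example-model")  # hypothetical model name
for next_stage in (Stage.SHADOW, Stage.STANDBY, Stage.LIVE):
    record.move_to(next_stage)
```

Keeping the allowed transitions in one table makes the rule "nothing reaches production without a shadow run" explicit and easy to audit.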

Speaker: Noah Pritikin


  1. A “Real-Time” Architecture for Machine Learning Execution with MLeap. Noah Pritikin, Site Reliability Engineer. Spark+AI Summit 2019 | April 24, 2019
  2. Machine Learning Applications: Detecting credit-card fraud, Financial markets, Online advertising, Recommender systems, Robotics, … Agriculture, Automated medical diagnosis, Computer vision, Insurance, Marketing, Sentiment analysis, User behavior analytics, Weather forecasting, … I am defining “Real-Time” as <100ms for the context of this presentation. Not “Real-Time” / “Real-Time”
  3. Agenda: What is Kount? Data Pipeline Context. “Real-Time” Architecture / Model Governance. Statistical Metrics and Monitoring. Q&A.
  4. What is Kount?
  5. Fighting Fraud, Boosting Revenue. Industry-Leading Technology & Experience. Developing fraud-fighting technology since 1999. AI/Machine Learning implemented in 2007. Dozens of patented technologies. Continuous innovation. A SaaS-Based, All-in-One Fraud Mitigation Platform. Safeguard Some of the World’s Largest Merchants, Payment Service Providers, Ecommerce Platforms. $80M Investment from CVC Growth Partners.
  6. Data Pipeline Context
  7. Data Pipeline Context: Highly-available Client-facing Infrastructure / Services; Kount Data Lake; Data Science Magical Fairy Dust!; Machine Learning Model (MLeap Pipeline); Machine Learning Execution Platform; MLeap API Servers
  8. “Real-Time” Architecture / Model Governance
  9. First iteration was our baseline for improvement. We were faced with a technical problem to solve… Kount Boost Technology™ was released to production in October 2017. The first iteration of the architecture, based on Python3 / Scikit-learn, worked, but… • Lacked portability • Challenging to scale into the future • Lacked multiple model support • Limited model governance. Built an in-house Apache Spark cluster in January 2018. • Began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.). Spark ML-generated models depend on a SparkContext, but “real-time” predictions were required!
  10. “Real-Time” Architecture Overview. Feature Extraction is separated from Transaction Prediction. Hosting multiple models allows for blue-green deployments. Centralized model governance. The load balancer is deployed in a “sidecar proxy” implementation, allowing for a simpler Feature Extraction instance design. • Backend health checks make a prediction on a test transaction. MLeap API instances run a GC-optimized Java8 configuration. JVM metrics (e.g. Jolokia, etc.).
  11. Dark Production Infrastructure
  12. Dark Production Infrastructure. An entirely separate parallel infrastructure in production. NO customer impact. NO “real-time” requirements. Parallelization is implemented via a message bus (e.g. Kafka, Kinesis, ZeroMQ, etc.). Optimize cost by processing only a fraction of production traffic (e.g. 1/3). Only logs the raw predictions returned from MLeap for later analysis. Dark production infrastructure enables model governance / validation.
  13. Tools Enabling Model Governance. Centrally track state of machine learning models – end-to-end! Flow: Train model & verify quality → Add model to governance data store → Deploy model to dark production infrastructure MLeap API instances → Dark production infrastructure test? (Bad / Good) → if Good: Deploy to available production MLeap API instances → Migrate production traffic to MLeap API instances hosting new model → Replaced model? (No / Yes) → if Yes: Unload retired model from MLeap API instances → End
  14. Statistical Metrics and Monitoring
  15. “Real-Time” Architecture Performance – Transforming LEAP frames. This is NOT machine learning model performance (e.g. TOC curve, ROC curve, PR curve, etc.). A “real-time” system requires metrics that measure systemic performance.
  16. Averages + Distributions! Due to “real-time” requirements, averages don’t cut it (by themselves…). Distributions provide critical visibility when monitoring low-latency systems.
  17. Applied Statistics. Boost without MLeap (previous): average 19.27 ms, 95th percentile 24 ms, 99th percentile 37 ms, standard deviation 5.31 ms. Boost with MLeap (current): average 7.00 ms, 95th percentile 9 ms, 99th percentile 16 ms, standard deviation 2.41 ms. Improvement with MLeap! The 99th percentile saw a ~56% improvement!
  18. Consider Improvements to Your “Real-Time” Architecture! MLeap… Model governance… Dark Production Infrastructure (assisting with model testing)… Latency Metrics (emphasize the use of distributions)… Further reading… • “Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap” - https://medium.com/@combust/9e0fb57f79db • MLeap GitHub repo - https://github.com/combust/mleap • MLeap documentation - http://mleap-docs.combust.ml/
  19. Thank you! … and, Q&A?
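
The latency statistics on slide 17 (average, 95th and 99th percentiles, standard deviation) and the quoted improvements can be computed with Python's standard library. This is a sketch under stated assumptions: `latency_summary` and `improvement` are hypothetical helper names, and the "inclusive" percentile method is a guess, since the talk does not say how its percentiles were calculated.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize per-request latencies (in ms) as a distribution, not just a mean.

    Needs at least two samples. Percentiles use the 'inclusive' method of
    statistics.quantiles, which is an assumption made for this sketch.
    """
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": cuts[94],  # 95th of the 99 percentile cut points
        "p99": cuts[98],  # 99th percentile
        "stdev": statistics.stdev(samples_ms),
    }


def improvement(before_ms, after_ms):
    """Fractional reduction in latency relative to the previous value."""
    return (before_ms - after_ms) / before_ms


# Figures as reported on slide 17 (previous vs. current architecture).
p99_gain = improvement(37.0, 16.0)     # 21 / 37, roughly 0.57
mean_gain = improvement(19.27, 7.00)   # 12.27 / 19.27, roughly 0.64
print(f"p99 improvement:  {p99_gain:.1%}")
print(f"mean improvement: {mean_gain:.1%}")
```

The mean reduction of roughly 64% is consistent with the abstract's "nearly two-thirds", and the 99th-percentile reduction matches the figure quoted on the slide to within rounding.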
