Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief Data Scientist, BeeswaxIO


Presented at #H2OWorld 2017 in Mountain View, CA.

Enjoy the video: https://youtu.be/-rGRHrED94Y.

Learn more about H2O.ai: https://www.h2o.ai/.

Follow @h2oai: https://twitter.com/h2oai.

- - -

Abstract:
Most machine learning systems enable two essential processes: creating a model and applying the model in a repeatable and controlled fashion. These two processes are interrelated and pose technological and organizational challenges as they evolve from research to prototype to production. This presentation outlines common design patterns for tackling such challenges while implementing machine learning in a production environment.

Sergei's Bio:
Dr. Sergei Izrailev is Chief Data Scientist at BeeswaxIO, where he is responsible for data strategy and building AI applications powering the next generation of real-time bidding technology. Before Beeswax, Sergei led data science teams at Integral Ad Science and Collective, where he focused on architecture, development and scaling of data science based advertising technology products. Prior to advertising, Sergei was a quant/trader and developed trading strategies and portfolio optimization methodologies. Previously, he worked as a senior scientist at Johnson & Johnson, where he developed intelligent tools for structure-based drug discovery. Sergei holds a Ph.D. in Physics and Master of Computer Science degrees from the University of Illinois at Urbana-Champaign.

Published in: Technology

  1. Sergei Izrailev, Chief Data Scientist @ Beeswax (@sizrailev / bit.ly/MLatScale / sergei@beeswax.com / www.beeswax.com)
  2. Design Patterns for Machine Learning in Production
  3. Motivation
     • A widespread need to leverage in-house data
     • Data science expertise is available
     • And yet, it seems to be too hard to extract value from ML
  4. About
     • Beeswax
        • Beeswax is an ad tech startup; 40 employees in NYC and London
        • Founded by three ex-Googlers
        • Real-time bidding (RTB) platform for buying online ads (1M+ QPS)
        • Platform tailored for customers to leverage in-house data science
     • Myself
        • Production AI systems in Pharma, Finance and Ad Tech
        • Interested in both technology and organizations
        • bit.ly/MLatScale
  5. Overall process [Diagram labels: Discovery, Research, Prototype, Production; Value; Cost; Problem Statement; Constraints]
  6. Start with defining the problem
  7. Problem statement
     • Is this the right problem to solve?
     • Suppose we’ve solved the stated problem: what’s the value?
     • Is ML the right tool to solve the problem?
     • What are the constraints?
  8. Define constraints
     • Existing production environment architecture
     • Technology stack
     • Available people and their skills
     • Requirements for scale
  9. Dimensions of ML system scalability
     • Volume: how much data do we need to process?
     • Velocity: how quickly does the data change?
     • Variety: what are the types of data, models, and applications?
     • Veracity: how accurate are our models?
     • Value: how does it matter to the ML consumer?
     • Viability: do the benefits outweigh the costs?
  10. Technical Design of ML Systems
  11. Machine learning systems [Diagram labels: Data Sources, Labeled data, Unlabeled data, Build, Model, Apply, Consumer]
  12. ML system design [Diagram labels: Data Sources, Labeled data, Unlabeled data, Build, Model, Apply, Consumer; open design questions marked with "?"]
  13. Model deployment [Diagram labels: Data Sources, Labeled data, Unlabeled data, Build, Model, Apply, Consumer; open design questions marked with "?"]
  14. Model deployment [Diagram labels: Build (Transform, Train); Code + Data; Apply (Transform, Predict)]
     • Data transformations must be the same in training and scoring
     • Some transformations are “models” (PCA, top N, TF-IDF)
     • Hence, most ML pipelines are DAGs
     • These DAGs must be reproduced in production scoring
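
The point on slide 14 that transformations must match between training and scoring, and that some transformations are themselves fitted "models", can be illustrated with a minimal sketch. Everything here is hypothetical and stdlib-only: `TopNEncoder`, `MajorityLabelModel`, and `Pipeline` are illustrative names, not code from the talk, and the toy data is made up.

```python
from collections import Counter


class TopNEncoder:
    """Fitted transformation: keep the N most frequent categories, map the rest to 'other'."""

    def __init__(self, n):
        self.n = n
        self.keep = None  # learned during fit(), so this step is itself a small "model"

    def fit(self, values):
        self.keep = {value for value, _ in Counter(values).most_common(self.n)}
        return self

    def transform(self, values):
        if self.keep is None:
            raise RuntimeError("fit() must be called before transform()")
        return [value if value in self.keep else "other" for value in values]


class MajorityLabelModel:
    """Toy model: predicts the most common training label for each category."""

    def fit(self, features, labels):
        per_category = {}
        for feature, label in zip(features, labels):
            per_category.setdefault(feature, Counter())[label] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in per_category.items()}
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.table.get(f, self.default) for f in features]


class Pipeline:
    """Bundles the fitted transformation with the model so scoring reuses both."""

    def __init__(self, encoder, model):
        self.encoder = encoder
        self.model = model

    def fit(self, raw_values, labels):
        self.model.fit(self.encoder.fit(raw_values).transform(raw_values), labels)
        return self

    def predict(self, raw_values):
        # Scoring applies exactly the transformation that was fitted in training.
        return self.model.predict(self.encoder.transform(raw_values))


# Hypothetical training data: site identifiers and binary labels.
train_sites = ["a", "a", "a", "b", "b", "c"]
train_labels = [1, 1, 0, 0, 0, 0]
pipeline = Pipeline(TopNEncoder(n=2), MajorityLabelModel()).fit(train_sites, train_labels)
predictions = pipeline.predict(["a", "b", "z"])
```

Shipping the whole `Pipeline` object, not only the final model, is one way to keep the training-time DAG and the production scoring DAG from drifting apart.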
  15. Interface between building and scoring
     • In-memory: model is never persisted; train, then score
        • single application, also streaming
     • Data only: linear coefficients, PMML, etc.
        • code is independent
     • Serialized objects: Pickle, R, Spark, custom
        • reuse code
     • Code + Data: e.g., H2O’s POJO
        • code is generated
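
Two of the interfaces listed on slide 15 can be contrasted in a short sketch: "data only" (coefficients exported as JSON and scored by independently written code) and "code + data" (scoring code generated from the model). This is an illustration under assumptions, not the talk's implementation: the logistic model, the feature names, and the helper functions are hypothetical, and the generated-code path only mimics the general idea behind a generated scorer in plain Python.

```python
import json
import math

# Hypothetical trained logistic model: an intercept and per-feature weights.
intercept = -1.0
weights = {"clicks": 0.5, "visits": 0.25}


def export_coefficients(intercept, weights):
    """Data only: the build side ships numbers; the scoring code is written separately."""
    return json.dumps({"intercept": intercept, "weights": weights}, sort_keys=True)


def score_from_coefficients(payload, row):
    """Independent scoring code that only understands the exported data format."""
    params = json.loads(payload)
    z = params["intercept"] + sum(
        w * row.get(name, 0.0) for name, w in sorted(params["weights"].items())
    )
    return 1.0 / (1.0 + math.exp(-z))


def generate_scorer_source(intercept, weights):
    """Code + data: emit source code with the coefficients baked in."""
    parts = [repr(intercept)] + [
        f"{w!r} * row.get({name!r}, 0.0)" for name, w in sorted(weights.items())
    ]
    return (
        "import math\n"
        "def score(row):\n"
        f"    z = {' + '.join(parts)}\n"
        "    return 1.0 / (1.0 + math.exp(-z))\n"
    )


payload = export_coefficients(intercept, weights)

# Load the generated scorer into its own namespace, as a consumer would import it.
namespace = {}
exec(generate_scorer_source(intercept, weights), namespace)
generated_score = namespace["score"]

row = {"clicks": 2.0, "visits": 4.0}
from_data = score_from_coefficients(payload, row)
from_code = generated_score(row)
```

With the data-only interface the scoring code must be kept in sync with the model family by hand; with generated code the consumer needs no knowledge of the model at all, at the cost of a code-generation step in the build.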
  16. Scoring systems [Diagram labels: Data Sources, Labeled data, Unlabeled data, Build, Model, Apply, Consumer; open design questions marked with "?"]
  17. Batch processing [Diagram labels: New data, Apply, Transform, Predict, Predictions, Consumer]
  18. Batch features; consumer predicts [Diagram labels: New data, Apply, Transform, Features, Model, Predict, Consumer]
  19. Single-row predictions [Diagram labels: New data, Transform, Predict, Model, Consumer]
  20. A service for a single-row consumer [Diagram labels: Consumer, New data, Apply, Transform, Predict, Prediction]
  21. A service for a single-row consumer [Diagram labels: Consumer, New data, Apply, Transform, Predict, Prediction, Cache]
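
The single-row scoring service with a cache (slides 20 and 21) might look roughly like the sketch below. It is an assumption-laden illustration: `ScoringService`, the `transform` and `predict` callables, and the field names are all hypothetical, and an in-process `functools.lru_cache` stands in for whatever cache a real deployment would use.

```python
from functools import lru_cache


class ScoringService:
    """Wraps transform + predict behind one call and memoizes results per input row."""

    def __init__(self, transform, predict, cache_size=1024):
        self.model_calls = 0  # counts cache misses, for illustration only

        @lru_cache(maxsize=cache_size)
        def _score(key):
            self.model_calls += 1
            return predict(transform(dict(key)))

        self._score = _score

    def score(self, row):
        # Rows are dicts, which are not hashable; use a sorted tuple of items as the key.
        return self._score(tuple(sorted(row.items())))


# Hypothetical transformation and model.
def transform(row):
    return [row["bid_floor"] / 10.0, 1.0 if row["device"] == "mobile" else 0.0]


def predict(features):
    return 0.1 * features[0] + 0.2 * features[1]


service = ScoringService(transform, predict)
row = {"bid_floor": 5.0, "device": "mobile"}
first = service.score(row)
second = service.score(dict(row))  # identical input: served from the cache
calls_after_repeat = service.model_calls
service.score({"bid_floor": 7.0, "device": "desktop"})
calls_after_new_row = service.model_calls
```

Caching here only pays off when identical rows recur; the cached-predictions and cached-features slides that follow key the cache on an entity instead of on the full row.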
  22. Cached predictions [Diagram labels: Consumer, Apply, Cache, Key, Prediction, Default, New data, Transform, Predict, Predictions]
  23. Near-real-time [Diagram labels: Consumer, Apply, Cache, Key, Prediction, Default, New data, Add data, Transform, Predict, Predictions]
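
The cached-predictions and near-real-time patterns (slides 22 and 23) share one idea: the consumer only performs a key lookup, with a default for unknown keys, while a separate process fills the cache. A minimal sketch, assuming a plain dictionary as the cache and hypothetical names (`PredictionCache`, `score_new_data`, the `user-N` keys):

```python
class PredictionCache:
    """Key-value store of precomputed predictions with a default for unknown keys."""

    def __init__(self, default):
        self.default = default
        self._predictions = {}

    def load(self, predictions):
        """Add or overwrite predictions, e.g. the output of a batch scoring job."""
        self._predictions.update(predictions)

    def get(self, key):
        """Consumer-side lookup: never blocks on the model, never fails on a miss."""
        return self._predictions.get(key, self.default)


def score_new_data(cache, new_rows, transform, predict):
    """Near-real-time path: score newly arrived rows and add them to the cache."""
    cache.load({key: predict(transform(row)) for key, row in new_rows.items()})


cache = PredictionCache(default=0.01)
cache.load({"user-1": 0.20, "user-2": 0.05})  # batch-computed predictions

before = cache.get("user-3")  # not scored yet: the default is returned

score_new_data(
    cache,
    {"user-3": {"visits": 4}},
    transform=lambda row: row["visits"] / 10.0,  # hypothetical feature
    predict=lambda x: x / 2.0,                   # hypothetical model
)
after = cache.get("user-3")
```

The difference between the two slides is then mostly a matter of how often `score_new_data` runs: once per batch, or continuously as data arrives.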
  24. Cached features [Diagram labels: Consumer, Apply, Cache, Key, Features, Model, Predict, New data, Transform]
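
In the cached-features variant (slide 24), the cache holds transformed features and the consumer holds the model and does the prediction itself. The sketch below is hypothetical throughout (`FeatureCache`, `consumer_predict`, a linear model given as a weight list); it is meant to show one consequence of this split, namely that the model can be swapped without recomputing the cache.

```python
class FeatureCache:
    """Key-value store of precomputed feature vectors with a default for unknown keys."""

    def __init__(self, default_features):
        self.default_features = default_features
        self._features = {}

    def put(self, key, features):
        self._features[key] = features

    def get(self, key):
        return self._features.get(key, self.default_features)


def consumer_predict(weights, feature_cache, key):
    """Consumer-side prediction: look up cached features, apply a linear model."""
    features = feature_cache.get(key)
    return sum(w * x for w, x in zip(weights, features))


feature_cache = FeatureCache(default_features=[0.0, 0.0])
feature_cache.put("user-1", [1.0, 2.0])  # produced by the Apply-side transformation

model_v1 = [0.5, 0.25]  # hypothetical model versions
model_v2 = [1.0, 1.0]

score_v1 = consumer_predict(model_v1, feature_cache, "user-1")
score_v2 = consumer_predict(model_v2, feature_cache, "user-1")  # same cache, new model
score_unknown = consumer_predict(model_v2, feature_cache, "user-9")
```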
  25. Evolution of ML systems
  26. Research [Diagram labels: Build; Sample, Transform & generate features, Collect labels, Train & Test, Extract & Clean]
  27. Prototype [Diagram labels: Build (Sample, Transform & generate features, Collect labels, Train & Test, Extract & Clean); Serialized Model; Apply (Extract & Clean, Transform, Generate features, Predict, Evaluate)]
  28. Production [Diagram labels: Build, Serialized Model, Apply; Visualization, CI/CD, Monitoring & Alerts, Model Evaluation, Change Mgmt, Error Handling, Data Validation, Versioning, Security & Access, Reporting, Testing & QA, Logging]
  29. Production: fault-tolerant systems
     • We should expect problems
        • models don’t converge; input data changes; bugs
     • Produce acceptable results, even when something fails
     • Handle predictable error conditions automatically
     • Minimal human intervention
     • Easy diagnostics and recovery in case of fatal errors
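
The fault-tolerance goals on slide 29 (acceptable results when something fails, predictable errors handled automatically, easy diagnostics) can be sketched on the scoring side as a wrapper that validates input, catches model failures, and falls back to a default. This is one possible shape under stated assumptions: `safe_predict`, the required-field check, and the default value are hypothetical choices, not the speaker's design.

```python
import logging
import math

logger = logging.getLogger("scoring")


def safe_predict(predict, row, required_fields, default):
    """Return a usable prediction even when the input or the model misbehaves."""
    # Predictable error: input data changed and a required field is missing.
    missing = [name for name in required_fields if row.get(name) is None]
    if missing:
        logger.warning("missing fields %s; returning default", missing)
        return default

    # Unpredictable error: a bug or bad model; log the traceback for diagnostics.
    try:
        value = predict(row)
    except Exception:
        logger.exception("prediction failed; returning default")
        return default

    # Sanity-check the output before handing it to the consumer.
    if not isinstance(value, (int, float)) or math.isnan(value) or math.isinf(value):
        logger.warning("invalid prediction %r; returning default", value)
        return default
    return value


def fragile_model(row):
    # Hypothetical model that divides by a feature and can therefore fail.
    return row["clicks"] / row["impressions"]


ok = safe_predict(fragile_model, {"clicks": 1, "impressions": 4}, ["clicks", "impressions"], 0.01)
no_field = safe_predict(fragile_model, {"clicks": 1}, ["clicks", "impressions"], 0.01)
crashed = safe_predict(fragile_model, {"clicks": 1, "impressions": 0}, ["clicks", "impressions"], 0.01)
```

Whether a silent default is an "acceptable result" depends on the consumer; the log records are what keep the fallback from hiding a systematic failure.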
  30. Conway's law: "Any piece of software reflects the organizational structure that produced it."
  31. People Questions
     • Who is developing training?
     • Who is developing scoring?
     • Who is responsible for training in production?
     • Who is responsible for scoring in production?
     • Who deploys new models and model updates?
     • Who is responsible for quality control?
  32. People’s Functions [Diagram labels: Product management; Data science; Data engineering; Application engineering (RT server-side applications, client-side applications); UX]
     • Front-end development
     • Data collection (e.g., logging)
     • Code deployment
     • Testing and QA
     • Infrastructure provisioning
     • Support
  33. Data scientists as consumers [Diagram labels: Data scientists; Build, Serialized Model, Apply, Consumer]
  34. Over the wall [Diagram labels: Data scientists, Engineers; Build, Serialized Model, Apply (shown twice); Consumer]
  35. Deliver predictions only (aka “black box”) [Diagram labels: Data scientists, Engineers; Build, Serialized Model, Apply, Predictions, Consumer]
  36. Deliver data (serialized model) [Diagram labels: Data scientists, Engineers; Build, Serialized Model, Scoring Code, Apply, Consumer]
  37. Deliver code and data [Diagram labels: Data scientists, Engineers; Build, Serialized Model, Scoring Code, Apply, Consumer]
  38. Cross-functional team [Diagram labels: Team: Data Scientists + Engineers; Build, Serialized Model, Scoring Code, Apply, Consumer]
  39. ML platform [Diagram labels: Data scientists, Engineers; Build, Serialized Model, Apply, Consumer; Build and Apply shown twice]
  40. ML platform [Diagram labels: Build, Serialized Model, Apply; Visualization, CI/CD, Monitoring & Alerts, Model Evaluation, Change Mgmt, Error Handling, Data Validation, Versioning, Security & Access, Reporting, Testing & QA, Logging]
  41. Conclusion
     • Find the right problem
     • Define constraints
     • Design components and interfaces
     • Take into account organizational constraints
     • Production can’t be an afterthought
     • The process is a lot of work, but it’s not rocket science
  42. Questions? Yes, we are hiring… sergei@beeswax.com
