The Three Body Problem of Data Science

  1. The Three Body Problem of Data Science. Dima Karamshuk @Skyscanner @SkyscannerEng
  2. Three Body Problem
  3. Three Body Problem • One is unable to predict the subsequent motions of three bodies acting on each other from their initial velocities and positions
  4. Three Body Problem: Science, Engineering, Product. Buddies
  5. Science, Engineering, Product. Actions and Reactions: • What we invest in
  6. Science, Engineering, Product. Actions and Reactions: • What we invest in • How we structure and integrate the team
  7. Science, Engineering, Product. Actions and Reactions: • What we invest in • How we structure and integrate the team • What deliverables we expect
  8. Types of Problems (Science, Engineering, Product): • Existing product, existing solution • Existing product, new solution • New product, existing solution • New product, new solution
  9. Optimizing Existing Product. Example: Case N1. Replacing heuristics in an existing product with ML
  10. Optimizing Existing Product. Case N1. Replacing heuristics in an existing product with ML. Good thing: • well-defined and controlled environment. Bad thing: • integration with existing infrastructure
  11. Product Lifecycle: Formulating Machine Learning Problem; Offline Evaluation
  12. Product Lifecycle: Data Insights; Offline Evaluation; Production Prototype; Online Evaluation; Production; Formulating Machine Learning Problem; Model Versioning; Data Acquisition. Only 20% of work
  13. Product Lifecycle: Data Insights; Offline Evaluation; Production Prototype; Online Evaluation; Production; Formulating Machine Learning Problem; Model Versioning; Data Acquisition
  14. Logging (first page results): • what users click on
  15. Logging (first page results): • what users click on • what users see but do not click on • what users do not see
  16. Product Lifecycle: Data Insights; Offline Evaluation; Production Prototype; Online Evaluation; Production; Formulating Machine Learning Problem; Model Versioning; Data Acquisition
  17. Prototype: a model trained in a Jupyter Notebook is a good starting point, but…
  18. Production Pipeline: Skyscanner Traffic; Apache Kafka; Data Collection; Data Archive (AWS S3); Data Querying (AWS Athena); Pre-processing; Training Data (7 recent days); Validation Data (5% of the last day); Model Training (scikit-learn); Model Validation (Passed? Update Model / Report Failure). Training Component (AWS CF + AWS Data Pipeline). Serving Component: Current Model; Experiments with Challenger Model (5%, 5%, 90%). ECML PKDD 2018, https://arxiv.org/pdf/1812.01735.pdf
  19. Optimizing Existing Product (Science, Engineering, Product). Case N1. Replacing heuristics in an existing product with ML. Good thing: • well-defined and controlled environment. Bad thing: • integration with existing infrastructure
  20. Role != Person
  21. Iterating over Existing Algorithm. Case N2. Iterating over an existing ML algorithm in an existing product
  22. Iterating over Existing Algorithm (Science, Engineering, Product). Case N2. Iterating over an existing ML algorithm in an existing product. Good thing: • return on infrastructure investments. Bad thing: • possibly limited impact
  23. Iterating over Existing Algorithm (Science, Engineering, Product). Case N2. Iterating over an existing ML algorithm in an existing product • How to justify further investment? • Should it become BAU (business as usual)? • Who should own it?
  24. Building a New Product. Case N3. Building a first version of a new data product
  25. Building a New Product (Science, Engineering, Product). Case N3. Building a first version of a new data product. Good thing: • fewer dependencies. Bad thing: • high level of uncertainty
  26. Managing Uncertainty. Levels of uncertainty: • Is the position right? • Is the user flow right? • Is the message right? • Is the design right? • Is the algorithm right? • What is the baseline? • etc.
  27. Managing Uncertainty. Levels of uncertainty: • Is the position right? • Is the user flow right? • Is the message right? • Is the design right? • Is the algorithm right? • What is the baseline? • etc. +30% engagement
  28. Managing Uncertainty. Levels of uncertainty: • Is the position right? • Is the user flow right? • Is the message right? • Is the design right? • Is the algorithm right? • What is the baseline? • etc. Skyscanner Backpack
  29. Managing Uncertainty. Levels of uncertainty: • Is the position right? • Is the user flow right? • Is the message right? • Is the design right? • Is the algorithm right? • What is the baseline? • etc. Start really simple
  30. Managing Uncertainty. Levels of uncertainty: • Is the position right? • Is the user flow right? • Is the message right? • Is the design right? • Is the algorithm right? • What is the baseline? • etc. Study previous experience
  31. Case N4: New Science in New Products • Existing product, existing solution • Existing product, new solution • New product, existing solution • New product, new solution
  32. Case N4: New Science in New Products • Existing product, existing solution • Existing product, new solution • New product, existing solution • New product, new solution
  33. Join our team! @karamshuk @SkyscannerEng
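
The serving split and validation gate shown on slide 18 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not Skyscanner's implementation: only the 5% / 5% / 90% shares and the "Passed? Update Model / Report Failure" decision come from the slide. The bucket names, the hash-based assignment, the helper names `assign_bucket` and `should_update_model`, and the score comparison rule are all hypothetical.

```python
import hashlib

# Traffic shares (in percent) as listed on slide 18.
# The bucket names are assumptions: the slide gives only the numbers.
BUCKETS = [("challenger_a", 5), ("challenger_b", 5), ("current_model", 90)]


def assign_bucket(user_id: str) -> str:
    """Deterministically map a user id to a traffic bucket.

    Hashing the id (an assumed strategy) keeps a user in the same
    bucket across requests without storing any assignment state.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100  # stable slot in the range [0, 100)
    upper = 0
    for name, share in BUCKETS:
        upper += share
        if slot < upper:
            return name
    raise AssertionError("bucket shares must sum to 100")


def should_update_model(candidate_score: float,
                        current_score: float,
                        tolerance: float = 0.0) -> bool:
    """Validation gate: True means 'update model', False means 'report failure'.

    The rule here (candidate must not score worse than the current model
    by more than `tolerance` on the validation data) is an assumption;
    the slide does not state the actual criterion.
    """
    return candidate_score >= current_score - tolerance


if __name__ == "__main__":
    print(assign_bucket("user-123"))
    print(should_update_model(candidate_score=0.81, current_score=0.80))
```

Hashing the user id rather than drawing a random number per request is one common way to keep an experiment consistent for each user; whether the original pipeline does this is not stated in the slides.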