
Role of ML engineer


Talk by Borys Biletskyy at Data Science Amsterdam and Data Science Utrecht. The talk is dedicated to the role of Machine Learning Engineer and how it can improve the success rate of Data Science projects.


  1. Role of Machine Learning Engineer. Borys Biletskyy, Data Science Amsterdam, 28-05-2019
  2. Agenda
     1. About Myself
     2. Motivation
     3. Data Science Process
     4. Roles in Data Analytics
     5. 3 Challenges for ML Engineer
  3. About Myself
     ● Software Engineer since 2004
       ○ Low level, C++ -> Enterprise, Java -> Data Driven, Scala
       ○ Dev, Tech Lead, Architect, Consultant
     ● Researcher since 2004
       ○ PhD in Theoretical Computer Science
       ○ Complexity and Scalability of ML Methods
     ● Machine Learning Engineer since 2017
       ○ Python, Scala
       ○ LeasePlan, Randstad, VodafoneZiggo
  4. Motivation
     ● Low success rate of Data Analytics projects
       ○ Gartner: 60% of Data Analytics projects fail*
     ● General C-level recommendations
       ○ The Data Economy: Why do so many analytics projects fail?**
       ○ 8 Reasons why Data Analytics projects fail***
       ○ ...
     ● Often the problem is in the team structure
       ○ How the Machine Learning Engineer role can help
     * https://www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/
     ** https://www.dataversity.net/many-data-analytics-projects-fail-save/
     *** https://www.eastbanctech.com/technology-insights/what-the-tech/why-so-many-analytics-projects-fail.html
  5. Data Science Process*
     Diagram stages: Define Goal, Data Collection, Data Pre-Processing, Exploratory Data Analysis, Feature Engineering, Modeling, Validation, Deploy Model, Serve Model (Request|Batch|Stream), Monitor
     * https://www.youtube.com/watch?v=XoBJwxuPynk&feature=youtu.be
  6. Data Science Process*, with stages tagged by role (DS, DE, or both)
     Diagram stages: Define Goal, Data Collection, Feature Engineering, Exploratory Data Analysis, Data Pre-Processing, Deploy Model, Serve Model (Request|Batch|Stream), Modeling, Validation, Monitor
     Callouts:
     ● Poor data quality
     ● Can't scale this method horizontally
     ● Model is too slow for streaming
     ● DE-DS handover is slow
     * https://www.youtube.com/watch?v=XoBJwxuPynk&feature=youtu.be
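The "Serve Model (Request|Batch|Stream)" stage names three serving modes without showing how they differ. A minimal sketch, assuming a hypothetical `score` function with made-up weights standing in for a trained model (none of these names come from the talk), could wrap the same scoring logic three ways:

```python
from typing import Iterable, Iterator, List

Features = List[float]


def score(features: Features) -> float:
    """Hypothetical model: a fixed linear scorer standing in for a trained model."""
    weights = [0.5, -0.25, 1.0]  # illustrative values, not taken from the talk
    return sum(w * x for w, x in zip(weights, features))


def serve_request(features: Features) -> float:
    """Request mode: one record in, one score out, latency matters most."""
    return score(features)


def serve_batch(rows: List[Features]) -> List[float]:
    """Batch mode: score a complete, finite dataset in one job."""
    return [score(row) for row in rows]


def serve_stream(rows: Iterable[Features]) -> Iterator[float]:
    """Stream mode: score records lazily, one at a time, as they arrive."""
    for row in rows:
        yield score(row)
```

All three wrappers call the same `score`, so a model that is too slow per record becomes a problem in request and stream mode long before it shows up in a nightly batch job.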
  7. Data Scientist & Data Engineer
     Skill spectrum: Adv. Analytics, Math/Stats, ML/AI, Scripting, Programming, Distributed Sys., Data Pipelines
     Data Scientist:
     ● Fast insights driven
     ● Small applications
     ● Highly dynamic development
     ● Interactive notebook scripts
     ● Running on laptop
     ● Academic background
     ● Interacts with business/domain experts
     Data Engineer:
     ● Agile
     ● Production systems
     ● QA and processes
     ● Modular, reusable, maintainable, scalable
     ● Running on cluster
     ● Engineering background
     ● Interacts with platform engineers
  8. Data Analytics Skills*
     ● Data Science: Adv. Analytics, Math/Stats, ML/AI, Scripting
     ● Data Engineering: Programming, Distributed Sys., Data Pipelines
     ● Ratio: 1 DS ~ 5 DE
     * https://www.oreilly.com/ideas/data-engineers-vs-data-scientists
  9. DataOps Teams*
     ● Data Science: Adv. Analytics, Math/Stats, ML/AI, Scripting
     ● Data Engineering: Programming, Distributed Sys., Data Pipelines
     ● Ratio: 1 DS ~ 3 DE
  10. DataOps Team
     ● DataOps Team
       ○ cross-functional
       ○ owns whole feature life cycle
       ○ dynamic
       ○ T-shaped
     ● Guilds & Feature Teams
     ● Data Platform AAS
       ○ Platform Engineers
     Skill spectrum: Data Science (Adv. Analytics, Math/Stats, ML/AI, Scripting), Data Engineering (Programming, Distributed Sys., Data Pipelines)
  11. Machine Learning Engineer Role (Fill The Gap)
     Skill spectrum: Data Science (Adv. Analytics, Math/Stats, ML/AI, Scripting), Data Engineering (Programming, Distributed Sys., Data Pipelines)
  12. Machine Learning Engineer Role (Coordinating)
     Skill spectrum: Data Science (Adv. Analytics, Math/Stats, ML/AI, Scripting), Data Engineering (Programming, Distributed Sys., Data Pipelines)
  13. ML Engineer
     ● Coordination
     ● Improves communication
     ● Guards pragmatic development standards
     ● Sets (Agile) processes
     ● Makes the DE<->DS handover smooth
     ● Balances the number of DEs and DSs
     ● Can work in both disciplines
     ● ML Engineer specific skills:
       ○ Custom ML algorithms
       ○ Custom ML solutions
       ○ ML model logistics
       ○ ML pipelines
     Skill spectrum: Data Science (Adv. Analytics, Math/Stats, ML/AI, Scripting), ML Engineering, Data Engineering (Programming, Distributed Sys., Data Pipelines); team diagrams combine DE, ML, and DS members
  14. 3 Challenges for ML Engineer
  15. Challenge 1: Data Platform
     Data Science Process diagram (stages tagged DS/DE) with the callout: Poor data quality
  16. Challenge 1: Data Platform
     ● Before:
       ○ Insights from data samples only
       ○ Different teams: DS, DE, PE
       ○ Unsynchronized sprints
       ○ Loss of focus
       ○ Long time to market
       ○ Different levels of problem solving
         ■ Connectivity (PE)
         ■ Data Ingestion (DE)
         ■ EDA & Feature Engineering (DS)
     ● After:
       ○ Feature teams: DE, DS, ME (PE)
       ○ Continuous Data Platform improvements
       ○ Unified:
         ■ Data storage
         ■ Data Ingestion
         ■ Data Pre-processing
       ○ Early data ingestion from new sources
       ○ All data is available for experimenting
       ○ Less rework and fewer handover iterations
       ○ Faster time to market
  17. Challenge 2: Scalability of ML Methods (Tools)
     Data Science Process diagram (stages tagged DS/DE) with the callout: This method is not scalable
  18. Challenge 2: Scalability of ML Methods (Tools)
     ● Before:
       ○ Horizontally scalable Data Platform AAS
       ○ Different teams
         ■ Different tools and standards
         ■ Unsynchronized sprints
       ○ No DE-DS coordination before deployment
         ■ Rework iterations
       ○ Lack of understanding of scalability
         ■ horizontal / vertical
       ○ Lack of understanding of ML stages
         ■ training / scoring
       ○ Unscalable tools: scikit-learn, R
       ○ Unscalable methods: Neural Nets
     ● After:
       ○ Feature teams: DE, DS, ME (PE)
       ○ Shared codebase
       ○ Standardised tooling
       ○ Reusable building blocks for ML Pipelines:
         ■ Notebooks (easy to use)
         ■ Cluster (production ready)
       ○ Testing strategy
       ○ Automated deployment
       ○ DS can modify and deploy ML pipelines
       ○ Faster time to market
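The slide lists "horizontal / vertical" scalability and "training / scoring" as distinctions that are often poorly understood. A minimal sketch of why they matter together, assuming a toy standardization "model" and threads standing in for cluster workers (both are illustrative choices, not the talk's setup): training needs the full training set, while scoring is independent per record and can therefore be split across partitions.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev
from typing import List, Tuple

Params = Tuple[float, float]


def train(values: List[float]) -> Params:
    """Training stage: needs the whole training set to compute the parameters."""
    return mean(values), pstdev(values)


def score_partition(params: Params, partition: List[float]) -> List[float]:
    """Scoring stage: each record is independent once the parameters are fixed."""
    mu, sigma = params
    return [(x - mu) / sigma for x in partition]


def score_partitioned(
    params: Params, partitions: List[List[float]], workers: int = 2
) -> List[float]:
    """Score partitions in parallel; threads are a stand-in for cluster workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves partition order, so results line up with the input.
        parts = list(pool.map(lambda p: score_partition(params, p), partitions))
    return [z for part in parts for z in part]
```

Scoring the data in one piece and scoring it partition by partition give the same result, which is the property that lets the scoring stage scale horizontally even when the training stage does not.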
  19. Challenge 3: Model Serving
     Data Science Process diagram (stages tagged DS/DE) with the callout: This model is too slow for real-time scoring
  20. Challenge 3: Model Serving
     ● Before:
       ○ Single team: DS, DE
       ○ Lack of DS-DE coordination
       ○ Poorly scalable design
         ■ In-memory (big) data processing
       ○ Poorly scalable methods
         ■ Cosine nearest-neighbors search
         ■ O(n) per query instead of constant time
       ○ Rework
       ○ Problems with real-time scoring
     ● After:
       ○ Single team: DE, DS, ME
       ○ Model serving is planned early
       ○ Efficient refinements
       ○ Serving strategy drives solution design
       ○ Less rework
       ○ Faster time to market
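The slide contrasts an O(n) cosine nearest-neighbors search with constant-time serving but does not say how the gap was closed. One possible approach, shown here purely as an assumption, is to precompute each known item's nearest neighbor in an offline batch job so that the online request is a dictionary lookup. The names `nearest_scan` and `build_lookup` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_scan(query_id: str, items: Dict[str, Vector]) -> str:
    """O(n) per request: compare the query item against every other item."""
    query = items[query_id]
    candidates = (key for key in items if key != query_id)
    return max(candidates, key=lambda key: cosine(query, items[key]))


def build_lookup(items: Dict[str, Vector]) -> Dict[str, str]:
    """Offline batch job: O(n^2) once, so that serving is a single dict lookup."""
    return {key: nearest_scan(key, items) for key in items}
```

At serving time, `lookup[item_id]` replaces the scan. The trade-off is that this only works for items known when the table was built, and the table has to be rebuilt as the catalogue changes, which is one reason the serving strategy needs to be decided early.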
  21. Q&A
