Training and deploying machine learning
models with GCP
Maciej Pieńkosz
Data Science Summit 2020
What we do at Sotrender
Our models
1. Sentiment
2. Hatespeech
3. Topic modelling
4. Keyphrase extractor
5. NER (brands and products)
6. Image Tagger
7. Text Extractor
8. Logo Detector
9. Post Classifier
10. ….
ML models lifecycle
1. Planning and project setup
2. Data collection and labeling
3. Modeling and exploration
4. Model training and refinement
5. Testing and evaluation
6. Model deployment
7. Ongoing model maintenance and monitoring
https://www.jeremyjordan.me/ml-projects-guide/
Modeling with AI Notebooks
1. We use Google Cloud Platform as our cloud provider
2. AI Platform Notebooks is used for initial data exploration and modeling
3. To start, we favor faster, simpler model architectures that can be easily built,
validated, iterated on and eventually deployed (usually on CPU)
4. Experiment tracking: MLflow
https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html
https://cloud.google.com/ai-platform-notebooks?hl=id
Structuring training code
• Disadvantages of notebooks:
– You pay for the whole time the notebook is running
– Code quality is usually lower
– Hard to parametrize, unit test, and review
• After the initial experimentation phase, we try to give more structure to
the model training code:
– Refactor the codebase into Python packages and modules and move
it to a git repository (GitLab)
– Add tests (more on this later)
– Wrap the code in a Docker container
– Use the dedicated AI Platform Training service to train in the cloud
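One way to make training code parametrizable and testable after leaving the notebook is a small command-line entry point. The sketch below is illustrative only: the flag names (`--data-path`, `--model-dir`, `--learning-rate`, `--epochs`) and the placeholder `train` function are assumptions, not the code used in the talk.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for a hypothetical training package."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative).")
    # Hypothetical flags; the paths could point at local files or a storage bucket.
    parser.add_argument("--data-path", required=True, help="Training data location")
    parser.add_argument("--model-dir", required=True, help="Where to write model files")
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=5)
    return parser


def train(data_path: str, model_dir: str, learning_rate: float, epochs: int) -> dict:
    """Placeholder training step that returns the configuration it received."""
    # Real training code would load data, fit a model and save it to model_dir.
    return {
        "data_path": data_path,
        "model_dir": model_dir,
        "learning_rate": learning_rate,
        "epochs": epochs,
    }


def main(argv=None) -> dict:
    """Parse arguments (sys.argv by default) and run training."""
    args = build_parser().parse_args(argv)
    return train(args.data_path, args.model_dir, args.learning_rate, args.epochs)
```

Because `main` accepts an explicit argument list, the same function can be called from a unit test, from a local shell, or as the entry point of a container.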
https://www.jeremyjordan.me/ml-projects-guide/
AI Platform Training with custom containers
• Advantages:
– Develop locally, train in the cloud
– Pay only for the time of training
– Broad configuration options
– Job statuses and logs for historical runs
are available in the dashboard
– Easy integration with hyperparameter
tuning
[Code listings shown on the slide: training job Dockerfile, cloud training script]
Google Cloud Storage for Models and Datasets
• We use Google Cloud Storage as the primary store for models
and datasets
• One bucket per model
• We follow a unified bucket and directory structure,
the same for every model
– Raw data
– Combined datasets, with predefined splits
– Model files
• Documentation in the knowledge base (Confluence)
• Dedicated systems such as DVC or Quilt are an alternative
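A unified layout is easier to follow when paths are built by a single helper rather than typed by hand. The sketch below assumes a layout with `raw/`, `datasets/` and `models/` prefixes and CSV split files; these names, and the `ModelBucket` class itself, are hypothetical and only illustrate the idea of one bucket per model with the same structure everywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBucket:
    """Builds object paths for one model's bucket (all names are hypothetical)."""

    model_name: str

    @property
    def root(self) -> str:
        # One bucket per model, named after the model.
        return f"gs://{self.model_name}"

    def raw_data(self, filename: str) -> str:
        return f"{self.root}/raw/{filename}"

    def dataset(self, version: str, split: str) -> str:
        # Combined datasets are stored with predefined splits.
        if split not in ("train", "dev", "test"):
            raise ValueError(f"unknown split: {split}")
        return f"{self.root}/datasets/{version}/{split}.csv"

    def model_file(self, version: str, filename: str) -> str:
        return f"{self.root}/models/{version}/{filename}"
```

For example, `ModelBucket("sentiment-model").dataset("v1", "train")` yields `gs://sentiment-model/datasets/v1/train.csv`, and the same call shape works for every other model.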
Additional training tips
• Consider having two validation sets, training-dev and
test-dev, to distinguish overfitting errors from
distribution shift
• Establish human performance for your task
• Evaluate your model's performance on important data
slices
• Do hyperparameter tuning; use open-source packages,
e.g. hyperopt
• Develop a systematic way of analyzing model errors
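The two-validation-set idea can be sketched as a comparison of error gaps. This assumes training-dev is drawn from the same distribution as the training data and test-dev from the target distribution; the `diagnose` function and the 0.02 tolerance are illustrative assumptions, not values from the talk.

```python
def diagnose(train_error: float, training_dev_error: float, test_dev_error: float,
             tolerance: float = 0.02) -> list:
    """Return likely error sources based on the gaps between three error rates.

    A large gap between training and training-dev error suggests overfitting,
    since both sets share a distribution. A large gap between training-dev and
    test-dev error suggests distribution shift, since the model generalizes to
    unseen data from the training distribution but not to the target data.
    """
    findings = []
    if training_dev_error - train_error > tolerance:
        findings.append("overfitting")
    if test_dev_error - training_dev_error > tolerance:
        findings.append("distribution shift")
    return findings
```

With errors of 1%, 9% and 10% the function reports overfitting; with 1%, 2% and 10% it reports distribution shift.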
Recommended resources:
• https://www.coursera.org/learn/machine-learning-projects
• https://www.deeplearning.ai/machine-learning-yearning/
https://towardsdatascience.com/some-strategies-for-machine-learning-projects-5f2f32c34635
Model deployment
• Your options:
– Online
– Batch (offline)
• Our approach is to deploy models as services
– Easy to integrate
– Easy to use by other teams
• We serve them as REST services with Flask (or, more
recently, FastAPI)
• We wrap them in Docker containers so they can be
easily deployed to the cloud and served with Cloud Run
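The shape of such a model service can be sketched as follows. To stay dependency-free, this example uses Python's standard `http.server` instead of Flask or FastAPI, so it is not the stack described above; the `/predict` route, the `text` field and the toy keyword rule standing in for a model are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    """Stand-in for a real model: a toy keyword rule, not an actual classifier."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"label": label}


def handle_predict(body: bytes):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        return 400, {"error": "expected a JSON object with a string field 'text'"}
    return 200, predict(payload["text"])


class PredictHandler(BaseHTTPRequestHandler):
    """Routes POST /predict to handle_predict and writes a JSON response."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Start the server; call explicitly, e.g. from a container entry point."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping validation and prediction in `handle_predict`, separate from the HTTP handler, lets the request logic be unit tested without starting a server.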
https://mlinproduction.com/batch-inference-vs-online-inference/
[Diagrams shown on the slide: online inference, batch inference]
Cloud Deployment: Cloud Run
• We use Cloud Run to deploy our model services
• Cloud Build delegates the build process to GCP
• GCP has a dedicated service for serving models, AI
Platform Prediction, but we use Cloud Run
– It is more flexible for us: we can set up any
environment and add any dependencies
– AI Platform Prediction has limits on model
size
– We can add extra endpoints to the services
(e.g. /explain)
[Code listings shown on the slide: service Dockerfile, cloud deployment script]
Cloud Run (continued)
• Useful features out of the box
– Autoscaling
– Multiple Revisions (versions), easy Rollback
– Traffic management
– Multiple Namespaces (dev, prod)
– Resource Monitoring
Delivery pipeline automation (CI/CD)
• Implemented in GitLab CI/CD
• Pipeline steps shown in the diagram:
– Push
– Download files
– Build image
– Run tests
– Run static analysis
– Push image to registry
– Code review
– Canary rollout
– Deploy
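The canary rollout step can be sketched as a loop that shifts traffic to the new revision in stages and rolls back when the observed error rate is too high. The traffic steps, the 2% error threshold and the `error_rate_at` hook are assumptions for illustration; a real pipeline would read the error rate from monitoring and change traffic through the platform's own tooling.

```python
def canary_rollout(error_rate_at, steps=(5, 25, 50, 100), max_error_rate=0.02):
    """Shift traffic to a new revision step by step; roll back on high error rate.

    error_rate_at is a hypothetical hook: a callable that takes the traffic
    percentage sent to the new revision and returns its observed error rate.
    Returns (final_traffic_percent, history), where history lists the
    (percent, error_rate) pairs that were evaluated.
    """
    history = []
    for percent in steps:
        rate = error_rate_at(percent)
        history.append((percent, rate))
        if rate > max_error_rate:
            # Roll back: all traffic returns to the previous revision.
            return 0, history
    return steps[-1], history
```

A healthy revision ends at 100% traffic, while one that starts failing at 50% is rolled back to 0% without ever reaching full traffic.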
Testing and evaluation
• Unit and integration tests for:
– Input pipelines
– Preprocessing functions
• “Regression” tests for:
– Performance on validation data
– Predictions on some important, hand-picked examples
– Performance on data slices
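The three kinds of regression checks listed above can be sketched as one release gate. The function names, the thresholds (0.8 overall, 0.7 per slice) and the representation of a model as a plain callable from text to label are assumptions made for this example.

```python
def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def check_release(model, validation, pinned_examples, slices,
                  min_accuracy=0.8, min_slice_accuracy=0.7):
    """Return a list of failed checks; an empty list means the model passes."""
    failures = []
    # 1. Overall performance on validation data.
    if accuracy(model, validation) < min_accuracy:
        failures.append("validation accuracy below threshold")
    # 2. Predictions on important, hand-picked examples must not change.
    for text, label in pinned_examples:
        if model(text) != label:
            failures.append(f"pinned example changed: {text!r}")
    # 3. Performance on each named data slice.
    for name, examples in slices.items():
        if accuracy(model, examples) < min_slice_accuracy:
            failures.append(f"slice below threshold: {name}")
    return failures
```

Run in CI, a non-empty result can fail the pipeline before the image is pushed, which catches a model that looks fine on average but regresses on a specific slice.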
Monitoring
• System-level metrics:
– Resource consumption (RAM, CPU), health checks, status codes, latency, etc.
• Data-level metrics:
– Prediction distributions, input data distributions
– System performance against real-time labels (collected automatically or manually)
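Monitoring a prediction distribution can be as simple as comparing the label frequencies of live predictions with a reference window. The sketch below uses total variation distance as the comparison; that metric, the 0.1 alert threshold and the function names are choices made for this example, not ones stated in the talk.

```python
from collections import Counter


def label_distribution(labels):
    """Relative frequency of each predicted label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions (0 to 1)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def drift_alert(reference_labels, live_labels, threshold=0.1):
    """Return (alert, distance) for live predictions against a reference window."""
    distance = total_variation_distance(
        label_distribution(reference_labels),
        label_distribution(live_labels),
    )
    return distance > threshold, distance
```

The same comparison applies to categorical input features; continuous inputs would need binning or a different statistic.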
https://mlinproduction.com/
Streamlit
• https://www.streamlit.io/
• An easy tool for creating simple web data products directly in Python
• Use it to create demos, share your work, showcase your model's behaviour, and debug
• Very intuitive, no web skills required
https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace
Demos with Streamlit
Thanks for attending!