© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scale Machine Learning from zero to millions of users
Julien Simon
Global Technical Evangelist, AI & Machine Learning, AWS
@julsimon
Rationale
How to train ML models and deploy them in production, from humble beginnings to world domination.
Try to take reasonable and justified steps
Use examples based on AWS services (because that’s what I know best), but this should generalize well
Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
High-level services: call an API, get the job done
Pre-trained AI services that require no ML skills
Easily add intelligence to your existing apps and workflows
Quality and accuracy from continuously-learning APIs
Vision: Rekognition Image, Rekognition Video, Textract
Speech: Polly, Transcribe
Language: Translate, Comprehend & Comprehend Medical
Chatbots: Lex
Forecasting: Forecast
Recommendations: Personalize
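To make "call an API, get the job done" concrete, here is a minimal sketch around a single Amazon Comprehend call. The helper name sentiment_of and the StubComprehend class are hypothetical; the stub only stands in for boto3.client("comprehend") so the example runs without an AWS account or credentials.

```python
def sentiment_of(client, text, language_code="en"):
    """Return the sentiment label that the client's detect_sentiment call reports."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"]


class StubComprehend:
    """Hypothetical offline stand-in for boto3.client('comprehend')."""

    def detect_sentiment(self, Text, LanguageCode):
        # Toy rule, only here so the sketch produces a deterministic answer.
        label = "POSITIVE" if "good" in Text.lower() else "NEUTRAL"
        return {"Sentiment": label}


if __name__ == "__main__":
    # With real credentials (assumption: boto3 is installed):
    #   import boto3
    #   client = boto3.client("comprehend")
    print(sentiment_of(StubComprehend(), "Life is good!"))
```

Passing the client in as a parameter keeps the calling code identical whether it talks to the real service or to a test double.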
High-level != generic
Polly: custom lexicons, SSML
Transcribe: custom vocabulary
Translate: custom terminology
Comprehend: custom text classification & entity extraction
Forecast & Personalize: train on your own data!
And so it begins
• You’ve trained a model on a local machine, using a popular open source library.
• You’ve measured the model’s accuracy, and things look good. Now you’d like to
deploy it to check its actual behaviour, to run A/B tests, etc.
• You’ve embedded the model in your business application.
• You’ve deployed everything to a single Ubuntu virtual machine in the cloud.
• Everything works, you’re serving predictions, life is good!
Score card
                          Single EC2 instance
Infrastructure effort     C’mon, it’s just one instance
ML setup effort           pip install tensorflow
CI/CD integration         Not needed
Build models              DIY
Train models              python train.py
Deploy models (at scale)  python predict.py
Scale/HA inference        Not needed
Optimize costs            Not needed
Security                  Not needed
A few instances and models later…
• Life is not that good
• Too much manual work
• Time-consuming and error-prone
• Dependency hell
• No cost optimization
• Monolithic architecture
• Deployment hell
• Multiple apps can’t share the same model
• Apps and models scale differently
Use AWS-maintained tools
• Deep Learning AMI
• Deep Learning containers
Dockerize
Create a prediction service
• Model servers
• Bespoke API (Flask?)
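A bespoke prediction API can be very small. The sketch below uses only the Python standard library rather than Flask, and the predict function is a placeholder (it averages the inputs) standing in for a real model; the request format with a "features" key is an assumption made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: replace with a call to your trained model."""
    return {"score": sum(features) / len(features)}


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, run the model, send the result back as JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def make_server(port=8080):
    """Create (but do not start) the HTTP server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), PredictionHandler)

# To serve requests: make_server().serve_forever()
```

Because the model sits behind HTTP, several apps can share it and it can scale independently of them, which addresses the last two pain points listed above.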
AWS Deep Learning AMIs
Optimized environments on Amazon Linux or Ubuntu
Conda AMI
For developers who want pre-
installed pip packages of DL
frameworks in separate virtual
environments.
Base AMI
For developers who want a clean
slate to set up private DL engine
repositories or custom builds of DL
engines.
Running a new EC2 instance with the Deep Learning AMI
# Deep Learning AMI in eu-west-1
aws ec2 run-instances \
  --image-id ami-02273e0d16172dbd1 \
  --instance-type p3.2xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dlami-demo}]' \
  --key-name $KEYPAIR \
  --security-group-ids $SECURITY_GROUP \
  --iam-instance-profile Name=$ROLE
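For automation, the same launch can be expressed in Python. This is a sketch, assuming boto3 and AWS credentials for the actual call: the function name dlami_launch_params and the placeholder default values are hypothetical, and only the parameter dictionary is built here so that it runs offline.

```python
import os


def dlami_launch_params(image_id, key_name, security_group_id, role_name):
    """Build keyword arguments for the boto3 EC2 client's run_instances call."""
    return {
        "ImageId": image_id,
        "InstanceType": "p3.2xlarge",
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {"MarketType": "spot"},
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "dlami-demo"}],
        }],
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        "IamInstanceProfile": {"Name": role_name},
    }


if __name__ == "__main__":
    params = dlami_launch_params(
        image_id="ami-02273e0d16172dbd1",  # Deep Learning AMI in eu-west-1 (from the slide)
        key_name=os.environ.get("KEYPAIR", "my-keypair"),            # placeholder default
        security_group_id=os.environ.get("SECURITY_GROUP", "sg-placeholder"),
        role_name=os.environ.get("ROLE", "my-role"),                 # placeholder default
    )
    print(params)
    # To actually launch (assumes boto3 and credentials):
    #   import boto3
    #   boto3.client("ec2", region_name="eu-west-1").run_instances(**params)
```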
Connecting to Jupyter
On your local machine
ssh -L 8000:localhost:8888 ec2-user@INSTANCE_NAME
On the EC2 instance
jupyter notebook --no-browser --port=8888
On your local machine
Open http://localhost:8000
Training with the TensorFlow Deep Learning container
On the training machine
$(aws ecr get-login --no-include-email --region eu-west-1 --registry-ids 763104351884)
docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
nvidia-docker run -it 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
In the container
git clone https://github.com/fchollet/keras.git
python keras/examples/mnist_cnn.py
List of image names: https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
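The image names used above follow a regular pattern: registry ID, region, repository and tag. A small helper, hypothetical and named dlc_image_uri here, can assemble them; pick the repository and tag from the list of image names.

```python
DLC_ACCOUNT_ID = "763104351884"  # registry ID used in the commands above


def dlc_image_uri(region, repository, tag, account_id=DLC_ACCOUNT_ID):
    """Assemble an ECR image URI for a Deep Learning container."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    print(dlc_image_uri("eu-west-1", "tensorflow-training",
                        "1.13-horovod-gpu-py27-cu100-ubuntu16.04"))
```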
Scaling alert!
• More customers, more team members, more models, woohoo!
• Scalability, high availability & security are now a thing
• Scaling up is a losing proposition. You need to scale out
• Only automation can save you: IaC, CI/CD and all that good DevOps stuff
• What are your options?
Option 1: virtual machines
• Definitely possible, but:
• Why? Seriously, I want to know.
• Operational and financial issues await if you don’t automate extensively
• Training
• Build on-demand clusters with CloudFormation, Terraform, etc.
• Distributed training is a pain to set up
• Prediction
• Automate deployment with CI/CD
• Scale with Auto Scaling, Load Balancers, etc.
• Spot, spot, spot
Score card
                          More EC2 instances
Infrastructure effort     Lots
ML setup effort           Some (DL AMI)
CI/CD integration         No change
Build models              DIY
Train models              DIY
Deploy models             DIY (model servers)
Scale/HA inference        DIY (Auto Scaling, LB)
Optimize costs            DIY (Spot, automation)
Security                  DIY (IAM, VPC, KMS)
Option 2: Docker clusters
• This makes a lot of sense if you’re already deploying apps to Docker
• No change to the dev experience: same workflows, same CI/CD, etc.
• Deploy prediction services on the same infrastructure as business apps.
• Amazon ECS and Amazon EKS
• Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc.
• Both come with AWS-maintained AMIs that will save you time
• One cluster or many clusters?
• Build on-demand development and test clusters with CloudFormation, Terraform, etc.
• Many customers find that running a large single production cluster works better
• Still instance-based and not fully-managed
• Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do
• And yes, this matters even if “someone else is taking care of clusters”
Creating an ECS cluster and adding instances
aws ecs create-cluster --cluster-name ecs-demo
# Add 4 p2.xlarge spot instances, ECS-optimized AMI with GPU support, default VPC
aws ec2 run-instances --image-id ami-0638eba79fcfe776e \
  --count 4 \
  --instance-type p2.xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-demo}]' \
  --key-name $KEYPAIR \
  --security-group-ids $SECURITY_GROUP \
  --iam-instance-profile Name=$ROLE \
  --user-data file://user-data.txt
# Add 2 c5.2xlarge, ECS-optimized AMI, default VPC, different subnet
aws ec2 run-instances --image-id ami-09cd8db92c6bf3a84 \
  --count 2 \
  --instance-type c5.2xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --subnet-id $SUBNET_ID \
  . . .
Defining the training task
"containerDefinitions": [{
  "command": [
    "git clone https://github.com/fchollet/keras.git && python keras/examples/mnist_cnn.py"],
  "entryPoint": ["sh", "-c"],
  "name": "TFconsole",
  "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04",
  "memory": 4096,
  "cpu": 256,
  "resourceRequirements": [{"type": "GPU", "value": "1"}],
  . . .
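Hand-written JSON is easy to break (quotes, commas), so one option is to generate the task definition from Python. This is a minimal sketch: it contains only the fields shown above plus a "family" name, assumed to be "training" because the task is later referred to as training:1. Whatever the slide elides is not reproduced.

```python
import json

TRAINING_IMAGE = ("763104351884.dkr.ecr.eu-west-1.amazonaws.com/"
                  "tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04")


def training_task_definition(family="training", gpus=1):
    """Return a task definition dict with one GPU training container."""
    return {
        "family": family,  # assumption: matches 'training:1' used when running the task
        "containerDefinitions": [{
            "name": "TFconsole",
            "image": TRAINING_IMAGE,
            "entryPoint": ["sh", "-c"],
            "command": ["git clone https://github.com/fchollet/keras.git && "
                        "python keras/examples/mnist_cnn.py"],
            "memory": 4096,
            "cpu": 256,
            # ECS expects the GPU count as a string.
            "resourceRequirements": [{"type": "GPU", "value": str(gpus)}],
        }],
    }


if __name__ == "__main__":
    # Redirect this output to training.json to use it with the AWS CLI.
    print(json.dumps(training_task_definition(), indent=2))
```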
Defining the inference task
"containerDefinitions": [{
  "command": [
    "git clone -b r1.13 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=<MODEL_NAME> --model_base_path=<MODEL_PATH>"],
  "entryPoint": ["sh", "-c"],
  "name": "TFinference",
  "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-cpu-py36-ubuntu16.04",
  "memory": 4096,
  "cpu": 256,
  "portMappings": [{"hostPort": 8500, "protocol": "tcp", "containerPort": 8500},
                   {"hostPort": 8501, "protocol": "tcp", "containerPort": 8501},
  . . .
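Once the inference task is running, port 8501 exposes the TensorFlow Serving REST API. The sketch below builds a predict request with the standard library; the helper name build_predict_request, the host and the model name are placeholders, and the request is only sent if you uncomment the last lines.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")


if __name__ == "__main__":
    # Placeholder host and model name; use your instance address and <MODEL_NAME>.
    req = build_predict_request("localhost", "my_model", [[0.0, 1.0]])
    print(req.full_url)
    # To send it to a running server:
    #   with urllib.request.urlopen(req) as resp:
    #       print(json.load(resp))
```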
Running training and inference on the cluster
# Create task definitions for training and inference
aws ecs register-task-definition --cli-input-json file://training.json
aws ecs register-task-definition --cli-input-json file://inference.json
# Run 4 training tasks (the GPU requirement is in the task definition)
aws ecs run-task --cluster ecs-demo --task-definition training:1 --count 4
# Create inference service, starting with 1 initial task
# Run it on c5 instance, and spread tasks evenly
aws ecs create-service --cluster ecs-demo \
  --service-name inference-cpu \
  --task-definition inference:1 \
  --desired-count 1 \
  --placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ c5.*" \
  --placement-strategy field="instanceId",type="spread"
# Scale inference service to 2 tasks
aws ecs update-service --cluster ecs-demo --service inference-cpu --desired-count 2
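In a CI/CD pipeline you may prefer the SDK over the CLI. As a sketch, the inference service above maps to these keyword arguments for the boto3 ECS client's create_service call; the function name is hypothetical, and the call itself (which needs boto3 and credentials) is left as a comment.

```python
def inference_service_params(cluster="ecs-demo", desired_count=1):
    """Keyword arguments mirroring the 'aws ecs create-service' command above."""
    return {
        "cluster": cluster,
        "serviceName": "inference-cpu",
        "taskDefinition": "inference:1",
        "desiredCount": desired_count,
        # Only place tasks on c5 instances...
        "placementConstraints": [{
            "type": "memberOf",
            "expression": "attribute:ecs.instance-type =~ c5.*",
        }],
        # ...and spread them evenly across instances.
        "placementStrategy": [{"type": "spread", "field": "instanceId"}],
    }


if __name__ == "__main__":
    print(inference_service_params())
    # Assumes boto3 and AWS credentials:
    #   import boto3
    #   ecs = boto3.client("ecs")
    #   ecs.create_service(**inference_service_params())
    #   ecs.update_service(cluster="ecs-demo", service="inference-cpu", desiredCount=2)
```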
Score card
                          EC2                          ECS / EKS
Infrastructure effort     Lots                         Some (Docker tools)
ML setup effort           Some (DL AMI)                Some (DL containers)
CI/CD integration         No change                    No change
Build models              DIY                          DIY
Train models (at scale)   DIY                          DIY (Docker tools)
Deploy models (at scale)  DIY (model servers)          DIY (Docker tools)
Scale/HA inference        DIY (Auto Scaling, LB)       DIY (Services, pods, etc.)
Optimize costs            DIY (Spot, RIs, automation)  DIY (Spot, RIs, automation)
Security                  DIY (IAM, VPC, KMS)          DIY (IAM, VPC, KMS)
Option 3: go fully managed with Amazon SageMaker
Model options on Amazon SageMaker
Built-in Algorithms (17)
• Training code: Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, XGBoost, and more
• No ML coding required
• No infrastructure work required
• Distributed training
• Pipe mode
Built-in Frameworks
• Bring your own code: script mode
• Open source containers
• No infrastructure work required
• Distributed training
• Pipe mode
Bring Your Own Container
• Full control, run anything!
• R, C++, etc.
• No infrastructure work required
The Amazon SageMaker API
• Python SDK orchestrating all Amazon SageMaker activity
• High-level objects for algorithm selection, training, deploying,
automatic model tuning, etc.
• Spark SDK (Python & Scala)
• AWS SDK
• For scripting and automation
• CLI : ‘aws sagemaker’
• Language SDKs: boto3, etc.
Training and deploying
from sagemaker.tensorflow import TensorFlow

# role (IAM role) and data (training data location) are defined elsewhere
tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          train_instance_count=1,
                          train_instance_type='ml.c5.2xlarge',
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 10,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by a single instance
tf_endpoint = tf_estimator.deploy(initial_instance_count=1, instance_type='ml.t3.xlarge')
tf_endpoint.predict(…)
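On the other side of this call, the entry point script receives the hyperparameters as command-line arguments in script mode. The sketch below shows how a script such as mnist_keras_tf.py might read them; it is not the author's actual script. The channel name "training" and the /tmp fallbacks for running outside SageMaker are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided locations."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as --name value pairs.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # SageMaker sets these environment variables inside the training container;
    # the /tmp defaults are only for local runs (assumption).
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/tmp/model"))
    parser.add_argument("--training",
                        default=os.environ.get("SM_CHANNEL_TRAINING", "/tmp/data"))
    # Ignore any extra arguments the platform may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(args.epochs, args.learning_rate, args.model_dir, args.training)
```

Since the script only depends on arguments and environment variables, the same file can be run locally with python mnist_keras_tf.py --epochs 1 before sending it to a managed instance.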
Training and deploying, at any scale
tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                          role=role,
                          train_instance_count=8,
                          train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 200,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by 16 multi-AZ load-balanced instances
tf_endpoint = tf_estimator.deploy(initial_instance_count=16, instance_type='ml.p3.2xlarge')
tf_endpoint.predict(…)
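The "64 GPUs" figure is simple arithmetic: instance count times GPUs per instance. A tiny sketch, with the per-instance GPU counts for the P3 family hard-coded in a lookup table:

```python
# GPUs per instance for the P3 family (NVIDIA V100).
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
}


def total_gpus(instance_type, instance_count):
    """Total number of GPUs in a fleet of identical instances."""
    return GPUS_PER_INSTANCE[instance_type] * instance_count


if __name__ == "__main__":
    print(total_gpus("ml.p3.16xlarge", 8))   # training cluster
    print(total_gpus("ml.p3.2xlarge", 16))   # inference endpoint
```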
https://gitlab.com/juliensimon/dlnotebooks/blob/master/sagemaker/08-Image-classification-advanced.ipynb
Score card
                          EC2                          ECS / EKS                    SageMaker
Infrastructure effort     Maximal                      Some (Docker tools)          None
ML setup effort           Some (DL AMI)                Some (DL containers)         Minimal
CI/CD integration         No change                    No change                    Some (SDK, Step Functions)
Build models              DIY                          DIY                          17 built-in algorithms
Train models (at scale)   DIY                          DIY (Docker tools)           2 LOCs
Deploy models (at scale)  DIY (model servers)          DIY (Docker tools)           1 LOC
Scale/HA inference        DIY (Auto Scaling, LB)       DIY (Services, pods, etc.)   Built-in
Optimize costs            DIY (Spot, RIs, automation)  DIY (Spot, RIs, automation)  On-demand training, Auto Scaling for inference
Security                  DIY (IAM, VPC, KMS)          DIY (IAM, VPC, KMS)          API parameters
Score card
Flame war in 3, 2, 1…
Personal opinion
• EC2: small scale only, unless you have strong DevOps skills and enjoy exercising them.
• ECS / EKS: reasonable choice if you’re a Docker shop and know how to use the rich Docker ecosystem. If not, I’d think twice: Docker isn’t an ML platform.
• SageMaker: learn it in a few hours, forget about servers, focus 100% on ML, enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
Conclusion
• Whatever works for you at this time is fine
• Don’t over-engineer, and don’t “plan for the future”
• Fight “we’ve always done it like this”, NIH, and Hype Driven Development
• Optimize for current business conditions, pay attention to TCO
• Models and data matter, not infrastructure
• When conditions change, move fast: smash and rebuild
• ... which is what cloud is all about!
• “100% of our time spent on ML” shall be the whole of the Law
• Mix and match
• Train on SageMaker, deploy on ECS/EKS… or vice versa
• Write your own story!
Getting started
http://aws.amazon.com/free
https://aws.ai
https://aws.amazon.com/machine-learning/amis/
https://aws.amazon.com/machine-learning/containers/
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/awslabs/amazon-sagemaker-examples
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlcontainers DL AMI / container demos
https://gitlab.com/juliensimon/dlnotebooks SageMaker notebooks
Thank you!
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019

Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Amazon Web Services
 
AWS re:Invent 2017 Recap - Solutions Updates
AWS re:Invent 2017 Recap - Solutions UpdatesAWS re:Invent 2017 Recap - Solutions Updates
AWS re:Invent 2017 Recap - Solutions UpdatesAmazon Web Services
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSAmazon Web Services
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Machine Learning inference at the Edge
Machine Learning inference at the EdgeMachine Learning inference at the Edge
Machine Learning inference at the EdgeJulien SIMON
 
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...Amazon Web Services
 
Migrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMigrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMicrosoft Tech Community
 
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...Provectus
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
2018 11 14 Artificial Intelligence and Machine Learning in Azure
2018 11 14 Artificial Intelligence and Machine Learning in Azure2018 11 14 Artificial Intelligence and Machine Learning in Azure
2018 11 14 Artificial Intelligence and Machine Learning in AzureBruno Capuano
 
Time series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 LausanneTime series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 LausanneSunil Mallya
 
EclipseCon 2016 - OCCIware : one Cloud API to rule them all
EclipseCon 2016 - OCCIware : one Cloud API to rule them allEclipseCon 2016 - OCCIware : one Cloud API to rule them all
EclipseCon 2016 - OCCIware : one Cloud API to rule them allMarc Dutoo
 
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open Wide
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open WideOCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open Wide
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open WideOCCIware
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
Cloud Computing in PHP With the Amazon Web Services
Cloud Computing in PHP With the Amazon Web ServicesCloud Computing in PHP With the Amazon Web Services
Cloud Computing in PHP With the Amazon Web ServicesAmazon Web Services
 
Self Service Agile Infrastructure for Product Teams - Pop-up Loft Tel Aviv
Self Service Agile Infrastructure for Product Teams - Pop-up Loft Tel AvivSelf Service Agile Infrastructure for Product Teams - Pop-up Loft Tel Aviv
Self Service Agile Infrastructure for Product Teams - Pop-up Loft Tel AvivAmazon Web Services
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesAmazon Web Services
 

Similar a "Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019 (20)

Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
 
AWS re:Invent 2017 Recap - Solutions Updates
AWS re:Invent 2017 Recap - Solutions UpdatesAWS re:Invent 2017 Recap - Solutions Updates
AWS re:Invent 2017 Recap - Solutions Updates
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECS
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
Machine Learning inference at the Edge
Machine Learning inference at the EdgeMachine Learning inference at the Edge
Machine Learning inference at the Edge
 
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...
AWS DeepLens Workshop: Building Computer Vision Applications - BDA201 - Atlan...
 
Migrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMigrating existing open source machine learning to azure
Migrating existing open source machine learning to azure
 
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019

  • 1. Scale Machine Learning from zero to millions of users. Julien Simon, Global Technical Evangelist, AI & Machine Learning, AWS (@julsimon). © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. Rationale: how to train ML models and deploy them in production, from humble beginnings to world domination. • Try to take reasonable and justified steps • Use examples based on AWS services (because that’s what I know best), but this should generalize well • Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
  • 3.
  • 4. High-level services: call an API, get the job done • Pre-trained AI services that require no ML skills • Easily add intelligence to your existing apps and workflows • Quality and accuracy from continuously-learning APIs [Diagram of services: Rekognition Image, Rekognition Video, Polly, Transcribe, Translate, Comprehend & Comprehend Medical, Lex, Textract, Forecast, Personalize; grouped under Vision, Speech, Language, Chatbots, Forecasting, Recommendations]
  • 5. High-level != generic [Same diagram of services and categories as on slide 4] • Polly: custom lexicons, SSML • Transcribe: custom vocabulary • Translate: custom terminology • Comprehend: custom text classification & entity extraction • Forecast & Personalize: train on your own data!
  • 6.
  • 7. And so it begins • You’ve trained a model on a local machine, using a popular open source library. • You’ve measured the model’s accuracy, and things look good. Now you’d like to deploy it to check its actual behaviour, to run A/B tests, etc. • You’ve embedded the model in your business application. • You’ve deployed everything to a single Ubuntu virtual machine in the cloud. • Everything works, you’re serving predictions, life is good!
  • 8. Score card
      Criterion | Single EC2 instance
      Infrastructure effort | C’mon, it’s just one instance
      ML setup effort | pip install tensorflow
      CI/CD integration | Not needed
      Build models | DIY
      Train models | python train.py
      Deploy models (at scale) | python predict.py
      Scale/HA inference | Not needed
      Optimize costs | Not needed
      Security | Not needed
  • 9.
  • 10. A few instances and models later… • Life is not that good • Too much manual work • Time-consuming and error-prone • Dependency hell • No cost optimization • Monolithic architecture • Deployment hell • Multiple apps can’t share the same model • Apps and models scale differently Use AWS-maintained tools • Deep Learning AMI • Deep Learning containers Dockerize Create a prediction service • Model servers • Bespoke API (Flask?)
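To make the "bespoke API" option concrete, here is a minimal sketch of a prediction service. The slide suggests Flask; to stay dependency-free this sketch uses the standard library's WSGI support instead, which is a swap on my part. The `predict` function is a hypothetical stand-in for a real model, and the `/predict` route, the JSON shape and the port are all assumptions.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def _respond(start_response, status, payload):
    """Serialize a JSON payload and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body {"features": [...]}."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        return _respond(start_response, "404 Not Found", {"error": "use POST /predict"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError):
        return _respond(start_response, "400 Bad Request", {"error": "invalid input"})
    return _respond(start_response, "200 OK", {"prediction": predict(features)})


def serve(port=8080):
    """Run the service in the foreground (development use only)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the service locally. Because the model sits behind an HTTP boundary, several apps can share it and it can be scaled separately from them, which is the point the slide makes.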
  • 11. AWS Deep Learning AMIs: optimized environments on Amazon Linux or Ubuntu • Conda AMI: for developers who want pre-installed pip packages of DL frameworks in separate virtual environments. • Base AMI: for developers who want a clean slate to set up private DL engine repositories or custom builds of DL engines.
  • 12.
  • 13. Running a new EC2 instance with the Deep Learning AMI
      # ami-02273e0d16172dbd1 is a Deep Learning AMI in eu-west-1
      aws ec2 run-instances \
        --image-id ami-02273e0d16172dbd1 \
        --instance-type p3.2xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dlami-demo}]' \
        --key-name $KEYPAIR \
        --security-group-ids $SECURITY_GROUP \
        --iam-instance-profile Name=$ROLE
  • 14. Connecting to Jupyter
      On your local machine:
        ssh -L 8000:localhost:8888 ec2-user@INSTANCE_NAME
      On the EC2 instance:
        jupyter notebook --no-browser --port=8888
      On your local machine:
        Open http://localhost:8000
  • 15. Training with the TensorFlow Deep Learning container
      On the training machine:
        $(aws ecr get-login --no-include-email --region eu-west-1 --registry-ids 763104351884)
        docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
        nvidia-docker run -it 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
      In the container:
        git clone https://github.com/fchollet/keras.git
        python keras/examples/mnist_cnn.py
      List of image names: https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
  • 16.
  • 17. Scaling alert! • More customers, more team members, more models, woohoo! • Scalability, high availability & security are now a thing • Scaling up is a losing proposition. You need to scale out • Only automation can save you: IaC, CI/CD and all that good DevOps stuff • What are your options?
  • 18. Option 1: virtual machines • Definitely possible, but: • Why? Seriously, I want to know. • Operational and financial issues await if you don’t automate extensively • Training • Build on-demand clusters with CloudFormation, Terraform, etc. • Distributed training is a pain to set up • Prediction • Automate deployment with CI/CD • Scale with Auto Scaling, Load Balancers, etc. • Spot, spot, spot
  • 19. Score card
      Criterion | More EC2 instances
      Infrastructure effort | Lots
      ML setup effort | Some (DL AMI)
      CI/CD integration | No change
      Build models | DIY
      Train models | DIY
      Deploy models | DIY (model servers)
      Scale/HA inference | DIY (Auto Scaling, LB)
      Optimize costs | DIY (Spot, automation)
      Security | DIY (IAM, VPC, KMS)
  • 20. Option 2: Docker clusters • This makes a lot of sense if you’re already deploying apps to Docker • No change to the dev experience: same workflows, same CI/CD, etc. • Deploy prediction services on the same infrastructure as business apps. • Amazon ECS and Amazon EKS • Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc. • Both come with AWS-maintained AMIs that will save you time • One cluster or many clusters? • Build on-demand development and test clusters with CloudFormation, Terraform, etc. • Many customers find that running a large single production cluster works better • Still instance-based and not fully managed • Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do • And yes, this matters even if “someone else is taking care of clusters”
  • 21.
  • 22. Creating an ECS cluster and adding instances
      aws ecs create-cluster --cluster-name ecs-demo
      # Add 4 p2.xlarge spot instances, ECS-optimized AMI with GPU support, default VPC
      aws ec2 run-instances --image-id ami-0638eba79fcfe776e --count 4 \
        --instance-type p2.xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-demo}]' \
        --key-name $KEYPAIR \
        --security-group-ids $SECURITY_GROUP \
        --iam-instance-profile Name=$ROLE \
        --user-data file://user-data.txt
      # Add 2 c5.2xlarge, ECS-optimized AMI, default VPC, different subnet
      aws ec2 run-instances --image-id ami-09cd8db92c6bf3a84 --count 2 \
        --instance-type c5.2xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --subnet-id $SUBNET_ID . . .
  • 23. Defining the training task
      "containerDefinitions": [{
        "command": ["git clone https://github.com/fchollet/keras.git && python keras/examples/mnist_cnn.py"],
        "entryPoint": ["sh", "-c"],
        "name": "TFconsole",
        "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04",
        "memory": 4096,
        "cpu": 256,
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
        . . .
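Hand-written task definitions are easy to break with a stray or typographic quote. As a hedged sketch, the same fragment can be generated from Python so that the file passed to `aws ecs register-task-definition` is always valid JSON. The slide elides the rest of the definition, so the `family` value ("training", chosen to match the `training:1` reference used later) and `requiresCompatibilities` are my assumptions, and the helper names are hypothetical.

```python
import json

# Image name taken from the slide.
TRAINING_IMAGE = ("763104351884.dkr.ecr.eu-west-1.amazonaws.com/"
                  "tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04")


def training_task_definition(family="training", gpus=1):
    """Build a task definition dict mirroring the fields shown on the slide."""
    return {
        "family": family,                      # assumption: not shown on the slide
        "requiresCompatibilities": ["EC2"],    # assumption: GPU tasks run on EC2 instances
        "containerDefinitions": [{
            "name": "TFconsole",
            "image": TRAINING_IMAGE,
            "entryPoint": ["sh", "-c"],
            "command": ["git clone https://github.com/fchollet/keras.git"
                        " && python keras/examples/mnist_cnn.py"],
            "memory": 4096,
            "cpu": 256,
            # ECS expects the GPU count as a string.
            "resourceRequirements": [{"type": "GPU", "value": str(gpus)}],
        }],
    }


def write_task_definition(path, definition):
    """Write the definition as JSON, ready for --cli-input-json file://<path>."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(definition, handle, indent=2)
```

For example, `write_task_definition("training.json", training_task_definition())` produces a file that can be registered with the CLI.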
  • 24. Defining the inference task
      "containerDefinitions": [{
        "command": ["git clone -b r1.13 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=<MODEL_NAME> --model_base_path=<MODEL_PATH>"],
        "entryPoint": ["sh", "-c"],
        "name": "TFinference",
        "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-cpu-py36-ubuntu16.04",
        "memory": 4096,
        "cpu": 256,
        "portMappings": [
          {"hostPort": 8500, "protocol": "tcp", "containerPort": 8500},
          {"hostPort": 8501, "protocol": "tcp", "containerPort": 8501}],
        . . .
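Once the inference task is running, port 8501 exposes TensorFlow Serving's REST API. The sketch below shows how a client might call it, assuming the standard `/v1/models/<name>:predict` route and the row-oriented `{"instances": [...]}` request format. The host name, model name and function names are placeholders, not values from the talk.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (without sending) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")


def predict(host, model_name, instances, timeout=5.0):
    """Send the request and return the 'predictions' field of the response."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predictions"]
```

Splitting request construction from the network call keeps the payload format easy to check without a running model server.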
  • 25. Running training and inference on the cluster
      # Create task definitions for training and inference
      aws ecs register-task-definition --cli-input-json file://training.json
      aws ecs register-task-definition --cli-input-json file://inference.json
      # Run 4 training tasks (the GPU requirement is in the task definition)
      aws ecs run-task --cluster ecs-demo --task-definition training:1 --count 4
      # Create inference service, starting with 1 initial task
      # Run it on c5 instances, and spread tasks evenly
      aws ecs create-service --cluster ecs-demo \
        --service-name inference-cpu \
        --task-definition inference:1 \
        --desired-count 1 \
        --placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ c5.*" \
        --placement-strategy field="instanceId",type="spread"
      # Scale inference service to 2 tasks
      aws ecs update-service --cluster ecs-demo --service inference-cpu --desired-count 2
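For teams that prefer to script this from Python rather than the CLI, the same calls can be sketched with boto3's ECS client. This is an illustration under assumptions, not part of the talk: boto3 must be installed with credentials configured, the region is assumed to be eu-west-1, and the helper names are hypothetical. The request parameters are built as plain dicts so they can be inspected before anything is sent.

```python
def run_training_kwargs(cluster="ecs-demo", task_definition="training:1", count=4):
    """Parameters for ecs.run_task: 4 training tasks, GPU need set in the task definition."""
    return {"cluster": cluster, "taskDefinition": task_definition, "count": count}


def inference_service_kwargs(cluster="ecs-demo", desired_count=1):
    """Parameters for ecs.create_service: c5 instances only, tasks spread across instances."""
    return {
        "cluster": cluster,
        "serviceName": "inference-cpu",
        "taskDefinition": "inference:1",
        "desiredCount": desired_count,
        "placementConstraints": [{
            "type": "memberOf",
            "expression": "attribute:ecs.instance-type =~ c5.*",
        }],
        "placementStrategy": [{"type": "spread", "field": "instanceId"}],
    }


def launch(region="eu-west-1"):
    """Run training, create the inference service, then scale it to 2 tasks."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    ecs = boto3.client("ecs", region_name=region)
    ecs.run_task(**run_training_kwargs())
    ecs.create_service(**inference_service_kwargs())
    ecs.update_service(cluster="ecs-demo", service="inference-cpu", desiredCount=2)
```

Calling `launch()` mirrors the three CLI steps above; the task definitions still have to be registered first.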
  • 26. Score card
       | EC2 | ECS / EKS
      Infrastructure effort | Lots | Some (Docker tools)
      ML setup effort | Some (DL AMI) | Some (DL containers)
      CI/CD integration | No change | No change
      Build models | DIY | DIY
      Train models (at scale) | DIY | DIY (Docker tools)
      Deploy models (at scale) | DIY (model servers) | DIY (Docker tools)
      Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.)
      Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation)
      Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS)
  • 27. Option 3: go fully managed with Amazon SageMaker
  • 28. Model options on Amazon SageMaker
      Training code:
      Built-in Algorithms (17): Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, XGBoost, and more
        • No ML coding required
        • No infrastructure work required
        • Distributed training
        • Pipe mode
      Built-in Frameworks: bring your own code (script mode)
        • Open source containers
        • No infrastructure work required
        • Distributed training
        • Pipe mode
      Bring Your Own Container: full control, run anything! R, C++, etc.
        • No infrastructure work required
  • 29. The Amazon SageMaker API
      • Python SDK orchestrating all Amazon SageMaker activity
        • High-level objects for algorithm selection, training, deploying, automatic model tuning, etc.
        • Spark SDK (Python & Scala)
      • AWS SDK
        • For scripting and automation
        • CLI: 'aws sagemaker'
        • Language SDKs: boto3, etc.
  • 30. Training and deploying
      from sagemaker.tensorflow import TensorFlow
      # role (IAM role) and data (training data location) are assumed to be defined earlier
      tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                                role=role,
                                train_instance_count=1,
                                train_instance_type='ml.c5.2xlarge',
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 10, 'learning-rate': 0.01})
      tf_estimator.fit(data)
      # HTTPS endpoint backed by a single instance
      tf_endpoint = tf_estimator.deploy(initial_instance_count=1, instance_type='ml.t3.xlarge')
      tf_endpoint.predict(…)
  • 31. Training and deploying, at any scale
      tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                                role=role,
                                train_instance_count=8,
                                train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 200, 'learning-rate': 0.01})
      tf_estimator.fit(data)
      # HTTPS endpoint backed by 16 multi-AZ load-balanced instances
      tf_endpoint = tf_estimator.deploy(initial_instance_count=16, instance_type='ml.p3.2xlarge')
      tf_endpoint.predict(…)
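The "Total of 64 GPUs" comment follows from simple arithmetic: each ml.p3.16xlarge instance carries 8 GPUs and each ml.p3.2xlarge carries 1. The sketch below spells that out; the `GPUS_PER_INSTANCE` table and the `total_gpus` helper are illustrative names, not part of the SageMaker SDK.

```python
# GPUs per instance for the P3 family (lookup table maintained by hand)
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
}

def total_gpus(instance_type: str, instance_count: int) -> int:
    """Return the number of GPUs in a fleet of identical instances."""
    return GPUS_PER_INSTANCE[instance_type] * instance_count

training_gpus = total_gpus("ml.p3.16xlarge", 8)   # training fleet on the slide
inference_gpus = total_gpus("ml.p3.2xlarge", 16)  # endpoint fleet on the slide
print(training_gpus, inference_gpus)  # 64 16
```

Going from the previous slide to this one only changes the instance type and count arguments: the training script and the deployment call keep the same shape.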
  • 33. Score card
       | EC2 | ECS / EKS | SageMaker
      Infrastructure effort | Maximal | Some (Docker tools) | None
      ML setup effort | Some (DL AMI) | Some (DL containers) | Minimal
      CI/CD integration | No change | No change | Some (SDK, Step Functions)
      Build models | DIY | DIY | 17 built-in algorithms
      Train models (at scale) | DIY | DIY (Docker tools) | 2 LOCs
      Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | 1 LOC
      Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
      Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | On-demand training, Auto Scaling for inference
      Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
  • 34. Score card (flame war in 3, 2, 1…)
      Same rows as the score card on slide 33, plus a personal opinion on each option:
      • EC2: small scale only, unless you have strong DevOps skills and enjoy exercising them.
      • ECS / EKS: reasonable choice if you're a Docker shop and know how to use the rich Docker ecosystem. If not, I'd think twice: Docker isn't an ML platform.
      • SageMaker: learn it in a few hours, forget about servers, focus 100% on ML, enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
  • 35. Conclusion
      • Whatever works for you at this time is fine
      • Don't over-engineer, and don't "plan for the future"
      • Fight "we've always done it like this", NIH, and Hype Driven Development
      • Optimize for current business conditions, pay attention to TCO
      • Models and data matter, not infrastructure
      • When conditions change, move fast: smash and rebuild
      • ... which is what cloud is all about!
      • "100% of our time spent on ML" shall be the whole of the Law
      • Mix and match
      • Train on SageMaker, deploy on ECS/EKS… or vice versa
      • Write your own story!
  • 37. Thank you! Julien Simon, Global Technical Evangelist, AI & Machine Learning, AWS, @julsimon