© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scale Machine Learning from zero to
millions of users
Julien Simon
Global Technical Evangelist, AI & Machine Learning, AWS
@julsimon
Rationale
How to train ML models and deploy them in production, from
humble beginnings to world domination.
Try to take reasonable and justified steps
Use examples based on AWS services (because that’s what I
know best), but this should generalize well
Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
High-level services: call an API, get the job done
Pre-trained AI services that require no ML skills
Easily add intelligence to your existing apps and workflows
Quality and accuracy from continuously-learning APIs
Vision: Rekognition Image, Rekognition Video, Textract
Speech: Polly, Transcribe
Language: Translate, Comprehend & Comprehend Medical
Chatbots: Lex
Forecasting: Forecast
Recommendations: Personalize
High-level != generic
Polly: custom lexicons, SSML
Transcribe: custom vocabulary
Translate: custom terminology
Comprehend: custom text classification & entity extraction
Forecast & Personalize: train on your own data!
If your business problem could be solved by a high-level service…
• Why should you go through all the trouble of building a custom solution?
• Are you really “missing features”? What’s the real business impact?
• Do you really need “more accuracy”? How do you know you could reach it?
• Not sure? Run a quick PoC: it won’t take long, and you’ll decide based on data, not opinions
• Can’t do 100% of the job? Break it down into smaller pieces
• Using Rekognition to detect faces before feeding them to your CV model
• Using Textract to extract text before feeding it to your NLP model
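As a rough illustration of the "break it down" idea, here is a minimal sketch of the glue code between Rekognition and a custom CV model. It assumes the documented shape of a `DetectFaces` response, where each `BoundingBox` is expressed as ratios of the image size; the function name `face_crop_boxes` is hypothetical and not part of the original deck.

```python
from typing import Any, Dict, List, Tuple


def face_crop_boxes(response: Dict[str, Any],
                    image_width: int,
                    image_height: int) -> List[Tuple[int, int, int, int]]:
    """Convert DetectFaces bounding boxes (ratios) to pixel boxes.

    Returns (left, top, right, bottom) tuples clamped to the image, ready
    to be used for cropping before calling your own CV model.
    """
    boxes = []
    for face in response.get("FaceDetails", []):
        box = face["BoundingBox"]
        left = max(0, int(box["Left"] * image_width))
        top = max(0, int(box["Top"] * image_height))
        right = min(image_width, int((box["Left"] + box["Width"]) * image_width))
        bottom = min(image_height, int((box["Top"] + box["Height"]) * image_height))
        # Skip boxes that end up empty after clamping
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom))
    return boxes


# Usage (requires boto3 and AWS credentials, not executed here):
#   import boto3
#   rekognition = boto3.client("rekognition")
#   response = rekognition.detect_faces(Image={"Bytes": image_bytes})
#   for crop_box in face_crop_boxes(response, width, height):
#       ...  # crop the image and feed it to your own model
```

Keeping the conversion separate from the API call makes it easy to test without touching AWS.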
And so it begins
• You’ve trained a model on a local machine, using a popular open source library.
• You’ve measured the model’s accuracy, and things look good. Now you’d like to
deploy it to check its actual behaviour, to run A/B tests, etc.
• You’ve embedded the model in your business application.
• You’ve deployed everything to a single Ubuntu virtual machine in the cloud.
• Everything works, you’re serving predictions, life is good!
Score card
Criterion | Single EC2 instance
Infrastructure effort | C’mon, it’s just one instance
ML setup effort | pip install tensorflow
CI/CD integration | Not needed
Build models | DIY
Train models | python train.py
Deploy models (at scale) | python predict.py
Scale/HA inference | Not needed
Optimize costs | Not needed
Security | Not needed
A few instances and models later…
• Life is not that good
• Too much manual work
• Time-consuming and error-prone
• Dependency hell
• No cost optimization
• Monolithic architecture
• Deployment hell
• Multiple apps can’t share the same model
• Apps and models scale differently
Use AWS-maintained tools
• Deep Learning AMI
• Deep Learning containers
Dockerize
Create a prediction service
• Model servers
• Bespoke API (Flask?)
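To make the "bespoke API" option concrete, here is a minimal sketch of a prediction service. It deliberately uses only the Python standard library rather than Flask, so it is a stand-in for the idea and not the deck's own code; `predict`, `handle_predict`, `serve` and the port number are hypothetical names and values.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def predict(instances: List[List[float]]) -> List[float]:
    # Hypothetical stand-in for a real model: returns the sum of each row.
    return [float(sum(row)) for row in instances]


def handle_predict(body: bytes) -> Tuple[int, bytes]:
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        instances = json.loads(body)["instances"]
        predictions = predict(instances)
    except (ValueError, KeyError, TypeError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    return 200, json.dumps({"predictions": predictions}).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    # Blocks forever; call this from your own entry point.
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Once the model sits behind an HTTP endpoint like this, several apps can share it and it can scale independently of them, which addresses the monolith issues listed above.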
AWS Deep Learning AMIs
Optimized environments on Amazon Linux or Ubuntu
Conda AMI: for developers who want pre-installed pip packages of DL frameworks in separate virtual environments.
Base AMI: for developers who want a clean slate to set up private DL engine repositories or custom builds of DL engines.
AMI with source code: for developers who want preinstalled DL frameworks and their source code in a shared Python environment.
Running a new EC2 instance with the Deep Learning AMI
# Deep Learning AMI in eu-west-1
aws ec2 run-instances \
  --image-id ami-02273e0d16172dbd1 \
  --instance-type p3.2xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dlami-demo}]' \
  --key-name $KEYPAIR \
  --security-group-ids $SECURITY_GROUP \
  --iam-instance-profile Name=$ROLE
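The same launch can also be scripted from Python, which helps once you start automating. The sketch below only builds the keyword arguments for boto3's `run_instances` call, mirroring the CLI flags above; the helper name `dlami_spot_launch_args` is hypothetical, and launching a single instance (`MinCount`/`MaxCount` of 1) is an assumption.

```python
from typing import Any, Dict


def dlami_spot_launch_args(image_id: str,
                           key_name: str,
                           security_group_id: str,
                           instance_profile_name: str,
                           instance_type: str = "p3.2xlarge",
                           name_tag: str = "dlami-demo") -> Dict[str, Any]:
    """Build keyword arguments for ec2.run_instances (one Spot instance)."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,  # assumption: launch exactly one instance
        "MaxCount": 1,
        "InstanceMarketOptions": {"MarketType": "spot"},
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name_tag}],
        }],
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        "IamInstanceProfile": {"Name": instance_profile_name},
    }


# Usage (requires boto3 and AWS credentials, not executed here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   ec2.run_instances(**dlami_spot_launch_args(
#       "ami-02273e0d16172dbd1", keypair, security_group, role))
```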
Connecting to Jupyter
On your local machine
ssh -L 8000:localhost:8888 ec2-user@INSTANCE_NAME
On the EC2 instance
jupyter notebook --no-browser --port=8888
On your local machine
Open http://localhost:8000
Training with the TensorFlow Deep Learning container
On the training machine
$(aws ecr get-login --no-include-email --region eu-west-1 --registry-ids 763104351884)
docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
nvidia-docker run -it 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
In the container
git clone https://github.com/fchollet/keras.git
python keras/examples/mnist_cnn.py
List of image names: https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
Scaling alert!
• More customers, more team members, more models, woohoo!
• Scalability, high availability & security are now a thing
• Scaling up is a losing proposition. You need to scale out
• Only automation can save you: IaC, CI/CD and all that good DevOps stuff
• What are your options?
Option 1: virtual machines
• Definitely possible, but:
• Why? Seriously, I want to know.
• Operational and financial issues await if you don’t automate extensively
• Training
• Build on-demand clusters with CloudFormation, Terraform, etc.
• Distributed training is a pain to set up
• Prediction
• Automate deployment with CI/CD
• Scale with Auto Scaling, Load Balancers, etc.
• Spot, spot, spot
Score card
Criterion | More EC2 instances
Infrastructure effort | Lots
ML setup effort | Some (DL AMI)
CI/CD integration | No change
Build models | DIY
Train models | DIY
Deploy models | DIY (model servers)
Scale/HA inference | DIY (Auto Scaling, LB)
Optimize costs | DIY (Spot, automation)
Security | DIY (IAM, VPC, KMS)
Option 2: Docker clusters
• This makes a lot of sense if you’re already deploying apps to Docker
• No change to the dev experience: same workflows, same CI/CD, etc.
• Deploy prediction services on the same infrastructure as business apps.
• Amazon ECS and Amazon EKS
• Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc.
• Both come with AWS-maintained AMIs that will save you time
• One cluster or many clusters?
• Build on-demand development and test clusters with CloudFormation, Terraform, etc.
• Many customers find that running a single large production cluster works better
• Still instance-based and not fully-managed
• Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do
• And yes, this matters even if “someone else is taking care of clusters”
Creating an ECS cluster and adding instances
aws ecs create-cluster --cluster-name ecs-demo
# Add 4 p2.xlarge spot instances, ECS-optimized AMI with GPU support, default VPC
aws ec2 run-instances --image-id ami-0638eba79fcfe776e \
  --count 4 \
  --instance-type p2.xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-demo}]' \
  --key-name $KEYPAIR \
  --security-group-ids $SECURITY_GROUP \
  --iam-instance-profile Name=$ROLE \
  --user-data file://user-data.txt
# Add 2 c5.2xlarge, ECS-optimized AMI, default VPC, different subnet
aws ec2 run-instances --image-id ami-09cd8db92c6bf3a84 \
  --count 2 \
  --instance-type c5.2xlarge \
  --instance-market-options '{"MarketType":"spot"}' \
  --subnet $SUBNET_ID \
  . . .
Defining the training task
"containerDefinitions": [{
  "command": [
    "git clone https://github.com/fchollet/keras.git && python keras/examples/mnist_cnn.py"],
  "entryPoint": ["sh", "-c"],
  "name": "TFconsole",
  "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04",
  "memory": 4096,
  "cpu": 256,
  "resourceRequirements": [{"type": "GPU", "value": "1"}],
  . . .
Defining the inference task
"containerDefinitions": [{
  "command": [
    "git clone -b r1.13 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=<MODEL_NAME> --model_base_path=<MODEL_PATH>"],
  "entryPoint": ["sh", "-c"],
  "name": "TFinference",
  "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-cpu-py36-ubuntu16.04",
  "memory": 4096,
  "cpu": 256,
  "portMappings": [{"hostPort": 8500, "protocol": "tcp", "containerPort": 8500},
                   {"hostPort": 8501, "protocol": "tcp", "containerPort": 8501},
  . . .
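Once the inference task is running, a client can reach TensorFlow Serving on the REST port (8501) mapped above. The sketch below uses the standard TensorFlow Serving REST predict route with the row-oriented `instances` payload; the helper names, host and model name are hypothetical, and only the standard library is used.

```python
import json
import urllib.request
from typing import Any, List


def build_predict_request(host: str,
                          model_name: str,
                          instances: List[Any],
                          port: int = 8501) -> urllib.request.Request:
    """Build a POST request for the TensorFlow Serving REST predict API."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host: str, model_name: str, instances: List[Any],
            timeout: float = 10.0) -> Any:
    """Send the request and return the 'predictions' field of the reply."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predictions"]
```

In practice the host would be a load balancer or service discovery name in front of the ECS service, not an individual instance.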
Running training and inference on the cluster
# Create task definitions for training and inference
aws ecs register-task-definition --cli-input-json file://training.json
aws ecs register-task-definition --cli-input-json file://inference.json
# Run 4 training tasks (the GPU requirement is in the task definition)
aws ecs run-task --cluster ecs-demo --task-definition training:1 --count 4
# Create inference service, starting with 1 initial task
# Run it on c5 instance, and spread tasks evenly
aws ecs create-service --cluster ecs-demo \
  --service-name inference-cpu \
  --task-definition inference:1 \
  --desired-count 1 \
  --placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ c5.*" \
  --placement-strategy field="instanceId",type="spread"
# Scale inference service to 2 tasks
aws ecs update-service --cluster ecs-demo --service inference-cpu --desired-count 2
Score card
Criterion | EC2 | ECS / EKS
Infrastructure effort | Lots | Some (Docker tools)
ML setup effort | Some (DL AMI) | Some (DL containers)
CI/CD integration | No change | No change
Build models | DIY | DIY
Train models (at scale) | DIY | DIY (Docker tools)
Deploy models (at scale) | DIY (model servers) | DIY (Docker tools)
Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.)
Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation)
Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS)
Option 3: go fully managed with Amazon SageMaker
Model options on Amazon SageMaker
Training code
Built-in Algorithms (17)
• Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, XGBoost, and more
• No ML coding required
• No infrastructure work required
• Distributed training
• Pipe mode
Built-in Frameworks
• Bring your own code: script mode
• Open source containers
• No infrastructure work required
• Distributed training
• Pipe mode
Bring Your Own Container
• Full control, run anything!
• R, C++, etc.
• No infrastructure work required
The Amazon SageMaker API
• Python SDK orchestrating all Amazon SageMaker activity
• High-level objects for algorithm selection, training, deploying,
automatic model tuning, etc.
• Spark SDK (Python & Scala)
• AWS SDK
• For scripting and automation
• CLI: 'aws sagemaker'
• Language SDKs: boto3, etc.
Training and deploying
from sagemaker.tensorflow import TensorFlow

# 'role' (IAM role) and 'data' (training data location) are assumed to be defined beforehand
tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          train_instance_count=1,
                          train_instance_type='ml.c5.2xlarge',
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 10,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by a single instance
tf_endpoint = tf_estimator.deploy(initial_instance_count=1, instance_type='ml.t3.xlarge')
tf_endpoint.predict(…)
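Business applications usually do not carry the SageMaker Python SDK; they can call the HTTPS endpoint through a language SDK such as boto3. The sketch below wraps the `invoke_endpoint` call of the `sagemaker-runtime` client; the helper name is hypothetical, and the JSON `instances` payload is an assumption about what the deployed TensorFlow model expects.

```python
import json
from typing import Any, List


def invoke_tf_endpoint(runtime_client: Any,
                       endpoint_name: str,
                       instances: List[Any]) -> Any:
    """Send a JSON prediction request to a SageMaker endpoint.

    runtime_client is expected to behave like
    boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumption: the model accepts a TensorFlow Serving style payload
        Body=json.dumps({"instances": instances}).encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode the JSON
    return json.loads(response["Body"].read())


# Usage (requires boto3 and AWS credentials, not executed here):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = invoke_tf_endpoint(runtime, "my-endpoint-name", [[0.0] * 784])
```

Passing the client in as a parameter keeps the function easy to test with a stub.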
Training and deploying, at any scale
from sagemaker.tensorflow import TensorFlow

# 'role' (IAM role) and 'data' (training data location) are assumed to be defined beforehand
tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                          role=role,
                          train_instance_count=8,
                          train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 200,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by 16 multi-AZ load-balanced instances
tf_endpoint = tf_estimator.deploy(initial_instance_count=16, instance_type='ml.p3.2xlarge')
tf_endpoint.predict(…)
https://gitlab.com/juliensimon/dlnotebooks/blob/master/sagemaker/08-Image-classification-advanced.ipynb
Score card
Criterion | EC2 | ECS / EKS | SageMaker
Infrastructure effort | Maximal | Some (Docker tools) | None
ML setup effort | Some (DL AMI) | Some (DL containers) | Minimal
CI/CD integration | No change | No change | Some (SDK, Step Functions)
Build models | DIY | DIY | 17 built-in algorithms
Train models (at scale) | DIY | DIY (Docker tools) | 2 LOCs
Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | 1 LOC
Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | On-demand training, Auto Scaling for inference
Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
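The "Auto Scaling for inference" entry for SageMaker can be sketched with the Application Auto Scaling API. The helper below only builds the arguments for `register_scalable_target` and `put_scaling_policy`; its name, the policy name, the capacity bounds and the target of 100 invocations per instance are illustrative assumptions, and `AllTraffic` is assumed to be the endpoint's variant name.

```python
from typing import Any, Dict, Tuple


def endpoint_autoscaling_args(endpoint_name: str,
                              variant_name: str = "AllTraffic",
                              min_instances: int = 1,
                              max_instances: int = 4,
                              invocations_per_instance: float = 100.0
                              ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build arguments for target tracking scaling of a SageMaker endpoint."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations-target",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return target, policy


# Usage (requires boto3 and AWS credentials, not executed here):
#   import boto3
#   autoscaling = boto3.client("application-autoscaling")
#   target, policy = endpoint_autoscaling_args("my-endpoint-name")
#   autoscaling.register_scalable_target(**target)
#   autoscaling.put_scaling_policy(**policy)
```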
Score card
Flame war in 3, 2, 1…
EC2 ECS / EKS SageMaker
Personal opinion
• EC2: Small scale only, unless you have strong DevOps skills and enjoy exercising them.
• ECS / EKS: Reasonable choice if you’re a Docker shop and know how to use the rich Docker ecosystem. If not, I’d think twice: Docker isn’t an ML platform.
• SageMaker: Learn it in a few hours, forget about servers, focus 100% on ML, enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
Conclusion
• Whatever works for you at this time is fine
• Don’t over-engineer, and don’t “plan for the future”
• Fight “we’ve always done it like this”, NIH, and Hype Driven Development
• Optimize for current business conditions, pay attention to TCO
• Models and data matter, not infrastructure
• When conditions change, move fast: smash and rebuild
• ... which is what cloud is all about!
• “Spending 100% of our time on ML” should drive infrastructure design
• Mix and match
• Train on SageMaker, deploy on ECS/EKS… or vice versa
• Write your own story!
Getting started
http://aws.amazon.com/free
https://aws.ai
https://aws.amazon.com/machine-learning/amis/
https://aws.amazon.com/machine-learning/containers/
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/awslabs/amazon-sagemaker-examples
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlcontainers DL AMI / container demos
https://gitlab.com/juliensimon/dlnotebooks SageMaker notebooks
Thank you!
Julien Simon
Global Technical Evangelist, AI & Machine Learning, AWS
@julsimon

Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Julien SIMON
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Julien SIMON
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Julien SIMON
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Julien SIMON
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Julien SIMON
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Julien SIMON
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Julien SIMON
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Julien SIMON
 

Más de Julien SIMON (18)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Scaling Machine Learning from zero to millions of users (May 2019)

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Scale Machine Learning from zero to millions of users Julien Simon Global Technical Evangelist, AI & Machine Learning, AWS @julsimon
  • 2. Rationale: How to train ML models and deploy them in production, from humble beginnings to world domination. Try to take reasonable and justified steps. Use examples based on AWS services (because that’s what I know best), but this should generalize well. Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
  • 3.
  • 4. High-level services: call an API, get the job done. Pre-trained AI services that require no ML skills. Easily add intelligence to your existing apps and workflows. Quality and accuracy from continuously-learning APIs. Services shown: Rekognition Image, Rekognition Video, Polly, Transcribe, Translate, Comprehend & Comprehend Medical, Lex, Forecast, Textract, Personalize. Categories: Vision, Speech, Chatbots, Language, Forecasting, Recommendations.
  • 5. High-level != generic. Services shown: Rekognition Image, Rekognition Video, Polly, Transcribe, Translate, Comprehend & Comprehend Medical, Lex, Forecast, Textract, Personalize. Categories: Vision, Speech, Chatbots, Language, Forecasting, Recommendations. Customization options: • Polly: custom lexicons, SSML • Transcribe: custom vocabulary • Translate: custom terminology • Comprehend: custom text classification & entity extraction • Forecast & Personalize: train on your own data!
  • 6. If your business problem could be solved by a high-level service… • Why should you go through all the trouble of building a custom solution? • Are you really “missing features”? What’s the real business impact? • Do you really need “more accuracy”? How do you know you could reach it? • Not sure? Run a quick PoC: it won’t take long and you’ll decide on data, not on opinions. • Can’t do 100% of the job? Break it down into smaller pieces: • Using Rekognition to detect faces before feeding them to your CV model • Using Textract to extract text before feeding it to your NLP model
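The "break it down into smaller pieces" idea can be sketched as a two-stage pipeline: a face detector produces bounding boxes, and each crop goes to your own model. This is a minimal sketch, not the speaker's code. The names `to_pixel_box`, `classify_faces`, `detect_faces` and `classify_crop` are hypothetical, and the detector is injected as a plain callable so the sketch runs without AWS credentials. It assumes bounding boxes expressed as ratios of the image size with `Left`, `Top`, `Width` and `Height` keys, which is the shape Rekognition reports.

```python
from typing import Callable, Dict, List, Tuple

# Assumed box shape: Left/Top/Width/Height as ratios of the overall image size.
BoundingBox = Dict[str, float]
PixelBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_pixel_box(box: BoundingBox, image_width: int, image_height: int) -> PixelBox:
    """Convert a ratio-based box to pixel coordinates, clamped to the image."""
    left = max(0, int(box["Left"] * image_width))
    top = max(0, int(box["Top"] * image_height))
    right = min(image_width, int((box["Left"] + box["Width"]) * image_width))
    bottom = min(image_height, int((box["Top"] + box["Height"]) * image_height))
    return left, top, right, bottom


def classify_faces(
    image_bytes: bytes,
    image_width: int,
    image_height: int,
    detect_faces: Callable[[bytes], List[BoundingBox]],  # hypothetical: wraps the high-level service
    classify_crop: Callable[[bytes, PixelBox], str],      # hypothetical: wraps your own CV model
) -> List[str]:
    """Stage 1: detect faces with the managed service. Stage 2: run the custom model per face."""
    results = []
    for box in detect_faces(image_bytes):
        pixel_box = to_pixel_box(box, image_width, image_height)
        results.append(classify_crop(image_bytes, pixel_box))
    return results
```

In a real setup, `detect_faces` could wrap the boto3 Rekognition client's `detect_faces(Image={"Bytes": image_bytes})` call and return the `BoundingBox` of each entry in `FaceDetails`; the custom model only ever sees face crops.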
  • 7.
  • 8.
  • 9. And so it begins • You’ve trained a model on a local machine, using a popular open source library. • You’ve measured the model’s accuracy, and things look good. Now you’d like to deploy it to check its actual behaviour, to run A/B tests, etc. • You’ve embedded the model in your business application. • You’ve deployed everything to a single Ubuntu virtual machine in the cloud. • Everything works, you’re serving predictions, life is good!
  • 10. Score card: single EC2 instance
      Infrastructure effort: C’mon, it’s just one instance
      ML setup effort: pip install tensorflow
      CI/CD integration: Not needed
      Build models: DIY
      Train models: python train.py
      Deploy models (at scale): python predict.py
      Scale/HA inference: Not needed
      Optimize costs: Not needed
      Security: Not needed
  • 11.
  • 12. A few instances and models later… • Life is not that good • Too much manual work • Time-consuming and error-prone • Dependency hell • No cost optimization • Monolithic architecture • Deployment hell • Multiple apps can’t share the same model • Apps and models scale differently Use AWS-maintained tools • Deep Learning AMI • Deep Learning containers Dockerize Create a prediction service • Model servers • Bespoke API (Flask?)
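As a rough illustration of the "bespoke API" option, here is a minimal sketch of a prediction service written as a plain WSGI application using only the standard library (Flask applications are WSGI applications too). Everything here is an assumption made for illustration: the `/predict` route, the `{"features": [...]}` request shape, and the `dummy_model` stand-in, which you would replace with a call to your real model.

```python
import json
from typing import Callable, Iterable, List


def make_prediction_app(predict: Callable[[List[float]], float]):
    """Build a WSGI app exposing POST /predict around a `predict` callable."""

    def app(environ, start_response) -> Iterable[bytes]:
        if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
            start_response("404 Not Found", [("Content-Type", "application/json")])
            return [b'{"error": "use POST /predict"}']
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            features = [float(x) for x in payload["features"]]
        except (ValueError, KeyError, TypeError):
            start_response("400 Bad Request", [("Content-Type", "application/json")])
            return [b'{"error": "expected a JSON body with a features list"}']
        body = json.dumps({"prediction": predict(features)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


def dummy_model(features: List[float]) -> float:
    """Hypothetical stand-in for a real model: returns the mean of the features."""
    return sum(features) / len(features) if features else 0.0
```

To try it locally, `wsgiref.simple_server.make_server("", 8080, make_prediction_app(dummy_model)).serve_forever()` serves the app on port 8080. Because the model sits behind an HTTP interface, several applications can share it and it can scale separately from them.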
  • 13. AWS Deep Learning AMIs: optimized environments on Amazon Linux or Ubuntu. • Conda AMI: for developers who want pre-installed pip packages of DL frameworks in separate virtual environments. • Base AMI: for developers who want a clean slate to set up private DL engine repositories or custom builds of DL engines. • AMI with source code: for developers who want preinstalled DL frameworks and their source code in a shared Python environment.
  • 14.
  • 15. Running a new EC2 instance with the Deep Learning AMI
      # ami-02273e0d16172dbd1 is the Deep Learning AMI in eu-west-1
      aws ec2 run-instances --image-id ami-02273e0d16172dbd1 \
        --instance-type p3.2xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dlami-demo}]' \
        --key-name $KEYPAIR \
        --security-group-ids $SECURITY_GROUP \
        --iam-instance-profile Name=$ROLE
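For teams that prefer scripting this launch from Python, the same request can be expressed as keyword arguments for the boto3 EC2 client's `run_instances` call. The sketch below only builds the request dictionary, so it runs without credentials. The helper name `build_run_instances_request` and its parameters are hypothetical, and the values mirror the CLI flags on the slide.

```python
from typing import Any, Dict


def build_run_instances_request(
    image_id: str,
    instance_type: str,
    key_name: str,
    security_group_id: str,
    instance_profile_name: str,
    name_tag: str,
    count: int = 1,
) -> Dict[str, Any]:
    """Return keyword arguments for the boto3 EC2 client's run_instances call."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # Request Spot capacity, like --instance-market-options on the CLI.
        "InstanceMarketOptions": {"MarketType": "spot"},
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name_tag}]}
        ],
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        "IamInstanceProfile": {"Name": instance_profile_name},
    }


# Placeholder values: substitute your own key pair, security group and role.
request = build_run_instances_request(
    image_id="ami-02273e0d16172dbd1",  # Deep Learning AMI in eu-west-1, as on the slide
    instance_type="p3.2xlarge",
    key_name="my-keypair",
    security_group_id="sg-00000000",
    instance_profile_name="my-role",
    name_tag="dlami-demo",
)
```

With boto3 installed and credentials configured, `boto3.client("ec2", region_name="eu-west-1").run_instances(**request)` would submit it.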
  • 16. Connecting to Jupyter
      # On your local machine: forward local port 8000 to port 8888 on the instance
      ssh -L 8000:localhost:8888 ec2-user@INSTANCE_NAME
      # On the EC2 instance
      jupyter notebook --no-browser --port=8888
      # On your local machine: open http://localhost:8000
  • 17. Training with the TensorFlow Deep Learning container
      # On the training machine
      $(aws ecr get-login --no-include-email --region eu-west-1 --registry-ids 763104351884)
      docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
      nvidia-docker run -it 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py27-cu100-ubuntu16.04
      # In the container
      git clone https://github.com/fchollet/keras.git
      python keras/examples/mnist_cnn.py
      List of image names: https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
  • 18.
  • 19. Scaling alert! • More customers, more team members, more models, woohoo! • Scalability, high availability & security are now a thing • Scaling up is a losing proposition. You need to scale out • Only automation can save you: IaC, CI/CD and all that good DevOps stuff • What are your options?
  • 20. Option 1: virtual machines • Definitely possible, but: • Why? Seriously, I want to know. • Operational and financial issues await if you don’t automate extensively • Training • Build on-demand clusters with CloudFormation, Terraform, etc. • Distributed training is a pain to set up • Prediction • Automate deployment with CI/CD • Scale with Auto Scaling, Load Balancers, etc. • Spot, spot, spot
  • 21. Score card: more EC2 instances
      Infrastructure effort: Lots
      ML setup effort: Some (DL AMI)
      CI/CD integration: No change
      Build models: DIY
      Train models: DIY
      Deploy models: DIY (model servers)
      Scale/HA inference: DIY (Auto Scaling, LB)
      Optimize costs: DIY (Spot, automation)
      Security: DIY (IAM, VPC, KMS)
  • 22. Option 2: Docker clusters • This makes a lot of sense if you’re already deploying apps to Docker • No change to the dev experience: same workflows, same CI/CD, etc. • Deploy prediction services on the same infrastructure as business apps. • Amazon ECS and Amazon EKS • Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc. • Both come with AWS-maintained AMIs that will save you time • One cluster or many clusters? • Build on-demand development and test clusters with CloudFormation, Terraform, etc. • Many customers find that running a large single production cluster works better • Still instance-based and not fully-managed • Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do • And yes, this matters even if “someone else is taking care of clusters”
  • 23.
  • 24. Creating an ECS cluster and adding instances
      aws ecs create-cluster --cluster-name ecs-demo
      # Add 4 p2.xlarge spot instances, ECS-optimized AMI with GPU support, default VPC
      aws ec2 run-instances --image-id ami-0638eba79fcfe776e --count 4 \
        --instance-type p2.xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ecs-demo}]' \
        --key-name $KEYPAIR \
        --security-group-ids $SECURITY_GROUP \
        --iam-instance-profile Name=$ROLE \
        --user-data file://user-data.txt
      # Add 2 c5.2xlarge, ECS-optimized AMI, default VPC, different subnet
      aws ec2 run-instances --image-id ami-09cd8db92c6bf3a84 --count 2 \
        --instance-type c5.2xlarge \
        --instance-market-options '{"MarketType":"spot"}' \
        --subnet-id $SUBNET_ID . . .
  • 25. Defining the training task
      "containerDefinitions": [{
        "command": ["git clone https://github.com/fchollet/keras.git && python keras/examples/mnist_cnn.py"],
        "entryPoint": ["sh", "-c"],
        "name": "TFconsole",
        "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04",
        "memory": 4096,
        "cpu": 256,
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
        . . .
  • 26. Defining the inference task
      "containerDefinitions": [{
        "command": ["git clone -b r1.13 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=<MODEL_NAME> --model_base_path=<MODEL_PATH>"],
        "entryPoint": ["sh", "-c"],
        "name": "TFinference",
        "image": "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-cpu-py36-ubuntu16.04",
        "memory": 4096,
        "cpu": 256,
        "portMappings": [
          {"hostPort": 8500, "protocol": "tcp", "containerPort": 8500},
          {"hostPort": 8501, "protocol": "tcp", "containerPort": 8501}],
        . . .
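The two slides above show only fragments of each task definition. The sketch below shows one way to generate a complete, valid JSON document from Python instead of hand-editing it, which avoids quoting mistakes. It is a sketch under assumptions: the helper `build_task_definition` is hypothetical, the `family` names `training` and `inference` are guesses based on the `training:1` and `inference:1` references used when the tasks are run, and the `<MODEL_NAME>` and `<MODEL_PATH>` placeholders are left unfilled exactly as on the slide.

```python
import json
from typing import Any, Dict, Sequence


def build_task_definition(
    family: str,
    container_name: str,
    image: str,
    shell_command: str,
    gpus: int = 0,
    ports: Sequence[int] = (),
) -> Dict[str, Any]:
    """Build a minimal ECS task definition with one container run through `sh -c`."""
    container: Dict[str, Any] = {
        "name": container_name,
        "image": image,
        "memory": 4096,
        "cpu": 256,
        "entryPoint": ["sh", "-c"],
        "command": [shell_command],
    }
    if gpus:
        # ECS expects the GPU count as a string.
        container["resourceRequirements"] = [{"type": "GPU", "value": str(gpus)}]
    if ports:
        container["portMappings"] = [
            {"hostPort": port, "containerPort": port, "protocol": "tcp"} for port in ports
        ]
    return {"family": family, "containerDefinitions": [container]}


REGISTRY = "763104351884.dkr.ecr.eu-west-1.amazonaws.com"

training = build_task_definition(
    family="training",
    container_name="TFconsole",
    image=REGISTRY + "/tensorflow-training:1.13-horovod-gpu-py36-cu100-ubuntu16.04",
    shell_command="git clone https://github.com/fchollet/keras.git && python keras/examples/mnist_cnn.py",
    gpus=1,
)

inference = build_task_definition(
    family="inference",
    container_name="TFinference",
    image=REGISTRY + "/tensorflow-inference:1.13-cpu-py36-ubuntu16.04",
    shell_command="tensorflow_model_server --port=8500 --rest_api_port=8501 "
                  "--model_name=<MODEL_NAME> --model_base_path=<MODEL_PATH>",
    ports=(8500, 8501),
)

training_json = json.dumps(training, indent=2)
inference_json = json.dumps(inference, indent=2)
```

Writing `training_json` and `inference_json` to `training.json` and `inference.json` gives files that can be passed to `aws ecs register-task-definition --cli-input-json`.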
  • 27. Running training and inference on the cluster
      # Create task definitions for training and inference
      aws ecs register-task-definition --cli-input-json file://training.json
      aws ecs register-task-definition --cli-input-json file://inference.json
      # Run 4 training tasks (the GPU requirement is in the task definition)
      aws ecs run-task --cluster ecs-demo --task-definition training:1 --count 4
      # Create inference service, starting with 1 initial task
      # Run it on c5 instances, and spread tasks evenly
      aws ecs create-service --cluster ecs-demo \
        --service-name inference-cpu \
        --task-definition inference:1 \
        --desired-count 1 \
        --placement-constraints type="memberOf",expression="attribute:ecs.instance-type =~ c5.*" \
        --placement-strategy field="instanceId",type="spread"
      # Scale inference service to 2 tasks
      aws ecs update-service --cluster ecs-demo --service inference-cpu --desired-count 2
  • 28. Score card (columns: EC2 | ECS / EKS)
      Infrastructure effort: Lots | Some (Docker tools)
      ML setup effort: Some (DL AMI) | Some (DL containers)
      CI/CD integration: No change | No change
      Build models: DIY | DIY
      Train models (at scale): DIY | DIY (Docker tools)
      Deploy models (at scale): DIY (model servers) | DIY (Docker tools)
      Scale/HA inference: DIY (Auto Scaling, LB) | DIY (Services, pods, etc.)
      Optimize costs: DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation)
      Security: DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS)
  • 29. Option 3: go fully managed with Amazon SageMaker
  • 30. Model options on Amazon SageMaker (training code) • Built-in Algorithms (17): Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, XGBoost, and more. No ML coding required. No infrastructure work required. Distributed training. Pipe mode. • Built-in Frameworks: bring your own code (script mode). Open source containers. No infrastructure work required. Distributed training. Pipe mode. • Bring Your Own Container: full control, run anything! R, C++, etc. No infrastructure work required.
  • 31. The Amazon SageMaker API • Python SDK orchestrating all Amazon SageMaker activity • High-level objects for algorithm selection, training, deploying, automatic model tuning, etc. • Spark SDK (Python & Scala) • AWS SDK • For scripting and automation • CLI: 'aws sagemaker' • Language SDKs: boto3, etc.
  • 32. Training and deploying
      from sagemaker.tensorflow import TensorFlow
      # Parameter names as on the slide (SageMaker Python SDK v1).
      # role (IAM role) and data (training input location) are defined elsewhere.
      tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                                role=role,
                                train_instance_count=1,
                                train_instance_type='ml.c5.2xlarge',
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 10, 'learning-rate': 0.01})
      tf_estimator.fit(data)
      # HTTPS endpoint backed by a single instance
      tf_endpoint = tf_estimator.deploy(initial_instance_count=1,
                                        instance_type='ml.t3.xlarge')
      tf_endpoint.predict(…)
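In script mode, the hyperparameters above reach the entry point script as command-line arguments. Below is a minimal sketch of how a script such as `mnist_keras_tf.py` might read them. The defaults and the `--sm-model-dir` option name are assumptions made for illustration, not taken from the slides.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters that script mode passes on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # Assumed option name; SM_MODEL_DIR is set inside the training container.
    parser.add_argument(
        "--sm-model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Ignore any other arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments derived from hyperparameters={'epochs': 10, 'learning-rate': 0.01}.
args = parse_args(["--epochs", "10", "--learning-rate", "0.01"])
print(args.epochs, args.learning_rate)
```

Using `parse_known_args` keeps the script from failing on arguments it does not declare, so the same file can also be run locally.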
  • 33. Training and deploying, at any scale
      from sagemaker.tensorflow import TensorFlow
      tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                                role=role,
                                train_instance_count=8,
                                train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 200, 'learning-rate': 0.01})
      tf_estimator.fit(data)
      # HTTPS endpoint backed by 16 multi-AZ load-balanced instances
      tf_endpoint = tf_estimator.deploy(initial_instance_count=16,
                                        instance_type='ml.p3.2xlarge')
      tf_endpoint.predict(…)
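The "Total of 64 GPUs" comment is simply the instance count times the GPUs per instance. As a quick check, here is a small sketch with a lookup table covering only the p3 sizes. The helper name is hypothetical.

```python
# GPUs per instance for the p3 family (V100 GPUs).
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
}


def total_gpus(instance_type, instance_count):
    """Return the number of GPUs in a fleet of identical instances."""
    return GPUS_PER_INSTANCE[instance_type] * instance_count


print(total_gpus("ml.p3.16xlarge", 8))   # training fleet on the slide
print(total_gpus("ml.p3.2xlarge", 16))   # inference fleet on the slide
```

So the training job on the slide uses 8 x 8 = 64 GPUs, and the endpoint uses 16 x 1 = 16.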
  • 35. Score card
                               | EC2                         | ECS / EKS                   | SageMaker
      Infrastructure effort    | Maximal                     | Some (Docker tools)         | None
      ML setup effort          | Some (DL AMI)               | Some (DL containers)        | Minimal
      CI/CD integration        | No change                   | No change                   | Some (SDK, Step Functions)
      Build models             | DIY                         | DIY                         | 17 built-in algorithms
      Train models (at scale)  | DIY                         | DIY (Docker tools)          | 2 LOCs
      Deploy models (at scale) | DIY (model servers)         | DIY (Docker tools)          | 1 LOC
      Scale/HA inference       | DIY (Auto Scaling, LB)      | DIY (Services, pods, etc.)  | Built-in
      Optimize costs           | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | On-demand training, Auto Scaling for inference
      Security                 | DIY (IAM, VPC, KMS)         | DIY (IAM, VPC, KMS)         | API parameters
  • 36. Score card: personal opinion (flame war in 3, 2, 1…)
      Same score card as slide 35, plus a personal opinion on each option.
      EC2: small scale only, unless you have strong DevOps skills and enjoy exercising them.
      ECS / EKS: a reasonable choice if you’re a Docker shop and know how to use the rich Docker ecosystem. If not, I’d think twice: Docker isn’t an ML platform.
      SageMaker: learn it in a few hours, forget about servers, focus 100% on ML, and enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
  • 37. Conclusion • Whatever works for you at this time is fine • Don’t over-engineer, and don’t "plan for the future" • Fight "we’ve always done it this way", NIH, and Hype Driven Development • Optimize for current business conditions, pay attention to TCO • Models and data matter, not infrastructure • When conditions change, move fast: smash and rebuild • ... which is what cloud is all about! • "Spending 100% of our time on ML" should drive infrastructure design • Mix and match • Train on SageMaker, deploy on ECS/EKS… or vice versa • Write your own story!
  • 39. Thank you!