Deploying Data Science with
Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016
Context
Lots of event stream data
Many AWS components
Outputs:
- Business Intelligence
- Bespoke Analysis
- Productionised Science
What?
Goal: Moving laptop analyses onto a server
Turn:
<types>run_analysis.sh<presses enter>
… analysis script retrieves data from DB, Looker, web, etc. …
… runs analysis …
… outputs results as csv, png, etc. to local hard disk …
<gets back command prompt>
Into:
Automated process running on a server
Why?
• Production scheduled task e.g. Firm Wide Metrics daily processing
• Make use of more powerful Amazon Web Services (AWS) cloud resources
for large scale analysis
• Ease of deployment for Data Science analysts
• Build consistent development environment
How?
• Containerize applications and runtime using Docker to produce images
• Store images on AWS Elastic Container Registry (ECR)
• Run images either locally or on Amazon Elastic Container Service (ECS)
• Use AWS Lambda functions to trigger scheduled tasks (or react to events)
What is Docker?
“Docker containers wrap up a piece of software in a complete
filesystem that contains everything it needs to run: code, runtime,
system tools, system libraries – anything you can install on a server. This
guarantees that it will always run the same, regardless of the
environment it is running in.” -- https://www.docker.com/what-docker
Public code: store Dockerfile on GitHub, use Travis to automatically
build image on Docker Hub
Private code: private Dockerfile, build locally, push image to AWS Elastic
Container Registry
Example application: retrieve market data
PyAnalysis
Application code built on PCR image
https://github.com/mattmcd/PyAnalysis
PCR: Python Component Runtime
Base Docker image
https://github.com/mattmcd/PCR
Where? Amazon Web Services Cloud
• Elastic Container Service (ECS)
• Defines the task that runs the container
• Runs tasks on a cluster of EC2 nodes
• EC2 instance set up to act as node
• Needs to be launched from an Amazon ECS-optimized AMI
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
• Needs an IAM Role that has:
• AmazonEC2ContainerServiceforEC2Role policy attached
• Policies to allow access to any AWS resources needed e.g. S3
• Lambda function to trigger ECS task
• cron equivalent by using CloudWatch scheduled events
EC2 Instance Security Group
EC2 instance used by ECS can be locked down – no need to SSH into it, so no inbound ports are needed
EC2 Instance AMI
Use latest available Amazon ECS Optimized AMI – it has Docker and ECS Container Agent already installed
EC2 Instance Details
Enable Auto-assign Public IP so the instance can communicate with the ECS service, and assign a custom IAM Role as a hook for access permissions
EC2 Instance IAM Role
Attach the AmazonEC2ContainerServiceforEC2Role policy and any extra access policies needed by containers on the instance
ECS Task
ECS task retrieves image and runs it
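A minimal sketch of what such a task definition might contain, written as the keyword arguments that boto3's ecs.register_task_definition accepts. The family name, memory limit and command are hypothetical values chosen for illustration; mattmcd/pyanalysis is the image listed at the end of this deck. The slides do not show the actual task definition used.

```python
import json


def build_task_definition(image, family="pyanalysis-task", memory_mb=256, command=None):
    """Build keyword arguments for ecs.register_task_definition.

    family, memory_mb and command are hypothetical illustration values.
    """
    container = {
        "name": family,
        "image": image,        # ECS pulls this image onto the EC2 node
        "memory": memory_mb,   # hard memory limit in MiB
        "essential": True,     # the task stops when this container exits
    }
    if command:
        container["command"] = list(command)
    return {"family": family, "containerDefinitions": [container]}


if __name__ == "__main__":
    print(json.dumps(build_task_definition("mattmcd/pyanalysis"), indent=2))
```

The returned dictionary could be passed as boto3.client("ecs").register_task_definition(**task_definition), assuming credentials with permission to register task definitions.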
Lambda function
Use the lambda-canary blueprint as a basis for cron job equivalents
Lambda function
cron job equivalent via CloudWatch scheduled event
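As a rough illustration of the cron equivalent, CloudWatch schedule expressions use six fields (minutes, hours, day-of-month, month, day-of-week, year), with a question mark in one of the two day fields. The sketch below builds a daily expression and the keyword arguments for boto3's events.put_rule; the helper names and the rule name are assumptions, not taken from the talk.

```python
def daily_schedule(hour_utc, minute=0):
    """Return a CloudWatch cron expression that fires once a day (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minutes hours day-of-month month day-of-week year
    return "cron({} {} * * ? *)".format(minute, hour_utc)


def build_rule(name, hour_utc, minute=0):
    """Keyword arguments for events.put_rule (the rule name is hypothetical)."""
    return {
        "Name": name,
        "ScheduleExpression": daily_schedule(hour_utc, minute),
        "State": "ENABLED",
    }


if __name__ == "__main__":
    print(build_rule("daily-metrics", hour_utc=6))
```

For simple fixed intervals, a rate expression such as rate(1 day) is an alternative to the cron form.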
Lambda Function
Simple Lambda function to run task on ECS
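The slide's code is not reproduced in this transcript, so the following is only a sketch of how such a function might look with boto3. The cluster name, the task definition name and the environment variable names (ECS_CLUSTER, ECS_TASK) are assumptions for illustration.

```python
import os


def run_ecs_task(ecs_client, cluster, task_definition, count=1):
    """Start the task and return the ARNs of the tasks ECS launched."""
    response = ecs_client.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        count=count,
    )
    failures = response.get("failures", [])
    if failures:
        # ECS reports placement problems (e.g. not enough memory) here
        raise RuntimeError("ECS could not place the task: {}".format(failures))
    return [task["taskArn"] for task in response.get("tasks", [])]


def lambda_handler(event, context):
    """Entry point invoked by the scheduled event."""
    import boto3  # provided by the Lambda Python runtime

    return run_ecs_task(
        boto3.client("ecs"),
        cluster=os.environ.get("ECS_CLUSTER", "default"),        # hypothetical name
        task_definition=os.environ.get("ECS_TASK", "pyanalysis-task"),  # hypothetical name
    )
```

Passing the client in as an argument keeps run_ecs_task testable without AWS access; only lambda_handler touches boto3.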
Lambda function IAM role
AWS will create a default IAM Role for the Lambda function – add the ecs:RunTask permission to it so the function can run the container
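One way the extra permission could be expressed is as an inline IAM policy document. The sketch below builds it as a Python dictionary; the helper name is hypothetical, and the wildcard resource is a placeholder that would normally be narrowed to a specific task definition ARN.

```python
import json


def run_task_policy(task_definition_arn="*"):
    """IAM policy document allowing ecs:RunTask on the given task definition."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:RunTask"],
                "Resource": [task_definition_arn],  # "*" is a placeholder
            }
        ],
    }


if __name__ == "__main__":
    # JSON form, suitable for pasting into the IAM console
    print(json.dumps(run_task_policy(), indent=2))
```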
Demo / Q&A
Blog posts
• ‘Scheduled Downloads using AWS EC2 and Docker — Medium’ http://bit.ly/1TO9a1h (me)
• ‘Better Together: Amazon ECS and AWS Lambda’ http://amzn.to/1UkitEF (not me)
Code samples
• https://github.com/mattmcd/PyAnalysis
• https://github.com/mattmcd/PCR
Docker images
• mattmcd/pyanalysis
• mattmcd/pcr
Me
• Twitter @mattmcd
• Email matt@metail.com or matt@matt-mcdonnell.com
