
Continuous Integration with Amazon ECS and Docker

One of the most fundamental challenges of CI/CD is balancing quality, time, and cost. Amazon EC2 Container Service (ECS), along with Docker and Amazon EC2 Container Registry (ECR), has changed the game for many teams by making resource management very simple. For Okta, it has enabled the Continuous Integration team to maximize throughput while minimizing cost. In this session we show how Okta has created a flexible CI system with ECS, Docker, ECR, AWS Lambda, AWS CloudFormation, Amazon RDS, and Amazon SQS. Okta runs 30,000 tests with each developer commit and releases 10,000 new lines of code to production each week. The CI system, built 100% on AWS, must handle that load while keeping cost under control. This talk is oriented toward developers looking to achieve efficient resource and cost management without compromising speed or quality.

Published in: Technology

Continuous Integration with Amazon ECS and Docker

  1. Continuous Integration with ECS and Docker • Tim Secor, Manager, Developer Productivity • 8/11/2016 • © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Topics • Who is Okta • Okta Engineering: how do we work, how do we ship our code? • The Challenge of the Developer Productivity Team • A CI System with Amazon EC2 Container Service and Docker
  3. Okta: Connect Everything • What We Do: connects all users, devices, applications, and organizations; SSO, Adaptive MFA, Provisioning, Universal Directory, Mobility; the broadest and deepest application network • What We Believe: we believe that connecting everything will make organizations more productive and more secure • We Make Customers Successful • Leader: Okta (Magic Quadrant) • Leader: Okta (Forrester Wave)
  4. Millions of people use Okta every day • © Okta and/or its affiliates. All rights reserved. Okta Confidential
  5. Thousands of enterprises use Okta to connect to Adobe’s Creative Cloud • jim@designer.com
  6. Thousands of Enterprise Customers • Ed, Gov, Non-Profit • Services • Media • Consumer • Technology • Manufacturing, Energy • Finance • Cloud • Health
  7. Okta IT & Platform products • Okta Application Network • Universal Directory: Extensible Profiles, Attribute Transformations, Directory Integration and AD Password Management • Single Sign On: Secure SSO for All Your Web Apps, On-prem and Cloud, with Flexible Policy, from Any Device • Adaptive MFA: Contextual Access Policies, Modern Factors, Adaptive Authentication, Integrations for Apps and VPNs • Provisioning: Lifecycle Management, Cloud & On-prem App Integration, Mastering from Apps, Directory Provisioning, Rules, Workflow, Reporting • Mobility Management: Tight User Identity Integration, Device Based Contextual Access, Light-weight Management
  8. A Platform Architecture For Scale • The most reliable IDaaS available • Never taken offline for upgrades • Redundant and scalable • okta.com/trust • Diagram: load balancers, app servers, and data tier (A, B, C) in DC1 and DC2
  9. Global Datacenters
  10. Engineering
  11. Okta Engineering: How Do We Work, How Do We Ship Our Code? • 200 engineers, split into teams with embedded specialists • 1-week sprints, with weekly deploys to production • Capability to do more than one hotfix per day, at customers’ request or for bugs found in CI or pre-prod • Every merge to master is a potential release candidate
  12. Okta Engineering: How Do We Test Our Code? • Every topic branch goes through the same rigor in testing as a release candidate • Passing automated tests is enforced at commit time • Largest repo: 30K tests, takes 60 minutes (22 parallel runs) • Smallest repo: 100 tests, 5 minutes • The Developer Productivity team is responsible for supporting engineering
  13. Challenge of Developer Productivity Team • Developer experience • Quality • Cost • Cloud First
  14. Challenge of Developer Productivity Team: Developer experience • Developers expect fast turnaround time and reliable results
  15. Challenge of Developer Productivity Team: Quality • We need to run all the tests required to guarantee quality
  16. Challenge of Developer Productivity Team: Cost • We need to run an infrastructure that is as cost-effective as possible
  17. Challenge of Developer Productivity Team: Cloud First • We aim to use cloud services first, wherever possible
  18. Problems
  19. CI using Open Source, Monolithic Applications
  20. Vision
  21. Vision • Clean testing environments • Dynamic worker scaling • Spot instances for cost • Versioned testing • Improved queuing system • Less infrastructure flakiness • The correct privileges, to maintain security
  22. Vision: Clean testing environments • Isolate test environments from each other, for both parallel and serial runs
  23. Vision: Dynamic worker scaling • Workers should survive the loss of their build server • Worker pool should scale quickly • Number of workers should not affect the memory footprint of the build server
  24. Vision: Spot instances for cost • Run our services at cheaper rates; we have many short-lived tasks and can tolerate a few failures
  25. Vision: Versioned testing • Enable testing of infrastructure changes in topic branches
  26. Vision: Improved queuing system • Should survive build server reboots • Shouldn’t be tied to specific workers or build servers • Centralized • Should have good visibility • Re-queuing of lost tasks
  27. Vision: Less infrastructure flakiness • Push testing and creation of test machines to developers
  28. Vision: The correct privileges, to maintain security • Launch tasks in secure environments
  29. Solutions
  30. EC2 Container Service and Docker • Amazon Web Services + Java app tailored to Okta process • Immutable and disposable build workers: created for one-time use, destroyed when the job is done • Near-zero cost on weekends, scales with load • EC2 Container Service allows us to maximize usage of EC2 instances • Same containers for multiple types and numbers of builds • Same machine image can run multiple Docker images
  31. Custom Reporting
  32. Docker • http://www.docker.com/what-docker#/VM
  33. Docker Update • Update the Dockerfile and our CI system builds the new image and uploads it to our repository • Update the task definition for cluster updates
  34. Dockerfile
      FROM docker.aue1d.saasure.com/okta-base:2.0
      MAINTAINER Okta
      RUN useradd -d /home/container_user -m -s /bin/bash container_user
      # Install wget, tar, hostname
      RUN yum install -y wget tar hostname
      # Install Java 8
      RUN yum install -y java-1.8.0-oracle-1.8.0_31
      RUN mkdir -p /opt/sage
      RUN mkdir -p /var/log/sage
      RUN chown container_user /var/log/sage
      ADD conf/* /opt/sage/conf/
      ADD core/target/core-*.jar /opt/sage/sage.jar
      EXPOSE 8882 8883
      USER container_user
      CMD java $OKTA_SAGE_JAVA_ARGS -jar /opt/sage/sage.jar server /opt/sage/conf/sage.yml
  35. Docker Security Conventions • Container repository: only allow containers from the internal repository • Security scanning of containers: JFrog Xray • Process monitoring on the Docker host: cAdvisor from Google • Secrets or any form of config are NEVER baked into containers • Start from a minimal, audited base OS • Run containers as a non-privileged user, with user namespaces (Docker 1.10+) • Monitor alas.aws.amazon.com for critical updates
  36. Docker Source Conventions • 3 categories of container definitions: (1) “Library” definitions used as the basis for building other images; (2) third-party service definitions, e.g. Zookeeper or Elasticsearch; (3) internal service definitions • Repo per internal service: Dockerfile in same repo => image versioned with code; Docker Compose for running dependent services; pegged versions (no builds) • Single repo for library and third-party service definitions
  37. Docker Build Conventions • Integration tests run against code running in a container • Build owns creating the immutable version and publishing it to the artifact server • Strict rules around the “FROM” clause: must point at the internal artifact server; must be tagged following the SEMVER-SHORT_SHA convention; never allow a missing tag or the “latest” tag, so that builds are repeatable
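
The FROM rules on slide 37 lend themselves to an automated check. The following is a minimal sketch, not Okta's actual tooling. It assumes the internal registry host is the one shown in the Dockerfile on slide 34, and it assumes that "SEMVER-SHORT_SHA" means MAJOR.MINOR.PATCH, a hyphen, and a short hex commit prefix. The function name `check_from_lines` and the tag pattern are hypothetical.

```python
import re

# Assumption: registry host copied from the Dockerfile slide; substitute
# whatever host your own artifact server uses.
INTERNAL_REGISTRY = "docker.aue1d.saasure.com"

# Assumption: "SEMVER-SHORT_SHA" means MAJOR.MINOR.PATCH, a hyphen, then a
# 7 to 12 character lowercase hex commit prefix, e.g. 1.4.2-3f9c2ab.
TAG_PATTERN = re.compile(r"^\d+\.\d+\.\d+-[0-9a-f]{7,12}$")


def check_from_lines(dockerfile_text, registry=INTERNAL_REGISTRY):
    """Return a list of human-readable violations, one per broken rule."""
    problems = []
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Ignore option flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if not image.startswith(registry + "/"):
            problems.append(f"line {number}: {image} is not from {registry}")
        # The tag is whatever follows ':' in the last path component.
        name = image.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else ""
        if tag in ("", "latest"):
            problems.append(f"line {number}: {image} has no pinned tag")
        elif not TAG_PATTERN.match(tag):
            problems.append(f"line {number}: tag {tag!r} is not SEMVER-SHORT_SHA")
    return problems
```

A CI job could fail the build whenever the returned list is non-empty. Under the tag pattern assumed here, a two-part tag such as `2.0` is reported, so the pattern would need loosening if base images follow a different scheme.
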
  38. Docker Build Process
  39. Logging and monitoring • Logging: all output streams pipe to STDOUT/STDERR of the running process; log forwarding is provided by the underlying host; log entries contain host, container ID, image name & version, and request ID • Metrics: host-level, generic container metrics are provided by the host; app-level metrics are published directly to well-defined endpoints
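
The logging convention on slide 39 (write to STDOUT, one entry carrying host, container ID, image name and version, and request ID) can be sketched with the standard `logging` module. This is an illustration under assumptions, not the talk's implementation. The environment variable names `DOCKER_HOST_NAME`, `CONTAINER_ID`, `IMAGE_NAME`, and `IMAGE_VERSION` are hypothetical and would have to be injected by the deployment, and JSON is an assumed entry format.

```python
import json
import logging
import os
import socket
import sys


class ContextJsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, static_fields):
        super().__init__()
        self.static_fields = dict(static_fields)

    def format(self, record):
        entry = dict(self.static_fields)
        entry["level"] = record.levelname
        entry["message"] = record.getMessage()
        # request_id is attached per call through logging's `extra` argument.
        entry["request_id"] = getattr(record, "request_id", None)
        return json.dumps(entry, sort_keys=True)


def build_logger(stream=sys.stdout, env=os.environ):
    # Hypothetical variable names; nothing sets them automatically.
    static_fields = {
        "host": env.get("DOCKER_HOST_NAME", "unknown"),
        # By default Docker sets a container's hostname to its short ID.
        "container_id": env.get("CONTAINER_ID", socket.gethostname()),
        "image": env.get("IMAGE_NAME", "unknown"),
        "image_version": env.get("IMAGE_VERSION", "unknown"),
    }
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ContextJsonFormatter(static_fields))
    logger = logging.getLogger("ci-worker")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A caller would write `build_logger().info("tests started", extra={"request_id": "req-1"})`. Because the process only writes to its own output stream, forwarding stays the responsibility of the host, as the slide describes.
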
  40. Amazon EC2 Container Service • ECS Under The Hood
  41. Amazon EC2 Container Service: Host Management • Userdata installs: Slave terminator (T-800); base Docker images (an option); credentials (from S3); Splunk Forwarder (logging); cluster target; cache (code and libs)
  42. Amazon EC2 Container Service • Identity and Access Management separation per service: either one service per cluster, or use the new Identity and Access Management functionality for Elastic Container Service • Sharing the Docker daemon to allow running Docker within Docker • Pre-fetching large data blobs and making them available on the hosts is an option • Multiple containers: mysql, redis, kinesalite
  43. Task Definitions
      {
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1",
        "containerDefinitions": [
          {
            "memory": 15000,
            "essential": true,
            "mountPoints": [
              {
                "containerPath": "/usr/bin/docker",
                "sourceVolume": "docker_daemon",
                "readOnly": null
              },
              {
                "containerPath": "/var/run/docker.sock",
                "sourceVolume": "docker_socket",
                "readOnly": null
              }
  44. Task Definitions
            ],
          }
        ],
        "volumes": [
          {
            "host": { "sourcePath": "/var/run/docker.sock" },
            "name": "docker_socket"
          },
          {
            "host": { "sourcePath": "/usr/bin/docker" },
            "name": "docker_daemon"
          }
        ],
        "family": "base-container-box-task"
      }
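
The task definition on slides 43 and 44 mounts the host's Docker binary and socket into the container. A rough sketch of building the same structure in code, and checking that every mount point refers to a declared volume, might look like the following. The container name and image are hypothetical, since the slides elide those fields, and the helper names are illustrative. With boto3, a payload of this shape could be passed as keyword arguments to the ECS client's `register_task_definition` call; that step is left out here so the sketch runs without AWS credentials.

```python
import json


def docker_in_docker_task(family, container_name, image, memory_mib):
    """Build a task definition payload that shares the host's Docker
    binary and socket with the container, mirroring the slides."""
    volumes = [
        {"name": "docker_socket", "host": {"sourcePath": "/var/run/docker.sock"}},
        {"name": "docker_daemon", "host": {"sourcePath": "/usr/bin/docker"}},
    ]
    container = {
        "name": container_name,  # hypothetical; elided on the slide
        "image": image,          # hypothetical; elided on the slide
        "memory": memory_mib,
        "essential": True,
        "mountPoints": [
            {"sourceVolume": "docker_daemon",
             "containerPath": "/usr/bin/docker", "readOnly": False},
            {"sourceVolume": "docker_socket",
             "containerPath": "/var/run/docker.sock", "readOnly": False},
        ],
    }
    return {"family": family, "containerDefinitions": [container], "volumes": volumes}


def undeclared_volumes(task_definition):
    """Return mount point volume names that are missing from `volumes`."""
    declared = {v["name"] for v in task_definition.get("volumes", [])}
    used = {
        m["sourceVolume"]
        for c in task_definition.get("containerDefinitions", [])
        for m in c.get("mountPoints", [])
    }
    return sorted(used - declared)


payload = docker_in_docker_task(
    family="base-container-box-task",
    container_name="ci-worker",
    image="registry.example.internal/ci-worker:1.0.0-abc1234",
    memory_mib=15000,
)
print(json.dumps(payload, indent=2))
```

Running the volume check before registering a new revision catches a renamed or dropped volume early, which matters when task definitions are versioned alongside build scripts.
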
  45. Clean Testing Environments • Docker images • Nearly instant machine refresh • Easy for users to create and upload images that have been tested to work locally • Efficient machine use • Amazon EC2 Container Service with EC2 Container Registry and private repository backend
  46. Docker Start Up
  47. Dynamic Worker Scaling • Diagram components: Simple Queue Service, Lambda, Simple Notification Service, Lambda, Scaling, Bin Packing, EC2 Container Service
  48. Dynamic Worker Scaling • Lambda allocates jobs using bin packing. This is one of the changes we had to make in order to use EC2 Container Service for long-running tasks, rather than for services spread across many stateless instances • Disconnects unneeded nodes from the cluster, allowing them to terminate themselves when they are idle
  53. Dynamic Worker Scaling
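
The Dynamic Worker Scaling slides say a Lambda function allocates jobs with bin packing so that unneeded nodes can leave the cluster. The slides do not show the algorithm, so the following is only a sketch of one common variant, best-fit decreasing on memory. The function name, the data shapes, and the instance sizes in the usage note are assumptions. It also treats any instance that receives no job as idle, which ignores tasks that may already be running there.

```python
def pack_jobs(jobs, instances):
    """Place jobs on as few instances as possible (best-fit decreasing).

    jobs:      {job_id: memory_needed_mib}
    instances: {instance_id: free_memory_mib}
    Returns (placements, unplaced, idle): placements maps job_id to
    instance_id, unplaced lists jobs that fit nowhere, and idle lists
    instances that received no job.
    """
    free = dict(instances)
    placements, unplaced = {}, []
    # Largest jobs first, so small jobs do not fragment the space they need.
    for job_id, need in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = [(mem, iid) for iid, mem in free.items() if mem >= need]
        if not candidates:
            unplaced.append(job_id)
            continue
        # Best fit: the instance with the least free memory that still has room.
        _, chosen = min(candidates)
        free[chosen] -= need
        placements[job_id] = chosen
    used = set(placements.values())
    idle = sorted(iid for iid in instances if iid not in used)
    return placements, unplaced, idle
```

For example, three 15000 MiB jobs (the memory figure from the task definition slide) on three hypothetical 30000 MiB instances end up on two instances, leaving the third idle and eligible for removal. A spread strategy would instead use all three, which suits stateless services but keeps every node busy.
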
  54. Spot Instances
  57. Versioned Jobs • Scripts checked into repositories • Makes a transition to Docker jobs easy
  58. Versioned Jobs With EC2 Container Service • Versioned build and test scripts can now be run in versioned Docker containers, using versioned task definitions • Creates extreme flexibility • CloudFormation allows us to stand up whole new clusters, with all different versions, in a matter of minutes for long-term testing
  59. EC2 Container Service + Docker Problems • Docker containers not launching • EC2 Container Service agent failing • Docker containers stopping • Incompatibility with certain services • Docker OS availability • Cleanup • Image size
  60. EC2 Container Service Feature Requests • Elastic Load Balancer: dynamic port mapping to containers; fail health based on HTTP return code; different health endpoint for adding vs. removing • Bin packing scheduler • Could provide better cost management reporting and tools • Ability to mark container instances as un-schedulable • Remove sharp edges around the stopped state • Give Auto Scaling Groups the ability to set the Elastic Compute Cloud instance “shutdown behavior” • Periodic cleanup process in Elastic Container Service to deregister stopped instances
  61. EC2 Container Service Takeaways • /etc/ecs/ecs.config: ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr); ECS_LOGLEVEL=debug • Beware of running services in the same cluster that use the same ports • Tune the Elastic Load Balancer health check • Docker 1.10 for security enhancements • Canary & Blue/Green: separate service attached to the same Elastic Load Balancer • Rollback is trivial • Elastic Container Service is incredibly easy to get up and running • The ecosystem is changing quickly; we are moving cautiously • Holding off on stateful services in Docker
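
The agent settings named on slide 61 live in `/etc/ecs/ecs.config` as plain KEY=value lines. As a small sketch, host userdata could generate that file with something like the following. The function name and the example values are illustrative assumptions. `ECS_CLUSTER` is the agent's standard setting for the cluster to join and is added here because the slides mention a cluster target in userdata.

```python
def render_ecs_config(cluster, cleanup_wait, log_level="debug"):
    """Return the text of an ecs.config file as KEY=value lines."""
    settings = {
        "ECS_CLUSTER": cluster,
        # How long stopped task containers are kept before cleanup; a longer
        # wait leaves more time to inspect a failed build.
        "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION": cleanup_wait,
        "ECS_LOGLEVEL": log_level,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# Illustrative values only; the cluster name is hypothetical.
print(render_ecs_config("ci-workers", "2h"), end="")
```

Generating the file from code keeps the agent settings versioned with the rest of the cluster definition.
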
  62. Amazon Web Services • Elastic Compute Cloud • Simple Queue Service • Lambda • EC2 Container Service • Simple Storage Service • Relational Database Service • Kinesis • EC2 Spot Instances • EC2 Container Registry • CloudFormation • Simple Notification Service • CloudWatch • CloudTrail
  63. Building CI with Amazon Web Services
  64. Future
  65. Expand Use • Use EC2 Container Service for more services • Allow developers to control their test suites and Docker images more directly • Developer environments: use Docker for local long-running services; use a VM running the same OS version; remote updates to keep it in line with CI; aim to enable running CI containers right out of the box
  66. Result: Happy Engineering Team • Developers can write more tests, more quickly • Happy devs: timely build/test status feedback • Happy quality team: all tests are run at each commit • Happy ops team: release candidates are produced quickly • Happy management: infra budget is under control
  67. Thank You • Join us @Okta: www.okta.com/company/careers/ • stackshare.io/okta/okta
  68. Remember to complete your evaluations!
