Netflix and Open Source



  1. 1. Netflix and Open Source March 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS
  2. 2. Cloud Native NetflixOSS – Cloud Native On-Ramp Netflix Open Source Cloud Prize
  3. 3. Netflix Member Web Site Home Page Personalization Driven – How Does It Work?
  4. 4. How Netflix Streaming Works Consumer Electronics User Data Web Site or AWS Cloud Discovery API Services Personalization CDN Edge Locations DRM Customer Device Streaming API (PC, PS3, TV…) QoS Logging CDN Management and Steering OpenConnect CDN Boxes Content Encoding
  5. 5. Content Delivery Service Open Source Hardware Design + FreeBSD, bird, nginx
  6. 6. November 2012 Traffic
  7. 7. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics) Each icon is three to a few hundred instances across three Cassandra AWS zones memcached Web service Start Here S3 bucket Three Personalization movie group choosers (for US, Canada and Latam)
  8. 8. Cloud Native Architecture Clients Things Autoscaled Micro JVM JVM JVM Services Autoscaled Micro JVM JVM Memcached Services Distributed Quorum Cassandra Cassandra Cassandra NoSQL Datastores Zone A Zone B Zone C
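
The "Distributed Quorum" label on slide 8 can be made concrete with a little arithmetic. This is a minimal sketch, assuming one Cassandra replica in each of the three zones and a simple majority quorum; the function names are illustrative and do not come from any NetflixOSS library.

```python
def quorum_size(replication_factor: int) -> int:
    """Majority quorum: strictly more than half of the replicas."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def survives_zone_loss(replicas_per_zone: int, zones: int, zones_lost: int) -> bool:
    """True if a quorum of replicas is still reachable after losing some zones."""
    total = replicas_per_zone * zones
    remaining = replicas_per_zone * (zones - zones_lost)
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    # Assumed layout: one replica per zone across Zone A, B and C.
    print(quorum_size(3))
    print(survives_zone_loss(1, 3, zones_lost=1))
    print(survives_zone_loss(1, 3, zones_lost=2))
```

Under these assumptions a quorum needs two of three replicas, so quorum reads and writes keep working with one zone down but not with two.
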
  9. 9. Non-Native Cloud Architecture Agile Mobile iOS/Android Mammals Cloudy App Servers Buffer Datacenter MySQL Legacy Apps Dinosaurs
  10. 10. New Anti-Fragile Patterns Micro-services Chaos engines Highly available systems composed from ephemeral components
  11. 11. Stateless Micro-Service Architecture Linux Base AMI (CentOS or Ubuntu) Optional Apache frontend, Java (JDK 6 or 7) memcached, non-java apps AppDynamics Monitoring appagent monitoring Tomcat Log rotation Application war file, base Healthcheck, status to S3 GC and thread servlet, platform, client servlets, JMX interface, AppDynamics dump logging interface jars, Astyanax Servo autoscale machineagent Epic/Atlas
  12. 12. Cassandra Instance Architecture Linux Base AMI (CentOS or Ubuntu) Tomcat and Priam on JDK Java (JDK 7) Healthcheck, Status AppDynamics appagent monitoring Cassandra Server Monitoring AppDynamics Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk GC and thread holding Commit log and SSTables machineagent dump logging Epic/Atlas
  13. 13. Configuration State Management Datacenter CMDB’s woeful Cloud native is the solution Dependably complete
  14. 14. Edda – Configuration History Eureka Services metadata AWS AppDynamics Instances, Request flow ASGs, etc. Edda Monkeys
  15. 15. Edda Query Examples
      Find any instances that have ever had a specific public IP address
      $ curl "http://edda/api/v2/view/instances;publicIpAddress=;_since=0"
      ["i-0123456789","i-012345678a","i-012345678b"]
      Show the most recent change to a security group
      $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
      --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
      +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
      @@ -1,33 +1,33 @@
      {
      …
      "ipRanges" : [
      "",
      "",
      + "",
      - ""
      …
      }
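
The two queries on slide 15 share one shape: a resource path followed by `;`-separated matrix arguments, some with values (`_since=0`, `_limit=2`) and some bare flags (`_diff`, `_all`). The helper below is a hypothetical sketch that only assembles such a URL string. `edda_url` is not part of Edda, and the IP address in the usage lines is a placeholder from the documentation range, not the value elided on the slide.

```python
from typing import Optional


def edda_url(base: str, collection: str, resource_id: Optional[str] = None,
             **matrix_args: object) -> str:
    """Build an Edda-style URL with ';'-separated matrix arguments.

    A value of True emits a bare flag (";_diff"); any other value emits
    ";key=value". Arguments are emitted in the order they are passed.
    """
    url = f"{base.rstrip('/')}/api/v2/{collection.strip('/')}"
    if resource_id is not None:
        url += f"/{resource_id}"
    for key, value in matrix_args.items():
        url += f";{key}" if value is True else f";{key}={value}"
    return url


if __name__ == "__main__":
    # Placeholder IP (hypothetical); the slide leaves the address blank.
    print(edda_url("http://edda", "view/instances",
                   publicIpAddress="203.0.113.10", _since=0))
    print(edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
                   _diff=True, _all=True, _limit=2))
```

The second call reproduces the security-group URL shown on the slide; the string could then be handed to `curl` or any HTTP client.
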
  16. 16. Cloud Native Master copies of data are cloud resident Everything is dynamically provisioned All services are ephemeral
  17. 17. Scalability Demands
  18. 18. Asgard
  19. 19. Cloud Deployment Scalability New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
      Min.    1st Qu.   Median   Mean    3rd Qu.   Max.
      41.0    104.2     149.0    171.8   215.8     562.0
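
The scale-up figures on slide 19 can be checked with a few lines of arithmetic. This sketch uses only the timestamps and the instance count from the slide; the per-minute rate is a derived average, not a number the slide states.

```python
from datetime import datetime

# Timestamps and instance count as given on the slide.
start = datetime.strptime("21:38:52", "%H:%M:%S")
end = datetime.strptime("21:46:32", "%H:%M:%S")
instances = 500

elapsed = end - start
minutes = elapsed.total_seconds() / 60
rate_per_minute = instances / minutes

if __name__ == "__main__":
    print(f"elapsed: {elapsed}")
    print(f"average launch rate: {rate_per_minute:.1f} instances per minute")
```

The elapsed time comes out to 460 seconds, matching the 7m40s on the slide, which works out to an average of roughly 65 instances per minute.
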
  20. 20. Ephemeral Instances • Largest services are autoscaled • Average lifetime of an instance is 36 hours P u s h Autoscale Up Autoscale Down
  21. 21. Leveraging Public Scale 1,000 Instances 100,000 Instances Grey Public Private Area Startups Netflix Google
  22. 22. How big is Public? AWS Maximum Possible Instance Count 3.7 Million Growth >10x in Three Years, >2x Per Annum AWS upper bound estimate based on the number of public IP Addresses Every provisioned instance gets a public IP by default
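
Slide 22 quotes two growth figures, ">10x in Three Years" and ">2x Per Annum". A short sketch shows they are consistent under the assumption of steady compound growth; `annual_factor` is an illustrative name.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_factor over years."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    # 10x over three years, assuming the same growth factor each year.
    print(round(annual_factor(10, 3), 2))
```

Tenfold growth over three years corresponds to a yearly factor of about 2.15, so exceeding 10x in three years implies averaging more than 2x per year.
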
  23. 23. Availability Is it running yet? How many places is it running in? How far apart are those places?
  24. 24. Antifragile API Patterns Functional Reactive with Circuit Breakers and Bulkheads
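
To illustrate the circuit-breaker half of slide 24, here is a minimal generic sketch. It is not Hystrix and does not mirror its API; the class name, thresholds, and the injectable `clock` parameter are assumptions made for the example, and bulkheads (isolating resources per dependency) are not shown.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Fail fast after repeated errors, then retry after a cool-down."""

    def __init__(self, max_failures: int = 3, reset_after: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While open, skip the dependency entirely until the cool-down passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=1.0)

    def broken_dependency() -> str:
        raise RuntimeError("dependency unavailable")

    for _ in range(3):
        print(breaker.call(broken_dependency, lambda: "cached default"))
```

Once the breaker trips, callers get the fallback immediately instead of waiting on a failing dependency, which is what keeps one slow service from stalling everything upstream of it.
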
  25. 25. Outages
      • Running very fast with scissors
        – Mostly self inflicted – bugs, mistakes
        – Some caused by AWS bugs and mistakes
      • Next step is multi-region
        – Investigating and building in stages during 2013
        – Could have prevented some of our 2012 outages
  26. 26. Managing Multi-Region Availability AWS DynECT Route53 UltraDNS DNS Regional Load Balancers Regional Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas What we need is a portable way to manage multiple DNS providers….
  27. 27. Denominator – Software Defined DNS for Java
      Use Cases: Edda, Multi-Region Failover
      Common Model: Denominator
      DNS Vendor Plug-in: AWS Route53, DynECT, UltraDNS, Etc…
      API Models (varied and mostly broken): IAM Key Auth REST (AWS Route53), User/pwd REST (DynECT), User/pwd SOAP (UltraDNS)
      Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
  28. 28. A Cloud Native Open Source Platform
  29. 29. Inspiration
  30. 30. Three Questions Why is Netflix doing this? How does it all fit together? What is coming next?
  31. 31. Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy Simon Wardley - Researcher at the Leading Edge Forum
  32. 32. How did Netflix get ahead?
      Netflix Business + Developer Org
      • Doing it right now
      • SaaS Applications
      • PaaS for agility
      • Public IaaS for AWS features
      • Big data in the cloud
      • Integrating many APIs
      • FOSS from github
      • Renting hardware for 1hr
      • Coding in Java/Groovy/Scala
      Traditional IT Operations
      • Taking their time
      • Pilot private cloud projects
      • Beta quality installations
      • Small scale
      • Integrating several vendors
      • Paying big $ for software
      • Paying big $ for consulting
      • Buying hardware for 3yrs
      • Hacking at scripts
  33. 33. Netflix Platform Evolution
      2009-2010: Bleeding Edge Innovation
      2011-2012: Common Pattern
      2013-2014: Shared Pattern
      Netflix ended up several years ahead of the industry, but it’s not a sustainable position
  34. 34. Making it easy to follow Exploring the wild west each time vs. laying down a shared route
  35. 35. Goals
      • Establish our solutions as Best Practices / Standards
      • Hire, Retain and Engage Top Engineers
      • Build up Netflix Technology Brand
      • Benefit from a shared ecosystem
  36. 36. How does it all fit together?
  37. 37. NetflixOSS Continuous Build and Deployment Github Maven AWS NetflixOSS Central Base AMI Source Cloudbees Dynaslave Jenkins AWS AWS Build Aminator Baked AMIs Slaves Bakery Odin Asgard AWS Orchestration (+ Frigga) Account API Console
  38. 38. NetflixOSS Services Scope AWS Account Asgard Console Archaius Config Multiple AWS Regions Service Cross region Priam C* Eureka Registry Explorers Dashboards Exhibitor ZK 3 AWS Zones Application Priam Evcache Atlas Edda History Clusters Cassandra Memcached Monitoring Autoscale Groups Persistent Storage Ephemeral Storage Instances Simian Army Genie Hadoop Services
  39. 39. NetflixOSS Instance Libraries
      Initialization
      • Baked AMI – Tomcat, Apache, your code
      • Governator – Guice based dependency injection
      • Archaius – dynamic configuration properties client
      • Eureka – service registration client
      Service Requests
      • Karyon – Base Server for inbound requests
      • RxJava – Reactive pattern
      • Hystrix/Turbine – dependencies and real-time status
      • Ribbon – REST Client for outbound calls
      Data Access
      • Astyanax – Cassandra client and pattern library
      • Evcache – Zone aware Memcached client
      • Curator – Zookeeper patterns
      • Denominator – DNS routing abstraction
      Logging
      • Blitz4j – non-blocking logging
      • Servo – metrics export for autoscaling
      • Atlas – high volume instrumentation
  40. 40. NetflixOSS Testing and Automation
      Test Tools
      • CassJmeter – Load testing for Cassandra
      • Circus Monkey – Test account reservation rebalancing
      Maintenance
      • Janitor Monkey – Cleans up unused resources
      • Efficiency Monkey
      • Doctor Monkey
      • Howler Monkey – Complains about expiring certs
      Availability
      • Chaos Monkey – Kills Instances
      • Chaos Gorilla – Kills Availability Zones
      • Chaos Kong – Kills Regions
      • Latency Monkey – Latency and error injection
      Security
      • Security Monkey
      • Conformity Monkey
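
Slide 40 describes Chaos Monkey only as "Kills Instances". The sketch below illustrates the selection step such a tool might perform: choosing at most one instance per autoscaling group, with some probability. It is a hypothetical illustration, not the Simian Army implementation; `pick_victims`, the group names, and the instance ids are invented, and nothing here calls AWS or terminates anything.

```python
import random
from typing import Dict, List, Optional


def pick_victims(groups: Dict[str, List[str]], probability: float,
                 rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Choose at most one instance per group to terminate.

    Each non-empty group is selected with the given probability, and one
    of its instances is then picked uniformly at random.
    """
    rng = rng or random.Random()
    victims: Dict[str, str] = {}
    for group, instances in groups.items():
        if instances and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


if __name__ == "__main__":
    # Hypothetical groups and instance ids, for illustration only.
    asgs = {
        "api-v1": ["i-aaa", "i-bbb", "i-ccc"],
        "web-v7": ["i-ddd", "i-eee"],
        "empty-group": [],
    }
    print(pick_victims(asgs, probability=0.5, rng=random.Random(42)))
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when testing the selection logic separately from the termination step.
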
  41. 41. Example Application – RSS Reader
  42. 42. What’s Coming Next? Better portability Higher availability More Features Easier to deploy Contributions from end users Contributions from vendors More Use Cases
  43. 43. Vendor Driven Portability Interest in using NetflixOSS for Enterprise Private Clouds “It’s done when it runs Asgard” Functionally complete Demonstrated March Release 3.3 in 2Q13 Some vendor interest Some vendor interest Many missing features Needs AWS compatible Autoscaler Bait and switch AWS API strategy
  44. 44. AWS 2009 vs. ??? Eucalyptus 3.3
  45. 45. Netflix Cloud Prize Boosting the @NetflixOSS Ecosystem
  46. 46. In 2012 Netflix Engineering won this..
  47. 47. We’d like to give out prizes too But what for? Contributions to NetflixOSS! Shared under Apache license Located on github
  48. 48. How long do you have? Entries open March 13th Entries close September 15th Six months…
  49. 49. Who can win? Almost anyone, anywhere… Except current or former Netflix or AWS employees
  50. 50. Who decides who wins? Nominating Committee Panel of Judges
  51. 51. Judges
      • Aino Corry – Program Chair for Qcon/GOTO
      • Martin Fowler – Chief Scientist Thoughtworks
      • Simon Wardley – Strategist
      • Werner Vogels – CTO Amazon
      • Yury Izrailevsky – VP Cloud Netflix
      • Joe Weinman – SVP Telx, Author “Cloudonomics”
  52. 52. What are Judges Looking For? Eligible, Apache 2.0 licensed Original and useful contribution to NetflixOSS Code that successfully builds and passes a test suite A large number of watchers, stars and forks on github NetflixOSS project pull requests Good code quality and structure Documentation on how to build and run it Evidence that code is in use by other projects, or is running in production
  53. 53. What do you win? One winner in each of the 10 categories Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas A Trophy
  54. 54. How do you enter? Get a (free) github account Fork Send us your email address Describe and build your entry Twitter #cloudprize
  55. 55. Award Apache Registration Close Entries AWS Ceremony Github Opens Today Github Licensed Github September 15 Dinner Contributions Re:Invent November Judges Winners $10K cash $5K AWS Netflix Nominations Categories Ten Prize Engineering Categories AWS Trophy Re:Invent Conforms to Working Community Tickets Entrants Rules Code Traction
  56. 56. Functionality and scale now, portability coming Moving from parts to a platform in 2013 Netflix is fostering an ecosystem Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  57. 57. Takeaway Netflix is making it easy for everyone to adopt Cloud Native patterns. Open Source is not just the default, it’s a strategic weapon. @adrianco #netflixcloud @NetflixOSS

Editor's notes

  • When Netflix first moved to the cloud, it was bleeding-edge innovation: we figured things out and made things up from first principles. Over the last two years more large companies have moved to the cloud, and the principles, practices and patterns have become better understood and adopted. At this point there is intense interest in how Netflix runs in the cloud, and several forward-looking organizations are adopting our architectures and starting to use some of the code we have shared. Over the coming years, we want to make it easier for people to share the patterns we use.
  • The railroad made it possible for California to be developed quickly. By creating an easy-to-follow path, we can build a much bigger ecosystem around the Netflix platform.