C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adrian Cockcroft

Netflix has updated and added new tools and benchmarks for Cassandra in the last year. In this talk we will cover the latest additions and recipes for the Astyanax Java client, updates to Priam to support Cassandra 1.2 Vnodes, plus newly released and upcoming tools that are all part of the NetflixOSS platform. Following on from the Cassandra on SSD on AWS benchmark that was run live during the 2012 Summit, we've been benchmarking a large write intensive multi-region cluster to see how far we can push it. Cassandra is the data storage and global replication foundation for the Cloud Native architecture that runs Netflix streaming for 36 Million users. Netflix is also offering a Cloud Prize for open source contributions to NetflixOSS, and there are ten categories including Best Datastore Integration and Best Contribution to Performance Improvements, with $10K cash and $5K of AWS credits for each winner. We'd like to pay you to use our free software!

  1. Netflix Open Source Tools and Benchmarks for Cassandra. June 2013. Adrian Cockcroft, @adrianco #cassandra13 @NetflixOSS, http://www.linkedin.com/in/adriancockcroft
  2. Cloud Native; Global Architecture; NetflixOSS Components
  3. Cloud Native
  4. Where time to market wins big: Making a land-grab; Disrupting competitors (OODA); Anything delivered as web services
  5. How Soon? Code features in days instead of months; Get hardware in minutes instead of weeks; Incident response in seconds instead of hours
  6. Tipping the Balance: Utopia, Dystopia
  7. A new engineering challenge: Construct a highly agile and highly available service from ephemeral and often broken components
  8. Inspiration
  9. "Genius is one percent inspiration and ninety-nine percent perspiration." Thomas A. Edison
  10. Perspiration… A Cloud Native Open Source Platform. See netflix.github.com
  11. Netflix Platform Evolution: Bleeding Edge Innovation, Common Pattern, Shared Pattern (2009-2010, 2011-2012, 2013-2014). Netflix started out several years ahead of the industry, but it’s becoming commoditized now
  12. Goals: Establish our solutions as Best Practices / Standards; Hire, Retain and Engage Top Engineers; Build up Netflix Technology Brand; Benefit from a shared ecosystem
  13. Your perspiration… Boosting the @NetflixOSS Ecosystem. See netflix.github.com
  14. Judges: Aino Corry, Program Chair for Qcon/GOTO; Martin Fowler, Chief Scientist Thoughtworks; Simon Wardley, Strategist; Yury Izrailevsky, VP Cloud Netflix; Werner Vogels, CTO Amazon; Joe Weinman, SVP Telx, Author “Cloudonomics”
  15. What do you win? One winner in each of the 10 categories; Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas; A Trophy
  16. Diagram labels: Entrants; Netflix Engineering; Six Judges; Winners; Nominations; Conforms to Rules; Working Code; Community Traction; Categories; Registration Opened March 13; Github; Apache Licensed Contributions; Github; Close Entries September 15; Github; Award Ceremony Dinner, November, AWS Re:Invent; Ten Prize Categories; $10K cash; $5K AWS; AWS Re:Invent Tickets; Trophy
  17. Netflix Streaming: A Cloud Native Application based on an open source platform
  18. Netflix Member Web Site Home Page. Personalization Driven – How Does It Work?
  19. How Netflix Streaming Works. Diagram labels: Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding; Consumer Electronics; AWS Cloud Services; CDN Edge Locations
  20. Chart labels: Amazon Video 1.31% (18x Prime) (25x Prime); Nov 2012 Streaming Bandwidth; March 2013 Mean Bandwidth; +39% 6mo
  21. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics). Diagram labels: Start Here; memcached; Cassandra; Web service; S3 bucket; Personalization movie group choosers (for US, Canada and Latam). Each icon is three to a few hundred instances across three AWS zones
  22. Component Micro-Services: Test With Chaos Monkey, Latency Monkey
  23. Three Balanced Availability Zones: Test with Chaos Gorilla. Diagram: Load Balancers in front of Cassandra and Evcache Replicas in Zone A, Zone B and Zone C
  24. Triple Replicated Persistence: Cassandra maintenance affects individual replicas. Diagram: Load Balancers in front of Cassandra and Evcache Replicas in Zone A, Zone B and Zone C
  25. Isolated Regions. Diagram: US-East Load Balancers with Cassandra Replicas in Zone A, Zone B and Zone C; EU-West Load Balancers with Cassandra Replicas in Zone A, Zone B and Zone C
  26. Failure Modes and Effects

      | Failure Mode | Probability | Current Mitigation Plan |
      |---|---|---|
      | Application Failure | High | Automatic degraded response |
      | AWS Region Failure | Low | Switch traffic between regions |
      | AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones |
      | Datacenter Failure | Medium | Migrate more functions to cloud |
      | Data store failure | Low | Restore from S3 backups |
      | S3 failure | Low | Restore from remote archive |

      Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn’t make sense. Working on it now.
  27. Highly Available Storage: A highly scalable, available and durable deployment pattern based on Apache Cassandra
  28. Single Function Micro-Service Pattern
      • One keyspace, replaces a single table or materialized view
      • Single function Cassandra Cluster Managed by Priam, between 6 and 144 nodes
      • Stateless Data Access REST Service, Astyanax Cassandra Client
      • Optional Datacenter Update Flow
      • Many Different Single-Function REST Clients
      • Appdynamics Service Flow Visualization: each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
      • Over 50 Cassandra clusters; over 1000 nodes; over 30TB backup; over 1M writes/s/cluster
  29. Stateless Micro-Service Architecture
      • Linux Base AMI (CentOS or Ubuntu)
      • Optional Apache frontend, memcached, non-java apps
      • Monitoring; Log rotation to S3; AppDynamics machineagent; Epic/Atlas
      • Java (JDK 6 or 7); AppDynamics appagent monitoring; GC and thread dump logging
      • Tomcat; Application war file, base servlet, platform, client interface jars, Astyanax
      • Healthcheck, status servlets, JMX interface, Servo autoscale
  30. Cassandra Instance Architecture
      • Linux Base AMI (CentOS or Ubuntu)
      • Tomcat and Priam on JDK; Healthcheck, Status
      • Monitoring; AppDynamics machineagent; Epic/Atlas
      • Java (JDK 7); AppDynamics appagent monitoring; GC and thread dump logging
      • Cassandra Server
      • Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
  31. Priam – Cassandra Automation. Available at http://github.com/netflix
      • Netflix Platform Tomcat Code
      • Zero touch auto-configuration
      • State management for Cassandra JVM
      • Token allocation and assignment
      • Broken node auto-replacement
      • Full and incremental backup to S3
      • Restore sequencing from S3
      • Grow/Shrink Cassandra “ring”
  32. Priam for C* 1.2 Vnodes
      • Prototype work started by Jason Brown
      • Completed
        – Restructured Priam for Vnode management
      • ToDo
        – Re-think SSTable backup/restore strategy
  33. Cloud Native Big Data: Size the cluster to the data; Size the cluster to the questions; Never wait for space or answers
  34. Netflix Dataoven. Diagram labels: Data Warehouse, Over 2 Petabytes; Ursula; Aegisthus; Data Pipelines; From cloud Services, ~100 Billion Events/day; From C*, Terabytes of Dimension data; Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, Multiple 150 nodes Nightly; RDS Metadata; Gateways; Tools
  35. ETL for Cassandra
      • Data is de-normalized over many clusters!
      • Too many to restore from backups for ETL
      • Solution – read backup files using Hadoop
      • Aegisthus
        – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
        – High throughput raw SSTable processing
        – Re-normalizes many clusters to a consistent view
        – Extract, Transform, then Load into Teradata
  36. Global Architecture: Local Client Traffic to Cassandra; Synchronous Replication Across Zones; Asynchronous Replication Across Regions
  37. Astyanax Cassandra Client for Java. Available at http://github.com/netflix
      • Features
        – Abstraction of connection pool from RPC protocol
        – Fluent Style API
        – Operation retry with backoff
        – Token aware
        – Batch manager
        – Many useful recipes
        – New: Entity Mapper based on JPA annotations
  38. Recipes
      • Distributed row lock (without needing zookeeper)
      • Multi-region row lock
      • Uniqueness constraint
      • Multi-row uniqueness constraint
      • Chunked and multi-threaded large file storage
      • Reverse index search
      • All rows query
      • Durable message queue
      • Contributed: High cardinality reverse index
  39. Astyanax Futures
      • Maintain backwards compatibility
      • Wrapper for C* 1.2 Netty driver
      • More CQL support
      • NetflixOSS Cloud Prize Ideas
        – DynamoDB Backend?
        – More recipes?
  40. Astyanax - Cassandra Write Data Flows: Single Region, Multiple Availability Zone, Token Aware
      Diagram: Token Aware Clients writing to six Cassandra nodes with disks, two in each of Zone A, Zone B and Zone C
      1. Client writes to local coordinator
      2. Coordinator writes to other zones
      3. Nodes return ack
      4. Data written to internal commit log disks (no more than 10 seconds later)
      If a node goes offline, hinted handoff completes the write when the node comes back up.
      Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
      SSTable disk writes and compactions occur asynchronously.
  41. Data Flows for Multi-Region Writes: Token Aware, Consistency Level = Local Quorum
      Diagram: US Clients and EU Clients, each writing to six Cassandra nodes with disks, two in each of Zone A, Zone B and Zone C; 100+ms latency between regions
      1. Client writes to local replicas
      2. Local write acks returned to Client which continues when 2 of 3 local nodes are committed
      3. Local coordinator writes to remote coordinator
      4. When data arrives, remote coordinator node acks and copies to other remote zones
      5. Remote nodes ack to local coordinator
      6. Data flushed to internal commit log disks (no more than 10 seconds later)
      If a node or region goes offline, hinted handoff completes the write when the node comes back up.
      Nightly global compare and repair jobs ensure everything stays consistent.
  42. Scalability from 48 to 288 nodes on AWS. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
      Chart: Client Writes/s by node count – Replication Factor = 3 (data labels: 174373, 366828, 537172, 1099837)
      Used 288 of m1.xlarge: 4 CPU, 15 GB RAM, 8 ECU. Cassandra 0.86. Benchmark config only existed for about 1hr
  43. Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
  44. 2013 - Cross Region Use Cases
      • Geographic Isolation
        – US to Europe replication of subscriber data
        – Read intensive, low update rate
        – Production use since late 2011
      • Redundancy for regional failover
        – US East to US West replication of everything
        – Includes write intensive data, high update rate
        – Testing now
  45. 2013 - Benchmarking Global Cassandra. Write intensive test of cross region capacity. 16 x hi1.4xlarge SSD nodes per zone = 96 total
      Diagram: US-West-2 Region - Oregon and US-East-1 Region - Virginia, each with Cassandra Replicas in Zone A, Zone B and Zone C; Test Load; Validation Load; Inter-Zone Traffic; Inter-Region Traffic; S3
      1 Million writes CL.ONE; 1 Million reads CL.ONE with no data loss
  46. Copying 18TB from East to West: Cassandra bootstrap 9.3 Gbit/s single threaded 48 nodes to 48 nodes. Thanks to boundary.com for these network analysis plots
  47. Inter Region Traffic Test: Verified at desired capacity, no problems, 339 MB/s, 83ms latency
  48. Ramp Up Load Until It Breaks! Unmodified tuning, dropping client data at 1.93GB/s inter region traffic. Spare CPU, IOPS, Network, just need some Cassandra tuning for more
  49. Managing Multi-Region Availability. Denominator – manage traffic via multiple DNS providers
      Diagram: UltraDNS, DynECT DNS and AWS Route53 managed through Denominator; two sets of Regional Load Balancers, each in front of Cassandra Replicas in Zone A, Zone B and Zone C
  50. How does it all fit together?
  51. Example Application – RSS Reader
  52. Continuous Build and Deployment. Diagram labels: Github NetflixOSS Source; AWS Base AMI; Maven Central; Cloudbees Jenkins; Aminator Bakery; Dynaslave AWS Build Slaves; Asgard (+ Frigga) Console; AWS Baked AMIs; Odin Orchestration API; AWS Account
  53. NetflixOSS Services Scope. Diagram labels: AWS Account; Asgard Console; Archaius Config Service; Cross region Priam C*; Pytheas Dashboards; Atlas Monitoring; Genie, Lipstick Hadoop Services; AWS Usage Cost Monitoring; Multiple AWS Regions; Eureka Registry; Exhibitor ZK; Edda History; Simian Army; Zuul Traffic Mgr; 3 AWS Zones; Application Clusters, Autoscale Groups, Instances; Priam, Cassandra, Persistent Storage; Evcache, Memcached, Ephemeral Storage
  54. NetflixOSS Instance Libraries
      Initialization
      • Baked AMI – Tomcat, Apache, your code
      • Governator – Guice based dependency injection
      • Archaius – dynamic configuration properties client
      • Eureka - service registration client
      Service Requests
      • Karyon - Base Server for inbound requests
      • RxJava – Reactive pattern
      • Hystrix/Turbine – dependencies and real-time status
      • Ribbon - REST Client for outbound calls
      Data Access
      • Astyanax – Cassandra client and pattern library
      • Evcache – Zone aware Memcached client
      • Curator – Zookeeper patterns
      • Denominator – DNS routing abstraction
      Logging
      • Blitz4j – non-blocking logging
      • Servo – metrics export for autoscaling
      • Atlas – high volume instrumentation
  55. NetflixOSS Testing and Automation
      Test Tools
      • CassJmeter – Load testing for Cassandra
      • Circus Monkey – Test account reservation rebalancing
      • Gcviz – Garbage collection visualization
      Maintenance
      • Janitor Monkey – Cleans up unused resources
      • Efficiency Monkey
      • Doctor Monkey
      • Howler Monkey – Complains about AWS limits
      Availability
      • Chaos Monkey – Kills Instances
      • Chaos Gorilla – Kills Availability Zones
      • Chaos Kong – Kills Regions
      • Latency Monkey – Latency and error injection
      Security
      • Security Monkey – security group and S3 bucket permissions
      • Conformity Monkey – architectural pattern warnings
  56. Dashboards with Pytheas (Explorers). http://techblog.netflix.com/2013/05/announcing-pytheas.html
      • Cassandra Explorer
        – Browse clusters, keyspaces, column families
      • Base Server Explorer
        – Browse service endpoints configuration, perf
      • Anything else you want to build
  57. Cassandra Clusters
  58. AWS Usage (coming soon): Reservation-aware cost monitoring and reporting
  59. What’s Coming Next? More Use Cases; More Features; Better portability; Higher availability; Easier to deploy; Contributions from end users; Contributions from vendors
  60. Functionality and scale now, portability coming. Moving from parts to a platform in 2013. Netflix is fostering a cloud native ecosystem. Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  61. Takeaway: NetflixOSS makes it easier for everyone to become Cloud Native. @adrianco #cassandra13 @NetflixOSS
  62. Slideshare NetflixOSS Details
      • Lightning Talks Feb S1E1
        – http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
      • Asgard In Depth Feb S1E1
        – http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
      • Lightning Talks March S1E2
        – http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
      • Security Architecture
        – http://www.slideshare.net/jason_chan/
      • Cost Aware Cloud Architectures – with Jinesh Varia of AWS
        – http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
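
Slide 31 lists "token allocation and assignment" among Priam's features without showing how tokens are chosen. As a rough illustration only, the sketch below spaces tokens evenly around a ring, assuming a RandomPartitioner-style token space of 2**127 values. The function name and the `offset` parameter are hypothetical and are not taken from Priam's code.

```python
# Illustrative sketch, not Priam's implementation.
# Assumption: a token space of 2**127 values, as used by Cassandra's RandomPartitioner.
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count: int, offset: int = 0) -> list[int]:
    """Return one token per node, spread evenly around the ring.

    `offset` is a hypothetical knob for shifting a whole ring, for example
    so that two rings do not use identical tokens.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [
        (i * RING_SIZE // node_count + offset) % RING_SIZE
        for i in range(node_count)
    ]


if __name__ == "__main__":
    # Six nodes is the smallest cluster size mentioned on slide 28.
    for node, token in enumerate(evenly_spaced_tokens(6)):
        print(node, token)
```

With even spacing each node owns roughly the same share of the token range, which is why growing or shrinking the ring means recomputing and reassigning tokens.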
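
Slides 40 and 41 say that a request can wait for one node, a quorum, or all nodes, and that a Local Quorum write continues once 2 of 3 local nodes have committed. The arithmetic behind that can be sketched as follows. The function names and the level strings are illustrative, not the Astyanax or Cassandra API; the quorum formula itself (half the replicas, rounded down, plus one) is the standard one.

```python
# Illustrative sketch of consistency-level arithmetic; names are hypothetical.

def required_acks(level: str, replicas: int) -> int:
    """Number of replica acks a write must collect before the client continues."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas: floor(replicas / 2) + 1.
        return replicas // 2 + 1
    if level == "ALL":
        return replicas
    raise ValueError(f"unknown consistency level: {level}")


def write_acknowledged(acks_received: int, level: str, replicas: int) -> bool:
    """True once enough replicas have acknowledged the write."""
    return acks_received >= required_acks(level, replicas)


if __name__ == "__main__":
    # Three replicas per region, one per availability zone, as on the slides.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, 3))
```

For Local Quorum the same formula is applied to the replicas in the client's own region only, which is why the cross-region hop in slide 41 does not add to the client's write latency.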
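
The benchmark figures on slides 42 and 45 to 48 can be cross-checked with some simple arithmetic. The sketch below makes several assumptions that the slides do not state: decimal units (1 TB = 10^12 bytes, 1 GB = 1000 MB), that 9.3 Gbit/s is a sustained aggregate rate, and that the smallest and largest chart labels on slide 42 belong to the 48-node and 288-node runs. The helper names are hypothetical.

```python
# Back-of-envelope checks on the slide figures; all helpers are illustrative.

def gbit_per_s(megabytes_per_s: float) -> float:
    """Convert MB/s to Gbit/s, assuming decimal units."""
    return megabytes_per_s * 8 / 1000


def transfer_hours(terabytes: float, rate_gbit_s: float) -> float:
    """Hours needed to move `terabytes` at a sustained `rate_gbit_s`."""
    bits = terabytes * 1e12 * 8
    return bits / (rate_gbit_s * 1e9) / 3600


# Slide 45: 16 nodes per zone, 3 zones per region, 2 regions.
total_nodes = 16 * 3 * 2

# Slide 46: 18 TB copied at 9.3 Gbit/s.
copy_hours = transfer_hours(18, 9.3)

# Slides 47 and 48: verified and breaking-point inter-region traffic.
verified_gbit = gbit_per_s(339)
breaking_gbit = gbit_per_s(1930)

# Slide 42: throughput growth versus node growth (assumed endpoint mapping).
node_growth = 288 / 48
write_growth = 1099837 / 174373

if __name__ == "__main__":
    print(f"total nodes: {total_nodes}")
    print(f"copy time: {copy_hours:.2f} h")
    print(f"verified: {verified_gbit:.2f} Gbit/s, breaking: {breaking_gbit:.2f} Gbit/s")
    print(f"nodes x{node_growth:.1f}, writes x{write_growth:.2f}")
```

Under these assumptions the 18 TB copy takes a little over four hours, and write throughput grows slightly faster than the node count, which is consistent with the roughly linear scaling the chart title implies.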
