Speaker: Josep Blanquer - Chief Architect, RightScale
Is your database holding back your application? Find out how RightScale uses NoSQL databases such as Cassandra to provide a scalable, distributed, and highly available service around the world that is designed to recover from failures of a whole cloud region.
#RightscaleCompute
Intro: Expectations and scope
What this is and what it is not
• IS a talk about:
• how RightScale has designed and implemented its backing datastores
• …for a few of the most representative internal systems
• …with the rationale behind it
• IS NOT a talk about:
• RightScale’s overall architecture
• Nodes or hosts; it is about systems
• RightScale’s data modeling
Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
• MySQL, Cassandra, and S3 (for backups and archiving)
• Transactionality:
• MySQL: strong ACID properties
• Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
• MySQL: asynchronous replication, Master to N Slaves or Master-Master
• Cassandra: Distributed, master-less, highly-replicated (multi-DC)
• Queryability:
• MySQL: Extremely flexible at adding indexes and changing data model
• Cassandra: More difficult to change the querying patterns
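The Master to N Slaves arrangement mentioned above can be illustrated with a minimal sketch. It is a toy model, not RightScale's implementation: the class name `ReplicatedMySQL`, the host names, and the "statement starts with SELECT" rule are all assumptions made for illustration, and no real database is contacted.

```python
from itertools import cycle


class ReplicatedMySQL:
    """Toy model of one master with N read-only slaves (hypothetical names)."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin iterator over the slaves, or None when there are none.
        self._slaves = cycle(slaves) if slaves else None

    def route(self, statement):
        """Return the host that should run the statement."""
        is_read = statement.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)  # spread reads across the slaves
        return self.master  # writes (and reads without slaves) go to the master


db = ReplicatedMySQL("master", ["slave1", "slave2"])
routes = [
    db.route(statement)
    for statement in (
        "SELECT * FROM users",
        "UPDATE users SET plan = 'pro'",
        "SELECT * FROM plans",
    )
]
print(routes)
```

Because the replication is asynchronous, a read served by a slave may briefly lag behind the master; a real router would have to decide which reads can tolerate that.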
Taxonomy of RightScale’s Data
Representative systems
with different data semantics:
Global Objects
Marketplace Assets
Dashboard Objects
Audits
Tags
Recent Events
Cloud Polling Data
Routing Data
Monitoring/Syslog
Global Objects (common across accounts):
• Users
• Plans
• Settings
Marketplace Assets (MultiCloud Marketplace):
• Published Assets
• Sharing Groups
• …
Dashboard Objects, Audits, Tags, Recent Events (private to each account):
• Deployments
• Imported assets
• Alert Specifications
• Server Inputs
• Audit
• Tags
• User Events
• …
Cloud Polling Data (private to each account):
• Cloud resource states (cache)
• Cloud credentials
Routing Data (private to each account):
• Instance agents location
• Core agents location
• Agent action registry
• …
Monitoring/Syslog (private to each account):
• Collected metric data
• Collected syslog data
• …
Taxonomy of RightScale’s Data: Data Placement
Who uses the data?
• Users, through the Dash/API: keep this data close to the users
• Instances, from the cloud: keep this data close to the cloud
Taxonomy of RightScale’s Data: Data scope and containment
Which data do we need?
• Data for all accounts (cross-account, "X-acct"): data shared between accounts
• Data for a single account: data required within the scope of a single account
Taxonomy of RightScale’s Data: Summary
Who uses the data? Proximity to user vs. cloud.
Which data do we need? Scope of data available.
Resulting groups:
• Close to cloud resources, account-shardable* data
• Close to user, account-shardable data
• Close to user, globally accessible data
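The two questions in the taxonomy (who uses the data, and which scope it has) can be sketched as a small classification. The assignment of each system to a group below is an assumption inferred from the preceding slides, not a statement from the talk, and the names `DataSystem` and `group_by_placement` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSystem:
    name: str
    proximity: str  # "user" or "cloud": who reads and writes the data
    scope: str      # "global" or "account": which accounts the data spans


# Assumed assignment, inferred from the slides rather than quoted from them.
SYSTEMS = [
    DataSystem("Global Objects", "user", "global"),
    DataSystem("Marketplace Assets", "user", "global"),
    DataSystem("Dashboard Objects", "user", "account"),
    DataSystem("Audits", "user", "account"),
    DataSystem("Tags", "user", "account"),
    DataSystem("Recent Events", "user", "account"),
    DataSystem("Cloud Polling Data", "cloud", "account"),
    DataSystem("Routing Data", "cloud", "account"),
    DataSystem("Monitoring/Syslog", "cloud", "account"),
]


def group_by_placement(systems):
    """Group system names by their (proximity, scope) pair."""
    groups = defaultdict(list)
    for system in systems:
        groups[(system.proximity, system.scope)].append(system.name)
    return dict(groups)


groups = group_by_placement(SYSTEMS)
for key, names in groups.items():
    print(key, names)
```

Under these assumptions the nine systems fall into three groups, which is what drives the choice of a different datastore layout for each group.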
Scope: Account. Used by: Instances.
Features:
• 1 instance: 1 home island
• 1 Island can serve N clouds
• Core Agents: global data
Benefits:
• Close to cloud resources
• Good failure isolation
• As good as cloud
• Good scale: global replicas across Cassandra DCs
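The two island features listed above (one home island per instance, one island serving N clouds) can be sketched as a small directory. This is an illustration under stated assumptions: `IslandDirectory`, the cloud names, and the rule that a home island never changes once assigned are all hypothetical.

```python
class IslandDirectory:
    """Toy mapping of clouds to islands and instances to home islands."""

    def __init__(self, cloud_to_island):
        # Several clouds may map to the same island (1 island serves N clouds).
        self.cloud_to_island = dict(cloud_to_island)
        self.home = {}  # instance id -> home island

    def register(self, instance_id, cloud):
        """Assign a home island based on the instance's cloud and return it."""
        island = self.cloud_to_island[cloud]
        # setdefault keeps the first assignment: 1 instance, 1 home island.
        return self.home.setdefault(instance_id, island)


directory = IslandDirectory(
    {"cloud-a": "Island1", "cloud-b": "Island1", "cloud-c": "Island2"}
)
first = directory.register("i-1", "cloud-a")
second = directory.register("i-2", "cloud-c")
print(first, second)
```

Keeping the lookup keyed by cloud is what places an instance's routing, polling, and monitoring data next to the cloud resources it belongs to.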
[Diagram: Island1, Island2, … IslandN, each with routing, polling, and monitor components]
Polling Clouds: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable
Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3
Routing: Cassandra
• Simpler Key-Value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross DC replication*
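To show why a master-less, cross-DC replicated key-value store suits routing data, here is a deliberately simplified in-memory model. It is not Cassandra's actual replication protocol and uses no Cassandra API; `ReplicatedKV`, the datacenter names, and the agent key are hypothetical.

```python
class ReplicatedKV:
    """Toy master-less key-value store: every datacenter holds a full replica."""

    def __init__(self, datacenters):
        self.replicas = {dc: {} for dc in datacenters}
        self.down = set()  # datacenters currently unreachable

    def put(self, key, value):
        """Write to every reachable replica and return how many were written."""
        written = 0
        for dc, store in self.replicas.items():
            if dc not in self.down:
                store[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica reachable")
        # A real system would later repair the replicas that were skipped;
        # this sketch omits that step.
        return written

    def get(self, key, local_dc):
        """Read from the local replica first, then from any other reachable one."""
        order = [local_dc] + [dc for dc in self.replicas if dc != local_dc]
        for dc in order:
            if dc not in self.down and key in self.replicas[dc]:
                return self.replicas[dc][key]
        raise KeyError(key)


routing = ReplicatedKV(["dc-us", "dc-eu"])
routing.put("agent-42", "Island1")
routing.down.add("dc-us")  # simulate losing a whole datacenter
location = routing.get("agent-42", local_dc="dc-us")
print(location)
```

Because no replica is special, both reads and writes keep working as long as any datacenter is reachable, which is the availability property the slide attributes to Cassandra.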
Conclusions
• Shown that RightScale uses multiple database technologies:
• RDBMS – MySQL for the ACID semantics and ‘queryability’
• Using a Master to N-Slaves for RO scale, and quick failure recovery
• And ReadOnly Provisioning – To increase RO availability and scale remote systems
• NoSQL: Cassandra for Availability and Scalability
• For higher Read/Write availability within a cluster
• For fully replicated regions across the globe (for Read/Write!)
• Shown how RightScale applies them with different techniques:
• It partitions resource data into Islands based on cloud proximity
• Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
• Can provide routing availability, colocated with instances for any world region
• It partitions core data into Clusters based on account groups
• To scale the core horizontally and independently, and to achieve account isolation/differentiation
• Enhances fault isolation: accounts are assigned to Clusters deployed away from their cloud resources
• It maintains cluster pairs (sister sites)
• To recover from full cloud region failures
• It doesn’t require massive amounts of new resources to recover
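The account-to-Cluster partitioning and the sister-site recovery described in the conclusions can be sketched as follows. This is a minimal model under assumptions: `ClusterMap`, the account and cluster names, and the failover rule are hypothetical and only illustrate the idea of cluster pairs.

```python
class ClusterMap:
    """Toy model of accounts partitioned into clusters, with paired sister sites."""

    def __init__(self, account_to_cluster, sister_of):
        self.account_to_cluster = dict(account_to_cluster)
        self.sister_of = dict(sister_of)
        self.failed = set()  # clusters whose cloud region is currently down

    def serving_cluster(self, account):
        """Return the cluster that should serve the account right now."""
        home = self.account_to_cluster[account]
        if home not in self.failed:
            return home
        sister = self.sister_of[home]
        if sister in self.failed:
            raise RuntimeError(f"both {home} and its sister {sister} are down")
        return sister  # recover on the sister site


clusters = ClusterMap(
    account_to_cluster={"acct-1": "cluster-a", "acct-2": "cluster-b"},
    sister_of={"cluster-a": "cluster-b", "cluster-b": "cluster-a"},
)
before = clusters.serving_cluster("acct-1")
clusters.failed.add("cluster-a")  # simulate a full cloud region failure
after = clusters.serving_cluster("acct-1")
print(before, after)
```

In this sketch the sister is an existing cluster that already serves its own accounts, which mirrors the point that recovery does not require large amounts of new resources.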