The document discusses data service level agreements (SLAs) in public cloud environments. It explains that achieving availability, consistency, and scalability is challenging due to Brewer's CAP theorem. It reviews strategies for relational and NoSQL databases to handle these tradeoffs, including dropping consistency or availability depending on needs. Code examples demonstrate typical operations for Cassandra, MongoDB, and Neo4J NoSQL databases. The conclusion recommends choosing solutions based on requirements and migrating to NoSQL as needed to address scaling issues.
2. About Us: ScaleBase is a new startup targeting the database-as-a-service (DBaaS) market. We offer unlimited database scalability and availability using our Database Load Balancer. We launch in September 2010. Stay tuned at our site.
3. Agenda: The requirements for data SLA in public cloud environments. Achieving data SLA with NoSQL. Achieving data SLA with relational databases.
5. What We Need: Availability, Consistency, Scalability.
6. Brewer's (CAP) Theorem: It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency (all nodes see the same data at the same time); Availability (node failures do not prevent survivors from continuing to operate); Partition Tolerance (the system continues to operate despite arbitrary message loss). http://en.wikipedia.org/wiki/CAP_theorem
7. What It Means http://guyharrison.squarespace.com/blog/2010/6/13/consistency-models-in-non-relational-databases.html
8. Dealing With CAP: Drop Partition Tolerance. Run everything on one machine. This is, of course, not very scalable.
9. Dealing With CAP: Drop Availability. If a partition fails, everything waits until the data is consistent again. This can be very complex to handle over a large number of nodes.
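A minimal sketch of what "dropping availability" can look like, assuming a toy in-memory store. The class `ConsistencyFirstStore`, its `Node` type, and the `reachable` flag are hypothetical names made up for illustration, not part of any real database API. To stay single-threaded, the sketch refuses requests during a partition instead of blocking until the partition heals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "drop availability"; not a real database API.
class ConsistencyFirstStore {

    // One storage node; the "reachable" flag simulates a network partition.
    static final class Node {
        boolean reachable = true;
        final Map<String, String> data = new HashMap<>();
    }

    private final List<Node> nodes = new ArrayList<>();

    ConsistencyFirstStore(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    Node node(int index) {
        return nodes.get(index);
    }

    private boolean allReachable() {
        for (Node n : nodes) {
            if (!n.reachable) {
                return false;
            }
        }
        return true;
    }

    // Accept the write only if every node can apply it; otherwise refuse.
    // The store becomes unavailable, but the copies never diverge.
    boolean write(String key, String value) {
        if (!allReachable()) {
            return false;
        }
        for (Node n : nodes) {
            n.data.put(key, value);
        }
        return true;
    }

    // Reads are likewise refused while any node is partitioned away.
    String read(String key) {
        if (!allReachable()) {
            throw new IllegalStateException("store unavailable during partition");
        }
        return nodes.get(0).data.get(key);
    }
}
```

Marking a node unreachable makes `write` return `false` until the node comes back, which is the "everything waits" behavior from the slide in its simplest form.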
10. Dealing With CAP: Drop Consistency. Welcome to the “Eventually Consistent” term. In the end, everything will work out just fine, and hey, sometimes this is a good enough solution. When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent. For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service. Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID.
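The propagation described above can be sketched with a toy model, assuming each write carries a version number and the highest version wins when replicas exchange state. `EventualConsistencyDemo`, `Replica`, `Versioned`, and `antiEntropyRound` are hypothetical names for illustration only; real systems use more careful conflict resolution.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of eventual consistency; not a real database API.
class EventualConsistencyDemo {

    // A value tagged with a version; the higher version wins on merge.
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // A replica accepts writes locally and learns about others only on sync.
    static final class Replica {
        private final Map<String, Versioned> data = new HashMap<>();

        void write(String key, String value, long version) {
            merge(key, new Versioned(value, version));
        }

        String read(String key) {
            Versioned v = data.get(key);
            return v == null ? null : v.value;
        }

        // Keep the incoming value only if it is newer than the one held.
        void merge(String key, Versioned incoming) {
            Versioned current = data.get(key);
            if (current == null || incoming.version > current.version) {
                data.put(key, incoming);
            }
        }

        // Push everything this replica knows to another replica.
        void syncTo(Replica other) {
            for (Map.Entry<String, Versioned> e : data.entrySet()) {
                other.merge(e.getKey(), e.getValue());
            }
        }
    }

    // One propagation round: every replica pushes its state to every other.
    static void antiEntropyRound(List<Replica> replicas) {
        for (Replica from : replicas) {
            for (Replica to : replicas) {
                if (from != to) {
                    from.syncTo(to);
                }
            }
        }
    }
}
```

Before a round runs, a read on a replica that has not seen the write returns stale (here, missing) data. Once updates stop and a round completes, all replicas answer the same.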
11. Reading More On CAP This is an excellent read, and some of my samples are from this blog http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
13. Databases And CAP: ACID covers Consistency. Availability: tons of solutions, most of them not cloud oriented (Oracle RAC, MySQL Proxy, etc.). Replication-based solutions can solve at least read availability and scalability (see Azure SQL).
14. Database Cloud Solutions: Amazon RDS. NaviSite Oracle RAC. Not that popular. Costs to cloud providers (complexity, not standard).
15. So Where Is The Problem? Partition Tolerance just doesn’t work. Scaling problems (usually write, but also read). BigData problems.
16. Scaling Up: Issues with scaling up when the dataset is just too big. RDBMS were not designed to be distributed. Began to look at multi-node database solutions, known as ‘scaling out’ or ‘horizontal scaling’. Different approaches include: master-slave, sharding.
17. Scaling RDBMS - Master/Slave: All writes are written to the master. All reads are performed against the replicated slave databases. Critical reads may be incorrect, as writes may not have been propagated down yet. Large data sets can pose problems, as the master needs to duplicate data to the slaves.
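A minimal sketch of the stale-read problem, assuming asynchronous replication modeled as a queue of pending writes. `MasterSlaveDemo`, `criticalRead`, and `replicate` are hypothetical names for illustration, not calls from any database driver.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of master/slave replication lag.
class MasterSlaveDemo {
    private final Map<String, String> master = new HashMap<>();
    private final Map<String, String> slave = new HashMap<>();

    // Writes accepted by the master but not yet applied to the slave.
    private final Queue<String[]> replicationLog = new ArrayDeque<>();

    // All writes go to the master and are queued for the slave.
    void write(String key, String value) {
        master.put(key, value);
        replicationLog.add(new String[] {key, value});
    }

    // Ordinary reads go to the slave and may miss recent writes.
    String read(String key) {
        return slave.get(key);
    }

    // Critical reads go to the master, at the cost of loading it further.
    String criticalRead(String key) {
        return master.get(key);
    }

    // Apply all pending writes to the slave, in the order they were made.
    void replicate() {
        String[] entry;
        while ((entry = replicationLog.poll()) != null) {
            slave.put(entry[0], entry[1]);
        }
    }
}
```

Until `replicate()` runs, `read` returns the old value while `criticalRead` returns the new one, which is why reads that must be correct are often routed to the master.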
18. Scaling RDBMS - Sharding: Partitioning, or sharding, scales well for both reads and writes. Not transparent: the application needs to be partition-aware. Can no longer have relationships/joins across partitions. Loss of referential integrity across shards.
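What "partition-aware" means for the application can be sketched as follows, assuming simple hash-based routing over in-memory maps that stand in for separate database servers. `ShardRouter` and its methods are hypothetical names for illustration; production systems often use range-based or consistent hashing schemes instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of application-level sharding.
class ShardRouter {
    // Each map stands in for one independent database server.
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardRouter(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("need at least one shard");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // The application, not the database, decides where a key lives.
    // floorMod keeps the index non-negative even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    int shardSize(int index) {
        return shards.get(index).size();
    }
}
```

Every read and write touches exactly one shard, which is why both scale. A query that needs rows from two shards has no single place to run, which is the loss of joins and referential integrity the slide mentions.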
19. Other ways to scale RDBMS: Multi-master replication. INSERT only, not UPDATEs/DELETEs. No JOINs, thereby reducing query time; this involves de-normalizing data. In-memory databases.
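One possible reading of the "INSERT only" approach, sketched under the assumption that a change is recorded as a new row and a delete as a marker row, with reads taking the most recent row for a key. `InsertOnlyTable`, `insertTombstone`, and `latest` are hypothetical names for illustration; the slide does not say how the technique is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an insert-only table.
class InsertOnlyTable {

    // Rows are immutable once written; a tombstone row marks a deletion.
    static final class Row {
        final String key;
        final String value;
        final boolean tombstone;

        Row(String key, String value, boolean tombstone) {
            this.key = key;
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    // Rows are only ever appended, never changed in place.
    private final List<Row> rows = new ArrayList<>();

    // A "change" is simply a newer row for the same key.
    void insert(String key, String value) {
        rows.add(new Row(key, value, false));
    }

    // A "delete" is a marker row rather than a removal.
    void insertTombstone(String key) {
        rows.add(new Row(key, null, true));
    }

    // Scan backwards so the most recent row for the key wins.
    String latest(String key) {
        for (int i = rows.size() - 1; i >= 0; i--) {
            Row r = rows.get(i);
            if (r.key.equals(key)) {
                return r.tombstone ? null : r.value;
            }
        }
        return null;
    }

    int rowCount() {
        return rows.size();
    }
}
```

Because existing rows never change, writers never wait on each other over the same row. The price is extra storage and the work of finding the latest row on reads.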
21. NoSQL A term used to designate databases which differ from classic relational databases in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term which would include classic relational databases as a subset. http://en.wikipedia.org/wiki/NoSQL
22. NoSQL Types: Key/Value: a big hash table (examples: Voldemort, Amazon Dynamo). Big Table: a big table with column families (examples: HBase, Cassandra). Document based: collections of collections (examples: CouchDB, MongoDB). Graph databases: based on graph theory (examples: Neo4J). Each solves a different problem.
24. Pros/Cons: Pros: Performance. BigData. Most solutions are open source. Data is replicated to nodes and is therefore fault-tolerant (partitioning). Don't require a schema. Can scale up and down. Cons: Code change. No framework support. Not ACID. Ecosystem (BI, backup). There is always a database at the backend. Some APIs are just too simple.
25. Amazon S3 Code Sample
    AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey, secure, server, format);
    Response response = conn.createBucket(bucketName, location, null);
    final String text = "this is a test";
    response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);
26. Cassandra Code Sample
    CassandraClient cl = pool.getClient();
    KeySpace ks = cl.getKeySpace("Keyspace1");
    // insert value
    ColumnPath cp = new ColumnPath("Standard1", null, "testInsertAndGetAndRemove".getBytes("utf-8"));
    for (int i = 0; i < 100; i++) {
        ks.insert("testInsertAndGetAndRemove_" + i, cp, ("testInsertAndGetAndRemove_value_" + i).getBytes("utf-8"));
    }
    // get value
    for (int i = 0; i < 100; i++) {
        Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
        String value = new String(col.getValue(), "utf-8");
    }
    // remove value
    for (int i = 0; i < 100; i++) {
        ks.remove("testInsertAndGetAndRemove_" + i, cp);
    }
27. Cassandra Code Sample - Cont’
    try {
        ks.remove("testInsertAndGetAndRemove_not_exist", cp);
    } catch (Exception e) {
        fail("remove not exist row should not throw exceptions");
    }
    // get already removed value
    for (int i = 0; i < 100; i++) {
        try {
            Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
            fail("the value should already being deleted");
        } catch (NotFoundException e) {
            // expected: the row was removed above
        } catch (Exception e) {
            fail("throw out other exception, should be NotFoundException." + e.toString());
        }
    }
    pool.releaseClient(cl);
    pool.close();
28. Cassandra Statistics
    Facebook Search, > 50 GB of data:
    MySQL: writes average ~300 ms, reads average ~350 ms
    Rewritten with Cassandra: writes average 0.12 ms, reads average 15 ms
29. MongoDB
    Mongo m = new Mongo();
    DB db = m.getDB("mydb");
    Set<String> colls = db.getCollectionNames();
    for (String s : colls) {
        System.out.println(s);
    }
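30. MongoDB - Cont’
    // coll is not defined on the slide; assumed here to be a collection
    // obtained from db (the name "testCollection" is a placeholder)
    DBCollection coll = db.getCollection("testCollection");
    BasicDBObject doc = new BasicDBObject();
    doc.put("name", "MongoDB");
    doc.put("type", "database");
    doc.put("count", 1);
    // nested document
    BasicDBObject info = new BasicDBObject();
    info.put("x", 203);
    info.put("y", 102);
    doc.put("info", info);
    coll.insert(doc);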
33. Data SLA: There is no golden hammer. Choose your tool wisely, based on what you need. Usually start with an RDBMS (shortest TTM, which is what we really care about). When scale issues occur, start moving to NoSQL based on your needs. You can get data SLA in the cloud: just think before you code!