2. NoSQL: What does it mean? RDBMS legacy and rise of NoSQL; NoSQL classification; Pros and Cons; Possible use cases; Real-world examples; What next?
3. Who am I? Hugo Rodger-Brown CTO @ Profero Connect twitter.com/hugorodgerbrown hugorodgerbrown.blogspot.com Things I like: SEO, Scale, Social media Things I don’t: Databases
4. What does it mean? Movement, not a specification; Subjective term (like Web 2.0); Originally used in 1998; Reintroduced at Rackspace to refer to non-RDBMS; NoSQL != No SQL; NoSQL == Not Only SQL?
6. RDBMS Legacy: Efficient data storage; Powerful querying capabilities (SQL); Support ACID Transactions; Mature, well supported; Ubiquitous; Bottom-up design; Storage is cheap; O/R Impedance; Complex to manage; Always the bottleneck; Who really needs transactions?
7. Rise of NoSQL: Internet; Google; 2006 Bigtable whitepaper (Google): “a sparse, distributed multi-dimensional sorted map”; 2007 Dynamo whitepaper (Amazon); 2008 Cassandra released (Facebook): “a BigTable data model running on an Amazon Dynamo-like infrastructure”; 2009 Voldemort released (LinkedIn): “a big, distributed, persistent, fault-tolerant hash table”
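The "sparse, distributed multi-dimensional sorted map" description is easier to picture with a toy model. The sketch below is an in-memory illustration only, assuming a (row, column, timestamp) key as the dimensions; the class and method names are hypothetical, and nothing here is distributed or reflects Google's actual implementation.

```python
from bisect import bisect_left, insort


class SparseSortedMap:
    """Toy in-memory model of a (row, column, timestamp) -> value map."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._cells = {}   # same key -> value; unset cells take no space (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so the newest version sorts first
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if it was never set."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.example.www", "contents:html", 1, "<html>v1</html>")
table.put("com.example.www", "contents:html", 2, "<html>v2</html>")
table.put("org.example.www", "anchor:home", 1, "Example")
latest = table.get("com.example.www", "contents:html")  # newest version wins
```

Because the keys are kept sorted, a range of adjacent rows can be read with one scan, which is the property that makes this more than a plain hash table.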
8. Rise of NoSQL – Amazon “There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications.”
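A "primary-key only interface" can be sketched in a few lines. The class below is a hypothetical in-memory stand-in, not Dynamo's API; it leaves out the replication, versioning, and conflict handling that the Dynamo paper describes, and the `cart:` key format is an assumption made for the example.

```python
class PrimaryKeyStore:
    """Hypothetical key-value store: the key is the only way in."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def add_to_cart(store, customer_id, sku, quantity=1):
    """Read-modify-write of one customer's cart, addressed by key alone."""
    key = f"cart:{customer_id}"          # assumed key naming scheme
    cart = dict(store.get(key, {}))      # copy so the stored value is not mutated
    cart[sku] = cart.get(sku, 0) + quantity
    store.put(key, cart)
    return cart


store = PrimaryKeyStore()
add_to_cart(store, 42, "book-123")
add_to_cart(store, 42, "book-123", quantity=2)
add_to_cart(store, 42, "dvd-9")
```

Note what is missing: there is no way to ask "which carts contain dvd-9?" without knowing the keys. For shopping carts, preferences, and session data that is usually an acceptable trade.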
9. NoSQL Data Store Classifications. Key-Value store: Amazon SimpleDB, Amazon Dynamo (Amazon), Tokyo Cabinet, Voldemort (Gilt Groupe); Wide-column (sparse) store: Hadoop (Yahoo, eBay), Cassandra (Facebook), Bigtable (Google!), Azure Table Storage (MSFT), Excel(!); Document database: MongoDB, CouchDB (BBC), RavenDB; Graph database: Neo4j, InfoGrid; Object database: Db4o, Versant, Perst, Caché; Data Grids: Infinispan, GigaSpaces, Terracotta
10. Why NoSQL. Good: Flexible (schema-less); Very scalable; Simple to use and operate; Eventually consistent; Cheap; Suited to Web applications. Bad: Immature; No common standards; Poor transaction support; Poor query support; New mindset required
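Two of these points, "flexible (schema-less)" and "poor query support", can be seen side by side in a small sketch. The store below is a hypothetical dictionary of JSON documents, not any particular product; the function names and the sample preference fields are assumptions for illustration.

```python
import json

preferences = {}  # hypothetical document store: document id -> JSON text


def save(doc_id, document):
    """Store any JSON-serialisable document; no schema is declared or checked."""
    preferences[doc_id] = json.dumps(document)


def load(doc_id):
    return json.loads(preferences[doc_id])


def find(predicate):
    """No query language here: filtering means scanning every document."""
    return sorted(doc_id for doc_id in preferences if predicate(load(doc_id)))


# Three documents with three different shapes, stored side by side.
save("user:1", {"theme": "dark"})
save("user:2", {"theme": "light", "modules": ["news", "weather"]})
save("user:3", {"language": "cy", "modules": ["sport"]})

with_modules = find(lambda doc: "modules" in doc)
```

Adding a new field needs no migration, which is the upside. The downside is that `find` has to read every document, and the application has to cope with fields that may or may not be present.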
11. NoSQL Use Cases. Good Examples: Logging data; Shopping carts; Favourites; Preferences; Session data; Mock data providers; Temporary / working data; Variable schema data. Stick with RDBMS: Transactions (orders etc.); LOB applications; Anything involving $$$; Business-critical data; Reporting
14. Real-world Examples “As I described in an earlier blog post, the new BBC homepage has been built on a whole new technical architecture. Since launching we’ve found an issue with the service we use to save users’ customisation settings. Although we ran a public beta for more than 2 months, this problem only became apparent when we moved the whole audience across to the new site, increasing the load on the platform 20 times. Despite thorough load testing before launch we were unable to accurately predict the type and combination of customisations that users would perform, and as a result we now need to re-architect the way we save your homepage customisation settings in a more efficient way.”
15. Summary: NoSQL is not a replacement for RDBMS; No two scenarios are the same; Use the best tool for the job; Experiment
NoSQL doesn’t have a formal definition. It’s subjective, and has come to mean different things to different people. The term was originally used by Carlo Strozzi for a lightweight relational database that did not expose a SQL interface. It was reintroduced in 2009, for a conference, as a term for open-source non-relational databases. Summary: we have the concept of a persistence / data storage mechanism that is fundamentally different from RDBMS.
Normalisation is the key here. SQL is an ISO standard, and pretty much everyone can craft a SELECT statement. SQL also contains complex semantics around JOIN, GROUP BY, and ORDER BY clauses. Transactions: the bank account example (a transfer must debit one account and credit another as a single unit, or not happen at all).
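These notes can be made concrete with Python's built-in `sqlite3` module. The sketch below is one possible illustration, not taken from the talk: the table names, sample rows, and the `transfer` and `totals_by_customer` helpers are all assumptions. It shows normalised tables, a JOIN with GROUP BY and ORDER BY, and a transfer that is rolled back when the debit would overdraw the account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- whole pence, never negative
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO accounts VALUES (10, 1, 100), (11, 1, 50), (20, 2, 25);
""")
conn.commit()


def transfer(conn, from_id, to_id, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an overdraft
        return False


def totals_by_customer(conn):
    """Join the normalised tables back together and aggregate per customer."""
    return conn.execute("""
        SELECT c.name, SUM(a.balance)
        FROM customers AS c
        JOIN accounts AS a ON a.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()
```

The credit is deliberately applied before the debit, so a failed transfer leaves a half-finished update that the rollback has to undo. Calling `transfer(conn, 20, 10, 100)` returns `False` and leaves every balance unchanged.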