Recently my colleague Saneesh and I gave a presentation to some of our peers on the emerging trend of distributed storage for web applications. We thought we would share it with the rest of you.
Database for cloud
1. What is NoSQL ?
Distributed storage, with a variety of ways to store and access data
A variety of open source offerings
Mostly clones based on Amazon Dynamo, Google BigTable, or DHTs (distributed hash tables)
Designed to deliver massive scalability
Facebook, Twitter, Digg, Yahoo, LinkedIn, Google, Bing, Baidu, Rackspace, and Nokia depend on NoSQL
Not for everyone !
Lacks transactions and atomic consistency
Yes, we will still need RDBMSs around for some time
2. How did we get here ?
How do I scale my application to handle millions of customers ?
Do I really need data normalization or transaction support for my application ?
Simplicity of management within the cloud
Do I need immediate or eventual consistency ?
Prefer schema-less storage for my application
Want to simply store documents and JSON objects; better ORM
Need high availability and performance more than complex data warehousing / mining support
Now that I have my apps on the cloud, how do I simplify my DB administration ?
Dynamic Node management
Challenges in normalizing data when using sharding and optimistic locking approaches
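To make the sharding and dynamic node management points a little more concrete, here is a minimal sketch of consistent hashing, the partitioning idea behind Dynamo-style stores and DHTs. It is our own toy illustration, not code from any of the products mentioned: the HashRing class, the node names, and the customer keys are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; class and method names are hypothetical."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key: str) -> str:
        """Return the owner of the first ring point clockwise from the key."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"customer:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # grow the cluster by one node
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, every key in moved ends up on node-d and all other keys stay where they were, which is why nodes can be added or removed without reshuffling the whole data set.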
3. So, what are our options ?
Document Store
Apache CouchDB, MongoDB, TerraStore
Schema-less
Key Value
Amazon SimpleDB, Project Voldemort
Column Store
Cassandra, based on Amazon Dynamo (used by Facebook, Twitter, Digg)
HyperTable, based on Google Bigtable (used by Baidu)
HBase, based on Hadoop (used by Bing)
Blended
Drizzle
Maintains relational data
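As a rough illustration of how the first three categories differ, the sketch below models the same customer record in each shape using plain Python dictionaries. This is only our way of picturing the data models; it does not use the API of any product listed above, and the record, keys, and function names are made up.

```python
import json

# Document store: one self-contained, schema-less JSON document per record.
document = {
    "_id": "customer:42",
    "name": "Ada",
    "orders": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}

# Key-value store: an opaque value looked up by key. The store does not
# interpret the bytes, so the application serializes them itself.
kv_store = {"customer:42": json.dumps(document).encode("utf-8")}

# Column store: row key -> column family -> column -> value. Rows may carry
# different columns, so sparse data is cheap to represent.
column_store = {
    "customer:42": {
        "profile": {"name": "Ada"},
        "orders": {"A-1": 2, "B-7": 1},
    }
}


def total_quantity_from_document(doc):
    """Read nested fields straight out of the document."""
    return sum(order["qty"] for order in doc["orders"])


def total_quantity_from_kv(store, key):
    """Fetch the blob by key, then decode it in the application."""
    doc = json.loads(store[key].decode("utf-8"))
    return total_quantity_from_document(doc)


def total_quantity_from_columns(store, key):
    """Read a single column family of a single row."""
    return sum(store[key]["orders"].values())
```

All three functions answer the same question; what changes is how much structure the store itself understands versus how much is left to the application.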
4. Decisions : SQL or NoSQL ?
Are you or your app SQL-challenged ?
With Schema or Schema-less
Decide based on your data within your app
Do you need ACID ?
Two-phase commit ?
Do you need data consistency ?
Atomic
Eventual
Do you need data persistence ?
Pluggable: File System, Database, Custom
In-Memory
Do you need JOINS, complex SQL queries ?
Do you have Googlers or Facebookers on your payroll ?
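The atomic versus eventual consistency question can be pictured with a small quorum sketch in the Dynamo style: with N replicas, W write acknowledgements, and R replicas consulted per read, a read is guaranteed to overlap the latest write only when R + W > N. The QuorumStore class below is a hypothetical, deterministic toy that we wrote for illustration. It deliberately reads from the replicas least likely to have seen the write, to show the worst case.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None


class QuorumStore:
    """Toy N/W/R store; names and behavior are assumptions for illustration."""

    def __init__(self, n, w, r):
        self.replicas = [Replica() for _ in range(n)]
        self.w = w      # replicas that must acknowledge a write
        self.r = r      # replicas consulted on a read
        self.clock = 0  # simple version counter

    def write(self, value):
        # Only the first W replicas are updated synchronously; the rest lag.
        self.clock += 1
        for replica in self.replicas[: self.w]:
            replica.version, replica.value = self.clock, value

    def read(self):
        # Worst case for staleness: consult the last R replicas.
        contacted = self.replicas[-self.r:]
        return max(contacted, key=lambda rep: rep.version).value

    def sync(self):
        # Background repair: copy the newest version to every replica.
        newest = max(self.replicas, key=lambda rep: rep.version)
        for replica in self.replicas:
            replica.version, replica.value = newest.version, newest.value


strong = QuorumStore(n=3, w=2, r=2)    # R + W > N: read and write sets overlap
strong.write("v1")
strong_read = strong.read()

eventual = QuorumStore(n=3, w=1, r=1)  # R + W <= N: a read may miss the write
eventual.write("v1")
stale_read = eventual.read()
eventual.sync()
converged_read = eventual.read()
```

Here strong_read sees the new value immediately, stale_read does not, and converged_read does once the replicas have synchronized. Whether that window of staleness is acceptable is the decision this slide asks you to make.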
5. Challenges for cloud databases
Transactional support and referential integrity are yet to be built in
Complex data access – these stores excel at single-row transactions, but not when a join is required
Business Intelligence – valuable business data is locked inside impenetrable application data stores
Data Integrity – offers neither ACID support nor data reliability
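To show what "not when a join is required" tends to mean in practice, here is a small sketch of a join done in application code over two key-value collections. The data and the totals_by_customer_name function are hypothetical; the point is only that the work a SQL JOIN would do on the server moves into the application.

```python
# Two independent key-value collections; the store knows nothing about
# the relationship between them.
customers = {
    "c1": {"name": "Ada"},
    "c2": {"name": "Linus"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 30},
    "o2": {"customer_id": "c2", "total": 15},
    "o3": {"customer_id": "c1", "total": 20},
}


def totals_by_customer_name(order_store, customer_store):
    """Application-side equivalent of a JOIN plus GROUP BY."""
    totals = {}
    for order in order_store.values():                   # full scan of orders
        customer = customer_store[order["customer_id"]]  # one lookup per order
        name = customer["name"]
        totals[name] = totals.get(name, 0) + order["total"]
    return totals


report = totals_by_customer_name(orders, customers)
```

Against a real distributed store, each lookup in the loop could be a network round trip, and nothing guarantees that a referenced customer still exists, which ties back to the referential integrity point above.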