This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.
This is a recording of a guest lecture I gave at the University of Texas School of Information.
In this talk I address the technologies and tools Gowalla (gowalla.com) uses, including Memcache, Redis and Cassandra.
Find more on my blog:
http://schneems.com
10. The Web is Data
• Username => String
• Birthday => Int/ Int/ Int
• Blog Post => Text
• Image => Binary-file/blob
Data needs to be stored
to be useful
12. Gowalla Database
• PostgreSQL
• Relational (RDBMS)
• Open Source
• Competitor to MySQL
• ACID compliant
• Running on a Dedicated Managed Server
13. Need for Speed
• Throughput:
• The number of operations per minute that
can be performed
• Pure Speed:
• How long an individual operation takes.
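The two measurements can be sketched as follows. This is a minimal illustration, assuming a hypothetical `fake_query` method that stands in for a real database call; the `measure` helper is also made up for this example.

```ruby
require "benchmark"

# Hypothetical stand-in for one database operation (an assumption for illustration).
def fake_query
  (1..1_000).reduce(:+)
end

# Returns [latency in ms per operation, throughput in operations per minute].
def measure(operations)
  # Total wall-clock time for all operations, in seconds.
  elapsed = Benchmark.realtime { operations.times { fake_query } }
  # Pure speed: how long an individual operation takes on average.
  latency_ms = (elapsed / operations) * 1000.0
  # Throughput: how many operations fit into one minute at this rate.
  throughput_per_minute = (operations / elapsed) * 60.0
  [latency_ms, throughput_per_minute]
end

latency_ms, throughput = measure(500)
puts format("pure speed: %.4f ms per operation", latency_ms)
puts format("throughput: %.0f operations per minute", throughput)
```

Run serially like this, the two numbers are simply inverses of each other. They diverge once operations run concurrently, which is why both are worth tracking.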
14. Potential Problems
• Hardware
• Slow Network
• Slow hard-drive
• Insufficient CPU
• Insufficient Ram
• Software
• Too many reads
• Too many writes
15. Scaling Up versus Out
• Scale Up:
• More CPU, Bigger HD, More Ram etc.
• Scale Out:
• More machines
• More machines
• More machines
• ...
16. Scale Up
• Bigger faster machine
• More Ram
• More CPU
• Bigger ethernet bus
• ...
• Moore's Law
• Diminishing returns
17. Scale Out
• Forget Moore's Law...
• Add more nodes
• Master/Slave Database
• Sharding
18. Master/Slave
Write → Master DB
Master DB → Copy → Slave DB, Slave DB, Slave DB, Slave DB
Read ← Slave DBs
19. Master & Slave +/-
• Pro
• Increased read speed
• Takes read load off of master
• Allows us to Join across all tables
• Con
• Doesn’t buy increased write throughput
• Single Point of Failure in Master Node
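A minimal sketch of the master/slave routing described above. The `FakeDB` and `ReplicatedDB` classes are hypothetical in-memory stand-ins, not Gowalla's setup, and the copy to the slaves happens immediately here, whereas real replication is usually asynchronous.

```ruby
# Hypothetical in-memory stand-in for one database server.
class FakeDB
  attr_reader :name, :rows

  def initialize(name)
    @name = name
    @rows = {}
  end
end

# Hypothetical router: writes go to the master, reads go to the slaves.
class ReplicatedDB
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  # Every write hits the single master, then is copied to each slave.
  # This is why adding slaves does not increase write throughput.
  def write(id, value)
    @master.rows[id] = value
    @slaves.each { |slave| slave.rows[id] = value }
    @master.name
  end

  # Reads rotate across the slaves (round robin), taking load off the master.
  def read(id)
    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    [slave.name, slave.rows[id]]
  end
end

master = FakeDB.new("master")
slaves = %w[slave1 slave2 slave3].map { |name| FakeDB.new(name) }
db = ReplicatedDB.new(master, slaves)

db.write(3, "schneems")
p db.read(3)
p db.read(3)
```

Each read lands on a different slave, while every write still passes through the one master, which is the single point of failure listed under the cons.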
20. Sharding
Write
Users in USA | Users in Europe | Users in Asia | Users in Africa
Read
21. Sharding +/-
• Pro
• Increased Write & Read throughput
• No Single Point of failure
• Individual features can fail
• Con
• Cannot Join queries between shards
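A minimal sketch of sharding by region, following the regions in the sharding diagram. The `ShardedUsers` class is hypothetical, and a plain Hash stands in for each shard's database.

```ruby
# Hypothetical shard map keyed by region; each shard is an independent store.
class ShardedUsers
  REGIONS = %w[USA Europe Asia Africa].freeze

  def initialize
    # One Hash per region stands in for one database per shard.
    @shards = REGIONS.map { |region| [region, {}] }.to_h
  end

  # Pick the one shard responsible for a region.
  def shard_for(region)
    @shards.fetch(region) { raise ArgumentError, "unknown region: #{region}" }
  end

  # Writes and reads touch only a single shard, so load is spread out.
  def write(region, id, name)
    shard_for(region)[id] = name
  end

  def read(region, id)
    shard_for(region)[id]
  end

  # Anything spanning shards has to visit each shard and merge the results
  # in application code, because the database cannot join across them.
  def count_all
    @shards.values.sum(&:size)
  end
end

users = ShardedUsers.new
users.write("USA", 1, "schneems")
users.write("Asia", 2, "example")
puts users.read("USA", 1)
puts users.count_all
```

The `count_all` method shows the cost named under the cons: a question that a single database answers with one query becomes a loop over every shard.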
22. What is a Database?
• Relational Database Management System
(RDBMS)
• Stores Data Using Schema
• A.C.I.D. compliant
• Atomic
• Consistent
• Isolated
• Durable
23. RDBMS
• Relational
• Matches data on common characteristics
in data
• Enables “Join” & “Union” queries
• Makes data modular
24. Relational +/-
• Pros
• Data is modular
• Highly flexible data layout
• Cons
• Getting desired data can be tricky
• Over modularization leads to many join
queries
• Trade off performance for search-ability
25. Schema Storage
• Blueprint for data storage
• Break data into tables/columns/rows
• Give data types to your data
• Integer
• String
• Text
• Boolean
• ...
26. Schema +/-
• Pros
• Regularize our data
• Helps keep data consistent
• Converts to programming “types” easily
• Cons
• Schema must be managed separately
• Adding columns & indexes to existing
large tables can be painful & slow
27. ACID
• Properties that guarantee database
transactions are processed reliably
• Atomic
• Consistent
• Isolated
• Durable
28. ACID
• Atomic
• Any database Transaction is all or nothing.
• If one part of the transaction fails it all fails
“An Incomplete Transaction Cannot Exist”
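The all-or-nothing behavior can be illustrated with a toy in-memory store. This is only a sketch: `TinyStore` is hypothetical, and a real database implements transactions internally rather than by copying its data.

```ruby
# Toy in-memory "database" (hypothetical, for illustration only).
class TinyStore
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Run the block against a copy and keep the changes only if every step succeeds.
  def transaction
    working_copy = @data.dup
    yield working_copy
    @data = working_copy # commit: all changes become visible together
    true
  rescue StandardError
    false # rollback: the original data is left untouched
  end
end

store = TinyStore.new({ "alice" => 100, "bob" => 0 })

# Fails halfway: the debit ran, but the next step raised, so nothing is saved.
store.transaction do |accounts|
  accounts["alice"] -= 50
  raise "credit failed"
end
p store.data

# Succeeds: both steps are applied together.
store.transaction do |accounts|
  accounts["alice"] -= 50
  accounts["bob"] += 50
end
p store.data
```

After the failed transaction the balances are unchanged; only the second transaction, where every step succeeds, moves the money.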
29. ACID
• Consistent
• Any transaction will take the database
from one consistent state to another
“Only Consistent data is allowed to be
written”
30. ACID
• Isolated
• No transaction should be able to interfere
with another transaction
“the same field cannot be updated by two
sources at the exact same time”
a = 0
a += 1
a += 2
a = ??
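The `a += 1` / `a += 2` question can be sketched with two Ruby threads. This is an analogy under stated assumptions: the hypothetical `Counter` class uses a Mutex so that only one update runs at a time, whereas a database enforces isolation with its own locking or versioning.

```ruby
# A shared counter guarded by a Mutex, so updates cannot interleave.
class Counter
  attr_reader :value

  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def add(amount)
    @lock.synchronize do
      current = @value           # read
      @value = current + amount  # write, with no other update in between
    end
  end
end

counter = Counter.new
threads = [1, 2].map { |amount| Thread.new { counter.add(amount) } }
threads.each(&:join)
puts counter.value # 3
```

Without the lock, both updates could read `0` before either writes, and one of them would be lost. With isolation, the answer is always 3.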
31. ACID
• Durable
• Once a transaction is committed it will stay
that way
“Save it once, read it forever”
32. What is a Database?
• RDBMS
• Relational
• Flexible
• Has a schema
• Most likely ACID compliant
• Typically fast under low load or when
optimized
33. What is SQL?
• Structured Query Language
• The language databases speak
• Based on relational algebra
• Insert
• Query
• Update
• Delete
“SELECT Company, Country FROM Customers
WHERE Country = 'USA' ”
34. Why people <3 SQL
• Relational algebra is powerful
• SQL is proven
• well understood
• well documented
35. Why people </3 SQL
• Relational algebra is hard
• Different databases support different SQL
syntax
• Yet another programming language to learn
36. SQL != Database
• SQL is used to talk to an RDBMS (database)
• SQL is not an RDBMS
42. Key Value Example
require "redis"
redis = Redis.new
# set(key, value)
redis.set("foo", "bar")
# get(key) returns the value
redis.get("foo")
>> "bar"
43. Key Value
• Like a database that can only ever look up by
primary key (id)
YES
select * from users where id = '3';
NO
select * from users where name = 'schneems';
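The YES/NO contrast can be sketched with a plain Ruby Hash standing in for a key-value store. The data and the helper names `find_by_key` and `find_by_name` are hypothetical.

```ruby
# A Hash as a stand-in for a key-value store (hypothetical data).
USERS = {
  "user:3" => { "name" => "schneems" },
  "user:4" => { "name" => "example" }
}.freeze

# YES: lookup by key is a single direct fetch.
def find_by_key(store, key)
  store[key]
end

# NO: there is no index on name, so the only option is to scan every value.
# A key-value store generally does not offer this as a query at all.
def find_by_name(store, name)
  store.values.find { |user| user["name"] == name }
end

p find_by_key(USERS, "user:3")
p find_by_name(USERS, "schneems")
```

Both calls return the same record here, but the second one inspects every entry to find it, which is exactly the work a relational database's indexes and query planner do for you.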
44. NoSQL @ Gowalla
• Redis (key-value store)
• Store “Likes” & Analytics
• Memcache (key-value store)
• Cache Database results
• Cassandra
• (eventually consistent, with-schema, key
value store)
• Store “feeds” or “timelines”
• Solr (search index)
45. Memcache
• Key-Value Store
• Open Source
• Distributed
• In memory (ram) only
• fast, but volatile
• Not ACID
• Memory object caching system
47. Memcache
• Can store whole objects
# Memcache and User stand in for your memcached client and model class
memcache = Memcache.new
user = User.where(:username => "schneems").first
memcache.set("user:3", user)
user_from_cache = memcache.get("user:3")
user_from_cache == user
>> true
user_from_cache.username
>> "schneems"
48. Memcache @ Gowalla
• Cache Common Queries
• Decreases Load on DB (postgres)
• Enables higher throughput from DB
• Faster response than DB
• Users see quicker page load time
49. What to Cache?
• Objects that change infrequently
• users
• spots (places)
• etc.
• Expensive(ish) sql queries
• Friend ids for users
• User ids for people visiting spots
• etc.
52. Memcache <3’s DB
• We use them Together
• If memcache doesn’t have a value
• Fetch from the database
• Set the key with the value from the database
• Hard
• Cache Invalidation : (
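The fetch-then-set steps above can be sketched as follows. This is a minimal illustration, assuming plain Hashes as stand-ins for both memcache and the database; the `CachedLookup` class is hypothetical.

```ruby
# Hypothetical cache-in-front-of-database lookup.
class CachedLookup
  attr_reader :db_hits

  def initialize(database)
    @database = database # a Hash standing in for the database (postgres)
    @cache = {}          # a Hash standing in for memcache
    @db_hits = 0
  end

  def fetch(key)
    return @cache[key] if @cache.key?(key) # cache hit: the database is not touched

    @db_hits += 1
    value = @database[key] # cache miss: fetch from the database
    @cache[key] = value    # set the key with the value from the database
    value
  end

  # Cache invalidation: on write, update the database and drop the stale copy.
  def write(key, value)
    @database[key] = value
    @cache.delete(key)
  end
end

lookup = CachedLookup.new({ "user:3" => "schneems" })
lookup.fetch("user:3") # miss: goes to the database
lookup.fetch("user:3") # hit: served from the cache
puts lookup.db_hits    # 1
```

The hard part is the `write` method: every code path that changes the database has to remember to invalidate the matching key, or readers keep seeing the old value.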
53. Redis
• Key Value Store
• Open Source
• Not Distributed (yet)
• Extremely Quick
• “Data structure server”
65. Tradeoffs
• Every Data store has them
• Know your data store
• Strengths
• Weaknesses
66. NoSQL vs. RDBMS
• No Magic Bullet
• Use Both!!!
• Model data in a datastore you understand
• Switch when/if you need to
• Understand Your Options