5. Strong Focus on Replication
Monday, August 16, 2010
- Built from day one to support bi-directional, peer-to-peer replication
- This feature sets CouchDB apart from other NoSQL databases
6. RESTful API
# Create
POST http://localhost:5984/employees
# Read
GET http://localhost:5984/employees/1
# Update
PUT http://localhost:5984/employees/1
# Delete
DELETE http://localhost:5984/employees/1
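The four requests above can be sketched as plain request descriptions. This is a minimal sketch: the helper name `crudRequest` is hypothetical, and it only builds objects rather than sending anything. It also shows one detail the slide leaves out: an update carries the current `_rev` in the body, and a delete passes it as the `rev` query parameter.

```javascript
// Hypothetical helper: describes the HTTP request for each CRUD operation.
// It only builds a plain object; sending it (fetch, curl, ...) is left to the caller.
const BASE = "http://localhost:5984/employees";
const JSON_HEADERS = { "Content-Type": "application/json" };

function crudRequest(op, doc) {
  switch (op) {
    case "create": // POST to the database: the server assigns the _id
      return { method: "POST", url: BASE, headers: JSON_HEADERS, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url: `${BASE}/${doc._id}` };
    case "update": // the body must carry the current _rev
      return { method: "PUT", url: `${BASE}/${doc._id}`, headers: JSON_HEADERS, body: JSON.stringify(doc) };
    case "delete": // the current revision goes in the query string
      return { method: "DELETE", url: `${BASE}/${doc._id}?rev=${encodeURIComponent(doc._rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

console.log(crudRequest("delete", { _id: "1", _rev: "1-abc" }));
```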
7. Queried and Indexed with MapReduce
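// Map: emit one row for every document whose first_name is "John"
function(doc) {
  if (doc.first_name === "John") {
    emit(doc._id, 1);
  }
}

// Reduce: count the emitted rows by summing the 1s
function(keys, values, rereduce) {
  return sum(values);
}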
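To see what the two functions compute, they can be run against a few documents in memory. This is a toy sketch, not CouchDB's view engine: `runView`, the sample documents, and the explicit `emit` argument are assumptions made for illustration (CouchDB supplies `emit` and `sum` as globals).

```javascript
// Toy in-memory run of a map and a reduce function; not CouchDB's view engine.
const docs = [
  { _id: "1", first_name: "John" },
  { _id: "2", first_name: "Jane" },
  { _id: "3", first_name: "John" },
];

const sum = (values) => values.reduce((a, b) => a + b, 0);

function runView(docs, map, reduce) {
  const rows = [];
  // Call map once per document and collect whatever it emits.
  docs.forEach((doc) => {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  });
  // On the first reduce pass, keys are [key, docid] pairs.
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return { rows, reduced: reduce(keys, values, false) };
}

const result = runView(
  docs,
  (doc, emit) => { if (doc.first_name === "John") emit(doc._id, 1); },
  (keys, values, rereduce) => sum(values)
);

console.log(result.rows.length, result.reduced); // 2 2
```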
8. Multiversion Concurrency Control
http://en.wikipedia.org/wiki/Multiversion_concurrency_control
Image: http://blogs.wyomingnews.com/blogs/everyonegives/files/2009/02/book-stack.jpg
- Documents are never updated in place; new revisions are always created
- Advantages
* Don't have to manage locks for reads. Don't have to worry about a concurrent update corrupting a read that is in progress.
* Data is “safer”. Old revisions are kept around (at least for a while). If a botched update accidentally destroys data, you can usually restore it from a previous revision, as long as compaction has not yet removed that revision.
* The database can perform some optimizations when writing to disk. When 1000 documents are created or updated together, they are written next to each other on disk, which avoids disk seeks.
- Disadvantages
* Requires occasional database compaction
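The revision model above can be sketched with a toy in-memory store. Everything here is an assumption for illustration: the `ToyStore` class, the `"N-toy"` revision strings (real revisions look like `2-<hash>`), and the error text. The point is only that a writer must name the revision it read, and a stale write is rejected instead of locking readers out.

```javascript
// Toy model of revision-checked updates; not CouchDB's storage engine.
class ToyStore {
  constructor() {
    this.revisions = new Map(); // _id -> array of revisions, oldest first
  }

  get(id) {
    const history = this.revisions.get(id);
    return history ? history[history.length - 1] : undefined;
  }

  put(doc) {
    const history = this.revisions.get(doc._id) || [];
    const current = history[history.length - 1];
    // Optimistic check: the writer must name the revision it read.
    if (current && current._rev !== doc._rev) {
      throw new Error("409 Conflict: document update conflict");
    }
    const next = { ...doc, _rev: `${history.length + 1}-toy` };
    history.push(Object.freeze(next)); // earlier revisions are never modified
    this.revisions.set(doc._id, history);
    return next._rev;
  }
}

const store = new ToyStore();
store.put({ _id: "1", first_name: "John" });   // creates revision "1-toy"
const readA = store.get("1");                  // two clients read the same revision
const readB = store.get("1");
store.put({ ...readA, last_name: "Smith" });   // first writer succeeds: "2-toy"

let rejected = false;
try {
  store.put({ ...readB, last_name: "Jones" }); // stale _rev, rejected
} catch (err) {
  rejected = true;
  console.log(err.message);
}
```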
9. Ultra Durable
- When CouchDB documents are updated, all data and associated indexes are flushed to disk and the transactional commit always leaves the database in a completely consistent state.
- Two-step commit:
* Step 1: all document data and associated index updates are synchronously flushed to disk.
* Step 2: the database header is written in two consecutive, identical chunks, and flushed to disk.
- Crash recovery:
* If a crash occurs during step 1, the partially flushed data is forgotten upon restart.
* If a crash occurs during step 2, a surviving copy of the previous header remains and is used.
- Crash-only shutdown
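The commit and recovery steps above can be sketched as a toy model. This is not CouchDB's real file format: the `ToyFile` class, the `crashAt` parameter, and modelling a torn header write as `null` are all assumptions made to keep the example small.

```javascript
// Toy model of the two-step commit; not CouchDB's on-disk format.
const makeHeader = (length) => ({ length });
const isIntact = (header) => header !== null; // a torn write is modelled as null

class ToyFile {
  constructor() {
    this.data = [];                                // append-only document data
    this.headers = [makeHeader(0), makeHeader(0)]; // two identical copies
  }

  // crashAt simulates a failure during "data" (step 1) or "header1"/"header2" (step 2).
  commit(docs, crashAt) {
    this.data.push(...docs);                       // step 1: flush data
    if (crashAt === "data") return;
    const header = makeHeader(this.data.length);   // step 2: write the header twice
    if (crashAt === "header1") { this.headers[0] = null; return; }
    this.headers[0] = header;
    if (crashAt === "header2") { this.headers[1] = null; return; }
    this.headers[1] = header;
  }

  // Recovery: use the first intact header and forget data written past it.
  recover() {
    const header = this.headers.find(isIntact);
    this.data.length = header.length;
    this.headers = [header, header];
    return [...this.data];
  }
}

const file = new ToyFile();
file.commit(["a", "b"]);
file.commit(["c"], "data");       // crash in step 1
console.log(file.recover());      // "c" is forgotten
file.commit(["c"], "header1");    // crash in step 2, first copy torn
console.log(file.recover());      // previous header is used
file.commit(["c"]);
console.log(file.recover());      // "c" is now committed
```

In this model at least one header copy is always intact, because the second copy is only touched after the first has been written completely.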
10. Erlang OTP
The Erlang programming language and the OTP platform are known for their concurrency support, and OTP is known for its extreme emphasis on reliability and availability.
12. Documents and Document Storage
13. Documents
• JSON data format
• Schema-less
• Support for binary data in the form of
document “attachments”
• Each document uniquely named in the
database
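As an illustration of the points above, a document might look like the following. The field names and values are made up; only `_id` and the `_attachments` layout (a content type plus base64-encoded data for an inline attachment) follow CouchDB's conventions.

```javascript
// Illustrative document; every field except _id and _attachments is made up.
const employee = {
  _id: "employee-1",                 // unique name within the database
  first_name: "John",
  skills: ["erlang", "javascript"],  // schema-less: any JSON value is allowed
  _attachments: {
    "note.txt": {
      content_type: "text/plain",
      // Inline attachment data is base64-encoded.
      data: Buffer.from("hello", "utf8").toString("base64"),
    },
  },
};

console.log(JSON.stringify(employee, null, 2));
```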
14. Document Storage
• CouchDB uses append-only updates, and
never overwrites committed data
• Document updates are serialized
• Update model is lockless and optimistic
• Reads are never blocked or interrupted by
concurrent updates
• Databases require occasional compaction
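Why append-only storage needs compaction can be sketched with a toy log. The `AppendOnlyLog` class is an assumption for illustration and is far simpler than CouchDB's actual storage; it shows only that every update adds a record and that compaction keeps the latest record per document.

```javascript
// Toy append-only log; not CouchDB's on-disk format.
class AppendOnlyLog {
  constructor() {
    this.records = [];
  }

  write(id, body) {
    // Updates never overwrite earlier records: a new record is appended.
    this.records.push({ id, body });
  }

  read(id) {
    // The most recent record for an id is the current version.
    for (let i = this.records.length - 1; i >= 0; i--) {
      if (this.records[i].id === id) return this.records[i].body;
    }
    return undefined;
  }

  compact() {
    // Copy only the latest record per id into a fresh log.
    const latest = new Map();
    for (const record of this.records) latest.set(record.id, record);
    this.records = [...latest.values()];
  }
}

const log = new AppendOnlyLog();
log.write("1", { first_name: "Jon" });
log.write("1", { first_name: "John" });
log.write("2", { first_name: "Jane" });
const sizeBefore = log.records.length; // 3 records, one of them stale
log.compact();
console.log(sizeBefore, log.records.length, log.read("1"));
```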
16. Views
• Add structure back to your unstructured data, so
it can be queried
• Allow you to have many different view
representations of the same data
• Created by executing map and reduce functions
on your documents
• View definitions are stored in Design Documents
• Built incrementally and on demand
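A design document holding one view might look like the sketch below. The design document name `employees` and the view name `johns` are hypothetical, and the query URL assumes the `employees` database on localhost used earlier in the deck. The functions are stored as strings and are never called here.

```javascript
// Sketch of a design document; the names "employees" and "johns" are hypothetical.
const designDoc = {
  _id: "_design/employees",
  language: "javascript",
  views: {
    johns: {
      // View functions are stored as strings inside the design document.
      map: function (doc) {
        if (doc.first_name === "John") emit(doc._id, 1);
      }.toString(),
      reduce: function (keys, values, rereduce) {
        return sum(values);
      }.toString(),
    },
  },
};

// Assumed URL for querying this view once the design document is saved:
const viewUrl = "http://localhost:5984/employees/_design/employees/_view/johns";

console.log(JSON.stringify(designDoc, null, 2));
console.log("GET", viewUrl);
```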
17. MapReduce
• MapReduce functions are written primarily
in JavaScript (some other languages are
supported)
• The map function selects which documents
to operate on, emitting zero to many
key/value pairs to the reduce function
• The (optional) reduce function combines
the key/value pairs and performs any
necessary calculations on that data
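A reduce function may be called again on its own partial results, with the `rereduce` argument set to true. A minimal sketch of that idea follows, assuming an arbitrary split into two chunks; the chunking and the sample values are made up, and the keys argument is passed as `null` because this reduce ignores it.

```javascript
// Toy demonstration of rereduce; the chunk boundaries here are arbitrary.
const sum = (values) => values.reduce((a, b) => a + b, 0);

function reduce(keys, values, rereduce) {
  // For a sum, reducing raw values and re-reducing partial sums are the same operation.
  return sum(values);
}

const emitted = [1, 1, 1, 1, 1]; // values produced by a map function

// First pass: reduce each chunk of emitted values separately.
const partials = [
  reduce(null, emitted.slice(0, 2), false),
  reduce(null, emitted.slice(2), false),
];

// Second pass: combine the partial results.
const total = reduce(null, partials, true);

console.log(partials, total);
```

A reduce that counts with `values.length` instead of `sum(values)` would give the wrong answer on the second pass, which is why the `rereduce` flag exists.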
18. View Indexes
• View indexes are stored on disk separate
from the main database, in a data structure
specific to the given Design Document
• Views are updated incrementally; only
new/changed documents are processed when
the view is accessed
• Building views (especially from scratch) can
be time consuming and resource intensive
for large databases
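Incremental updating can be sketched with a toy index that remembers the last update sequence it has processed. The `ToyViewIndex` class and the shape of the `changes` list are assumptions for illustration, not CouchDB's implementation.

```javascript
// Toy incremental index keyed by update sequence; not CouchDB's implementation.
class ToyViewIndex {
  constructor(map) {
    this.map = map;
    this.lastSeq = 0;          // highest sequence number already indexed
    this.rowsById = new Map(); // document id -> rows emitted for it
    this.processed = 0;        // counts map invocations, for illustration
  }

  // changes: [{ seq, doc }] in ascending seq order.
  query(changes) {
    for (const { seq, doc } of changes) {
      if (seq <= this.lastSeq) continue; // already indexed, skip
      const rows = [];
      this.map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.rowsById.set(doc._id, rows);  // replaces rows from an older revision
      this.processed += 1;
      this.lastSeq = seq;
    }
    return [...this.rowsById.values()].flat();
  }
}

const index = new ToyViewIndex((doc, emit) => {
  if (doc.first_name === "John") emit(doc._id, 1);
});
const changes = [
  { seq: 1, doc: { _id: "1", first_name: "John" } },
  { seq: 2, doc: { _id: "2", first_name: "Jane" } },
];
index.query(changes);                      // first access maps both documents
changes.push({ seq: 3, doc: { _id: "3", first_name: "John" } });
const rows = index.query(changes);         // second access maps only seq 3
console.log(rows.length, index.processed); // 2 3
```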
20. Replication
• Efficient and reliable bi-directional replication
• Only documents created/updated since the last
replication are replicated
• For each document, only updated fields are
replicated
• Support for one-time, continuous, and filtered
replication
• Fault tolerant - will simply pick up where it left off
if something bad happens
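Replication is triggered by posting a JSON body to `/_replicate`. A sketch of the bodies for a bi-directional setup follows; the helper name `replicationRequests` and the remote host are hypothetical, and nothing is sent over the network.

```javascript
// Sketch of request bodies for POST /_replicate; the remote host is hypothetical.
function replicationRequests(local, remote, options = {}) {
  // Bi-directional replication is two one-way replications, one per direction.
  return [
    { source: local, target: remote, ...options },
    { source: remote, target: local, ...options },
  ];
}

const requests = replicationRequests(
  "employees",
  "http://example.org:5984/employees",
  { continuous: true } // omit for a one-time replication
);

requests.forEach((body) => console.log("POST /_replicate", JSON.stringify(body)));
```

Filtered replication uses the same body with an additional `filter` field naming a filter function stored in a design document.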
21. Conflict Management
• Documents with conflicts have a property named
“_conflicts”, which contains all conflicting revision ids
• CouchDB chooses a winning document, but keeps
losing documents around for manual conflict resolution
• CouchDB does not attempt to merge conflicting
documents
• It is the application’s responsibility to make sure data is
merged successfully
• Losing revisions are kept, even across compaction, until the
application resolves the conflict by deleting them
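Application-side resolution can be sketched as a pure function. The merge policy here (keep the winner's fields, add fields only a loser has) is just one made-up example, and `resolveConflict` is a hypothetical name. The result is shaped as a body for `_bulk_docs`: the merged document plus a deletion for each losing revision.

```javascript
// Hypothetical resolver; the merge policy is application-specific.
function resolveConflict(winner, losers) {
  const merged = { ...winner };
  for (const loser of losers) {
    for (const [field, value] of Object.entries(loser)) {
      if (field.startsWith("_")) continue;            // skip _id, _rev, ...
      if (!(field in merged)) merged[field] = value;  // keep fields only a loser has
    }
  }
  delete merged._conflicts; // returned by the read, not part of the saved document
  // Mark every losing revision as deleted so the conflict disappears.
  const deletions = losers.map((loser) => ({
    _id: loser._id,
    _rev: loser._rev,
    _deleted: true,
  }));
  return { docs: [merged, ...deletions] };
}

const winner = { _id: "1", _rev: "2-aaa", _conflicts: ["2-bbb"], first_name: "John" };
const losers = [{ _id: "1", _rev: "2-bbb", first_name: "Johnny", phone: "555-0100" }];
const bulk = resolveConflict(winner, losers);

console.log(JSON.stringify(bulk, null, 2));
```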
23. What are CouchApps?
• CouchApps are HTML and JavaScript applications that
can be hosted directly from CouchDB
• CouchDB can serve HTML, images, CSS, JavaScript, etc.
• Applications live in a Design Document, with static
files (html, css, etc) as attachments
• Dynamic behavior and database access are done via
JavaScript
• CouchDB can be a complete, local web platform
• Support for virtual hosts and URL rewriting
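A CouchApp's design document might be laid out as in the sketch below. The design document name `app`, the file contents, and the `employees` database in the URL are hypothetical; the attachment layout and the `rewrites` list follow CouchDB's design document conventions.

```javascript
// Sketch of a CouchApp design document; names and file contents are hypothetical.
const toBase64 = (text) => Buffer.from(text, "utf8").toString("base64");

const couchApp = {
  _id: "_design/app",
  // Static files are stored as attachments of the design document.
  _attachments: {
    "index.html": {
      content_type: "text/html",
      data: toBase64("<!DOCTYPE html><title>Employees</title>"),
    },
    "app.js": {
      content_type: "application/javascript",
      data: toBase64("console.log('served from CouchDB');"),
    },
  },
  // URL rewriting rules for requests under the design document's _rewrite path.
  rewrites: [{ from: "/", to: "index.html" }],
};

// Assumed URL of the page once the document is saved to the "employees" database:
const appUrl = "http://localhost:5984/employees/_design/app/index.html";

console.log(Object.keys(couchApp._attachments), appUrl);
```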
24. Why?
• Your application and its associated data can be
distributed, and replicated, together
• If you like to share, somebody can grab your
application and data with a single replication
command
• Not only Open Source, but Open Data as well
• Applications can be taken offline and used, and
the updated data can be synchronized at a later point
in time
30. Add-Ons
• couchdb-lucene - Enables full text searching
of documents using Lucene
• GeoCouch - Adds support for geospatial
queries to CouchDB
• Lounge - A proxy-based partitioning/clustering
framework for CouchDB