2. About the Speaker
• Clarence J M Tauro – clarence@couchbase.com
– Senior Instructor, Couchbase
– ~11 Years Professional Teaching and Consulting Experience
– Worked at Pivotal – Instructor/Consultant for Spring/Spring
Security/Spring Web/Enterprise Integration with Spring/Spring
JMS/Spring Batch, Pivotal Hadoop/Cloud Foundry
– PhD in Computer Science from Christ University [thesis
accepted]
– Hard-core Dog lover
3. Disclaimer
• Disclaimer: The views expressed in this presentation
are our own and do not necessarily reflect the views of
Couchbase
4. Objectives
• Introduction to NoSQL
• Are ACID Properties always desirable?
• Basically available, Soft state, Eventually consistent
(BASE)
• The CAP Theorem
• Introducing Couchbase
• Couchbase Operations
5. Introduction
• RDBMS: the predominant technology for storing structured data in web and business applications
• “One size fits all” thinking about data stores has been questioned
• Apply NoSQL databases for the persistence layer (Polyglot Persistence)
7. Are ACID Properties always desirable?
• … But what about:
– Latency
– Partition Tolerance
– High Availability
– Scalability
8. BASE
• Basically Available: the system is available, but not necessarily all items in it at any given point in time
• Soft state: information (state) the user put into the system that will go away if the user doesn't maintain it
• Eventually consistent: after a certain time all nodes are consistent, but at any given time this might not be the case
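The "eventually consistent" idea can be illustrated with a toy in-memory simulation. This is only a sketch under stated assumptions: the `Replica` and `EventuallyConsistentStore` classes and the explicit `propagate()` step are hypothetical names invented for illustration, and no real database replicates this simply.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write hits one replica now and reaches the others later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # updates not yet applied on every replica

    def write(self, key, value):
        # Basically available: the first replica accepts the write at once,
        # without waiting for the rest of the cluster.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # A read may return stale or missing data until propagation finishes.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Eventually consistent: drain the queue so all replicas converge.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(["node-a", "node-b", "node-c"])
store.write("user::1234", "Clarence")
stale = store.read(1, "user::1234")   # node-b has not seen the write yet
store.propagate()
fresh = store.read(1, "user::1234")   # node-b now agrees with node-a
print(stale, fresh)  # None Clarence
```

Between `write()` and `propagate()` the replicas disagree; afterwards they all hold the same value, which is the window the slide describes.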
9. NoSQL Common Traits
• Non-relational
• Schema-free/Schema-on-read
• Eventual consistency
• Open source
• Distributed
• “web-scale”
10. The CAP Theorem
• Consistency – can all
nodes see identical data,
at all times?
• Availability – can all
nodes be read from and
written to, at all times?
• Partition Tolerance – will
nodes function normally,
even when the cluster
breaks?
Consistency, Availability, Partition Tolerance: CHOOSE ANY TWO
11. The CAP Theorem
• CP: Consistency and Partition Tolerance
- Immediately consistent data across a horizontally scaled
cluster, even with network problems
- Couchbase
• AP: Availability and Partition Tolerance
- Always services requests, across multiple data centers,
even with network problems, data eventually consistent
- Apache HBase or Cassandra, Couchbase (XDCR)
• CA: Consistency and Availability
- Always services requests with immediately consistent
data, in a vertically scaled system
- MySQL, Oracle, Microsoft SQL Server
12. What do you do with the Data?
Operational Use
• Real-time intelligence
• Focus on data flows and processes
• Extremely fast (in-memory) reads
• Extremely fast (log append) writes
• Improve the current outcome
Analytical Use
• Batched workloads
• Vast data aggregations
• Retrospective analyses
• Focus on data pools
• Improve future outcomes
13. Hadoop vs. NoSQL
• Hadoop – Analytical Volume: batch-oriented analytical database systems improve future outcomes
• NoSQL – Operational Velocity: real-time operational database systems improve current outcomes
15. Key-Value Stores
• The most common; not necessarily the most popular
• Key and a simple value
- Speed
- Scale
- Simplicity
• Find simple values by key extremely fast
user::1234 Clarence
user::1235 Melisa
user::1236 Michael
16. Document Stores
• Key and a structured value (document)
- Speed
- Scale
- Flexibility
• Read/write ever-changing data about people, places,
and things, at cloud-scale
user::1234 { name: 'Frank', age: 37, kids: ['Sue', 'Ann', 'Bob'] }
user::1235 { name: 'Carolyn', age: 56, kids: ['Tina'] }
user::1236 { name: 'Tessa', age: 24}
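The difference between a key-value store and a document store can be sketched with plain Python dictionaries. This is an in-memory illustration only, not a real database client; the helper names `get_document` and `names_with_kids` are hypothetical.

```python
import json

# Key-value view: the store only understands "key -> opaque value".
kv_store = {
    "user::1234": "Clarence",
    "user::1235": "Melisa",
    "user::1236": "Michael",
}

# Document view: each value is a structured JSON document.
# user::1236 has no "kids" field, because no fixed schema is enforced.
doc_store = {
    "user::1234": json.dumps({"name": "Frank", "age": 37, "kids": ["Sue", "Ann", "Bob"]}),
    "user::1235": json.dumps({"name": "Carolyn", "age": 56, "kids": ["Tina"]}),
    "user::1236": json.dumps({"name": "Tessa", "age": 24}),
}


def get_document(key):
    """Fetch by key and decode the JSON body; returns None if the key is absent."""
    raw = doc_store.get(key)
    return json.loads(raw) if raw is not None else None


def names_with_kids():
    """Look inside the documents, which an opaque key-value store cannot do."""
    names = []
    for raw in doc_store.values():
        doc = json.loads(raw)
        if doc.get("kids"):  # a missing field is simply treated as "no kids"
            names.append(doc["name"])
    return sorted(names)


print(kv_store["user::1234"])            # Clarence
print(get_document("user::1236")["age"]) # 24
print(names_with_kids())                 # ['Carolyn', 'Frank']
```

Both stores are equally fast for lookup by key; the document form adds the flexibility to read or filter on fields inside the value.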
17. Wide Column Stores
• Key and nested set of tuples
- Write vast volumes of data, with eventually consistent
read access
user::1234
name: text Frank
age: number 37
kid: text Sue, Ann, Bob
user::1235
name: text Carolyn
age: number 56
kid: text Tina
18. Graph Databases
• Linked list of keyed objects
- Relationships
• Monitor complex, dynamically networked connections
user::1234 (Frank, 37) – linked to Sue, Ann, Bob
user::1235 (Carolyn, 56) – linked to Tina
user::1236 (Tessa, 24)
19. Polyglot Persistence
• An enterprise will have a variety of different data storage
technologies for different kinds of data
• We need to ask how we want to manipulate the data.
This will help us figure out which persistence
technologies are appropriate
- User Sessions: Couchbase (Memcached)/Redis
- Financial Data: RDBMS
- Shopping Cart: Riak/Couchbase (Memcached)
- Recommendation Systems: Neo4J
- Product Catalog: Couchbase/MongoDB
- Reporting: RDBMS/Couchbase Views
- Analytics: Couchbase/Cassandra
20. History of Couchbase
• Membase: NorthScale developed a key-value storage engine
• CouchOne: Apache CouchDB database project
• Membase and CouchOne joined forces in February 2011 to create Couchbase, the first and only provider of a comprehensive, end-to-end family of NoSQL database products
21. What is Couchbase Server?
• Couchbase Server
• Is a “document” database solution
• Has key/value based orientation
• Is geared for JSON
• Has no tables and no fixed schema
• Runs on a networked cluster of nodes
• Is highly scalable
• Offers lightning-fast reads and writes
• Has caching and persistence layers
• Fails over automatically
• Couchbase Server is best suited for fast-changing data
items of relatively small size
25. Couchbase Server Architecture
• Technology Stack for Data Manager:
- Couchbase Client SDK (“Smart Client”)
- Client Query API [1] and Query Engine (Views)
- Cache Layer: RAM Cache
- Persistence Layer: Couchbase
26. Couchbase Server Architecture
• Technology Stack for Cluster Manager:
- Node Level – multiple vBuckets (default: 1024 vBuckets divided by the number of nodes)
- Cluster Level – multiple nodes (with 1 .. * buckets) [1]
- Datacenter Level – multiple clusters (optional XDCR) [2]
- Erlang (cluster management and process supervision) [3]
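The vBucket arithmetic on this slide can be sketched as follows. This is a simplified illustration, not Couchbase's actual algorithm: the CRC32-modulo hash and the round-robin assignment are assumptions made for the example, and the function names `vbucket_for_key` and `assign_vbuckets` are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # default number of vBuckets, as stated on the slide


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Map a document key to a vBucket id (illustrative CRC32-modulo hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def assign_vbuckets(nodes, num_vbuckets=NUM_VBUCKETS):
    """Spread vBucket ids round-robin so each node owns about 1024 / len(nodes)."""
    return {vb: nodes[vb % len(nodes)] for vb in range(num_vbuckets)}


nodes = ["node-1", "node-2", "node-3", "node-4"]
vbucket_map = assign_vbuckets(nodes)

# Count how many vBuckets each node owns: 1024 / 4 = 256 each.
per_node = {n: sum(1 for owner in vbucket_map.values() if owner == n) for n in nodes}
print(per_node)

# The same key always hashes to the same vBucket, and therefore the same node.
vb = vbucket_for_key("user::1234")
print(vb, vbucket_map[vb])
```

Because keys map to vBuckets rather than directly to nodes, adding or removing a node only requires moving vBuckets, not rehashing every key.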
27. Anatomy of a Couchbase Application
[Diagram: Couchbase Client Software, {Server List}, Cluster Map, and three nodes each running NS Server and EP Engine]
1. REST request (port 8091)
2. HTTP response; the client becomes a Smart Client
4. Connect for CRUD (data port 11210)
5. Create, Read, Update and Delete Documents
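The bootstrap-then-route flow on this slide can be mimicked with a small in-memory simulation. It makes no network calls and does not use the real Couchbase SDK: `FakeCluster`, `DataNode`, and `SmartClient` are hypothetical stand-ins, and the CRC32-modulo hash and round-robin map are simplifying assumptions.

```python
import zlib

NUM_VBUCKETS = 1024


class DataNode:
    """Stands in for one node's data service (port 11210 on the slide)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}


class FakeCluster:
    """Stands in for the cluster's REST endpoint (port 8091 on the slide)."""

    def __init__(self, node_names):
        self.nodes = {name: DataNode(name) for name in node_names}

    def rest_get_cluster_map(self):
        # Steps 1-2: the response tells the client which node owns each vBucket.
        names = sorted(self.nodes)
        return {vb: names[vb % len(names)] for vb in range(NUM_VBUCKETS)}


class SmartClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cluster_map = None

    def bootstrap(self):
        # With the map cached, the client can route requests by itself.
        self.cluster_map = self.cluster.rest_get_cluster_map()

    def _node_for(self, key):
        if self.cluster_map is None:
            raise RuntimeError("call bootstrap() first")
        vb = zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS
        return self.cluster.nodes[self.cluster_map[vb]]

    # Steps 4-5: CRUD goes straight to the node that owns the key.
    def upsert(self, key, doc):
        self._node_for(key).docs[key] = doc

    def get(self, key):
        return self._node_for(key).docs.get(key)

    def remove(self, key):
        self._node_for(key).docs.pop(key, None)


client = SmartClient(FakeCluster(["node-1", "node-2", "node-3"]))
client.bootstrap()
client.upsert("user::1234", {"name": "Frank", "age": 37})
found = client.get("user::1234")
client.remove("user::1234")
gone = client.get("user::1234")
print(found, gone)
```

The point of the design is that no proxy sits between the application and the data: after one round trip for the map, every operation is a single hop to the owning node.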
33. Other Features of Couchbase 4.0
• Multi-dimensional Scaling
• N1QL
• XDCR
34. Training
Get Started with Couchbase Server 4.0:
www.couchbase.com/beta
Get Trained on Couchbase: http://training.couchbase.com
CD220: Developing Couchbase NoSQL Applications
Oct 20 – Oct 23 2015
CS300: Couchbase NoSQL Server Administration
Nov 17 – Nov 20 2015
Enroll Today!
1. Most modern operating systems need a few gigabytes of RAM (Windows usually a bit more than Linux), and other processes, such as monitoring agents, may also run on these nodes. I/O caching also needs memory, both for views and for the general functioning of the system. We typically recommend allocating about 60-80% of a system’s RAM to Couchbase’s quota, leaving the rest as headroom for memory needs outside Couchbase itself.
2. Cross Datacenter Replication (XDCR) is covered later in this course.
3. See https://blog.couchbase.com/tag/erlang
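The RAM guidance in note 1 amounts to simple arithmetic, sketched below. The function name `couchbase_quota_range_mb` is hypothetical, and the 16 GB node is only an example figure.

```python
def couchbase_quota_range_mb(total_ram_mb):
    """Return the (low, high) quota in MB for the 60-80% guidance in note 1."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    low = total_ram_mb * 60 // 100   # 60% of system RAM, rounded down
    high = total_ram_mb * 80 // 100  # 80% of system RAM, rounded down
    return low, high


# Example: a node with 16 GB (16384 MB) of RAM.
low, high = couchbase_quota_range_mb(16384)
print(low, high)            # 9830 13107
print(16384 - high)         # 3277 MB left for the OS and other processes
```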
The Memcached client also uses a server list but, in contrast to the Couchbase client, makes no REST calls; it works only over port 11210 and is very fast. It uses a proprietary Memcached protocol.
1. A set request comes in from the application.
2. Couchbase Server responds that the key is written.
3. Couchbase Server then replicates the data out to memory on the other nodes.
4. At the same time, it puts the data into a write queue to be persisted to disk.
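The four steps above can be traced in a toy in-memory model. It is only a sketch: the `Node` class and its attributes are hypothetical, and the steps run one after another here, whereas the notes describe replication and disk queueing as happening at the same time.

```python
from collections import deque


class Node:
    """Toy model of one server node (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.ram = {}              # in-memory cache for data this node owns
        self.replica_ram = {}      # in-memory copies of other nodes' data
        self.disk = {}             # persisted documents
        self.disk_queue = deque()  # writes waiting to be persisted

    def set(self, key, value, replicas):
        self.ram[key] = value                 # 1. set request lands in RAM
        ack = "stored"                        # 2. acknowledged before any disk I/O
        for peer in replicas:                 # 3. replicate to memory on other nodes
            peer.replica_ram[key] = value
        self.disk_queue.append((key, value))  # 4. queue the write for persistence
        return ack

    def flush(self):
        """Drain the write queue to disk, as a background flusher would."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value


active, peer_a, peer_b = Node("node-1"), Node("node-2"), Node("node-3")
ack = active.set("user::1234", {"name": "Frank"}, replicas=[peer_a, peer_b])
on_disk_before = "user::1234" in active.disk   # False: not yet persisted
active.flush()
on_disk_after = "user::1234" in active.disk    # True: queue has been drained
print(ack, on_disk_before, on_disk_after)      # stored False True
```

The application sees the acknowledgement while the document exists only in memory, which is why writes are fast and why persistence lags slightly behind.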