Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project, and which NoSQL database would you use? Explore the relationships between NoSQL and RDBMS. Understand how to select between an RDBMS (MySQL and PostgreSQL), a document database (MongoDB), a key-value store, a graph database, and columnar databases, or combinations of the above.
2. NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project
and which NoSQL database would you utilize?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to select between an RDBMS (MySQL
and PostgreSQL), Document Database (MongoDB),
Key-Value Store, Graph Database, and Columnar databases or
combinations of the above.
Thursday May 8th 2014, 3:00pm-3:50pm SB 139
Slides and Feedback at: http://joind.in/11012
3. NoSQL
• History
• Popular NoSQL Databases
• NoSQL Database Comparisons
• Terminology
• Consistency, Replication, Performance
• NoSQL Implementation CRUD Operations
4. NoSQL Introduction
• NoSQL is a commonly adopted misnomer
• Typically does not use ANSI SQL
– SQL = Structured Query Language
– Structure exists but is more Flexible
– Queries are performed
– Language is closer to Programming Languages
6. NoSQL History
• 1998 Carlo Strozzi Command Line Database
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)
8. NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving new
and different problems
• Not Only SQL: http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html
9. NoSQL Origination
• Problems not solved by an RDBMS
• Limitations of the RDBMS, not of SQL
14. NoSQL “Bleeding Edge”
• Several solutions are mature and stable
enough to run large scale production
environments
• Not all permutations have been considered
• Several (but not all) optimization strategies
have been published
• Crucial elements such as Security may be a
secondary add-on in favor of performance.
15. NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for
general use, adventures into the
esoteric periphery of the C shell
may reveal unexpected quirks.”
16. NoSQL Comparison
Take note of patterns:
Recent Release, Open Source, Utilized at High-Volume sites
Variety of Formats:
Key-Value, Wide-Column, Document, Graph
http://db-engines.com/en/ranking
18. Key-Value Stores
Code bucket
Key                 Value
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    11.341% Lowest rank on Dec 2007
code:C++            {"score": "6.892%", "low rank": "Feb 2008"}
Drink bucket
Key                 Value
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
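The bucket and key layout on this slide can be sketched with a plain Python dictionary. This is only an in-memory stand-in to show the idea of opaque values looked up by a bucket:key name; the `put` and `get` helpers are hypothetical and are not the API of any particular key-value store.

```python
# Hypothetical in-memory stand-in for a key-value store (not a real client API).
store = {}

def put(bucket, key, value):
    """Store a value under bucket:key, replacing any earlier value."""
    store[f"{bucket}:{key}"] = value

def get(bucket, key):
    """Return the value stored under bucket:key, or None if the key is absent."""
    return store.get(f"{bucket}:{key}")

# The same key name ("java") can live in different buckets without clashing.
put("drink", "java", "coffee")
put("code", "java", "17.316% Lowest rank on Feb 2014")

# A value can be a simple string or a structured document; the store does not care.
put("code", "C++", {"score": "6.892%", "low rank": "Feb 2008"})

print(get("drink", "java"))
print(get("code", "C++")["score"])
```

The store never inspects the value, which is why one row can hold a string and another a JSON-like document.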
25. NoSQL Comparison
No ANSI SQL Standards, No Predefined Schemas, Replication,
Eventual Consistency, Rarely Foreign Keys, Data Types not required
Newer Concepts: Sharding, REST API, JSON, MapReduce
26. NoSQL Characteristics
No Predefined Schemas
• May insert data without creating a table
• Schema Versions (v1.5, v1.6, v1.7,…)
Rarely Foreign Keys
• No JOIN operations
• Relationships are not automatically maintained
Eventual Consistency
• Old copies being replaced by new records
• Inconsistent data until all replacements are complete
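Eventual consistency can be illustrated with a toy model. The sketch below assumes three replicas held as plain dictionaries and a hypothetical `replicate` step that copies queued writes; real databases use far more elaborate protocols, so treat this only as a picture of "inconsistent until all replacements are complete".

```python
# Three hypothetical replicas of the same data, modeled as plain dicts.
replicas = [{}, {}, {}]
pending = []  # writes accepted by one replica but not yet copied to the others

def write(key, value):
    """Accept the write on replica 0 and queue it for the other replicas."""
    replicas[0][key] = value
    pending.append((key, value))

def read(replica_index, key):
    """Read from one specific replica; it may not have seen recent writes."""
    return replicas[replica_index].get(key)

def replicate():
    """Copy every queued write to the remaining replicas."""
    while pending:
        key, value = pending.pop(0)
        for replica in replicas[1:]:
            replica[key] = value

write("salary", "125000")
stale = read(2, "salary")   # replica 2 has not received the write yet
replicate()
fresh = read(2, "salary")   # after replication every replica agrees
print(stale, fresh)
```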
30. Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide – Conquer – Consolidate
Often implemented by defining Map and Reduce classes or functions
1+2+3+6+7+8+9=?
Google’s MapReduce Programming Model – Revisited, Ralf Lämmel, Microsoft, 2008
http://www.sciencedirect.com/science/article/pii/S0167642307001281
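Diagram: the inputs are split into two groups, 2 + 6 + 8 = 16 and 1 + 7 + 3 + 9 = 20, and the partial sums are combined: 16 + 20 = 36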
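The Divide, Conquer, Consolidate steps for the sum on this slide can be sketched in a few lines of Python. This runs on a single machine and only mimics the shape of the model; the function names `split`, `map_chunk`, and `reduce_partials` are illustrative and do not come from any MapReduce framework.

```python
from functools import reduce

def split(values, parts):
    """Divide: deal the values into `parts` chunks, one per worker."""
    return [values[i::parts] for i in range(parts)]

def map_chunk(chunk):
    """Conquer: each worker sums only its own chunk."""
    return sum(chunk)

def reduce_partials(partials):
    """Consolidate: combine the partial sums into one result."""
    return reduce(lambda a, b: a + b, partials, 0)

values = [1, 2, 3, 6, 7, 8, 9]
chunks = split(values, 2)                   # two workers
partials = [map_chunk(c) for c in chunks]   # could run in parallel
total = reduce_partials(partials)
print(chunks, partials, total)
```

In a distributed system each chunk would live on a different node and only the small partial results would travel over the network.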
31. JSON
JavaScript Object Notation, based on a subset of JavaScript
Similar to XML as a method for representing data
Syntax
Name : Value pairs
"salary" : "125000"
Values are: number, string, Boolean, array, object, or null
Objects can store Objects, Arrays can store Arrays
Separate pairs by commas
"salary" : "125000", "gender" : "male"
Curly braces denote objects
{ "salary" : "125000", "gender" : "male" }
Square brackets denote arrays
"phone" : ["555-1212", "555-3344"]
"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ]
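As a quick check of the syntax rules, the fragments on this slide can be combined into one object and parsed with Python's standard `json` module. Combining them into a single document is an assumption made for the example.

```python
import json

# The slide's fragments assembled into one JSON object.
text = (
    '{ "salary" : "125000", "gender" : "male", '
    '"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ] }'
)

record = json.loads(text)              # JSON object -> Python dict
office = record["phone"][0]["office"]  # arrays become lists, objects become dicts
round_trip = json.dumps(record)        # and back to a JSON string

print(record["salary"], office)
```

Note that "125000" is quoted here, so it parses as a string; written without quotes it would parse as a number.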
33. REST API
CRUD (Create, Read, Update, Delete) operations through the web
HTTP Methods
GET (List/Read)
POST (Create; some APIs also use it to update)
PUT (Update or replace; creates when the client supplies the key)
DELETE (Delete)
EXAMPLE API http://www.blinksale.com/api/
List/Read Data via HTTP GET to
http://www.blinksale.com/invoices
http://www.blinksale.com/invoices/invoice_id/payments
http://www.blinksale.com/invoices/?start=2006&end=2008
Returns XML results
34. REST API
Update data via HTTP POST to
http://www.blinksale.com/invoices/invoice_id/payments
<?xml version="1.0" encoding="UTF-8"?>
<payment xmlns="http://www.blinksale.com/api">
<amount>1000.00</amount>
<date>2006-09-27</date>
</payment>
REST = REpresentational State Transfer
Twitter Example:
https://dev.twitter.com/docs/api/1.1 (GET and POST only)
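The two requests shown for the example API can be sketched with Python's standard `urllib`. The sketch only builds the request objects and does not send them; the `application/xml` content type, the function names, and the omission of authentication are assumptions for illustration, not details taken from the API's documentation.

```python
import urllib.request

BASE_URL = "http://www.blinksale.com"  # host as shown on the slides

def list_invoices_request(start=None, end=None):
    """Build (but do not send) a GET request that lists invoices."""
    url = f"{BASE_URL}/invoices"
    if start is not None and end is not None:
        url += f"/?start={start}&end={end}"
    return urllib.request.Request(url, method="GET")

def payment_request(invoice_id, amount, date):
    """Build (but do not send) a POST request carrying a payment as XML."""
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<payment xmlns="http://www.blinksale.com/api">'
        f"<amount>{amount}</amount><date>{date}</date>"
        "</payment>"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/invoices/{invoice_id}/payments",
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml"},  # assumed content type
    )

read_req = list_invoices_request(2006, 2008)
write_req = payment_request("invoice_id", "1000.00", "2006-09-27")
print(read_req.get_method(), read_req.full_url)
print(write_req.get_method(), write_req.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call; the point here is that the HTTP method plus the URL identify the CRUD operation.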
35. Database SELECT Statements
Oracle
SELECT * FROM relationships
MongoDB
db.relationships.find()
Cassandra (CQL)
SELECT * FROM relationships
36. Database SELECT Statements
Redis – Key-Value Store
SMEMBERS relationships
Riak – Key-Value Store with REST API (+ proprietary drivers)
http://localhost:8091/riak/relationships/likes
Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
37. JOINS without Foreign Keys
original_id = ObjectId()
db.employer.insert({
"_id": original_id,
"name": "Broadway Tech",
"url": "bc.example.net" })
db.people.insert({
"name": "Erin",
"employer_id": original_id,
"url": "bc.example.net/Erin" })
“Erin” works at “Broadway Tech”
One of the employees at “Broadway Tech” is “Erin”
http://docs.mongodb.org/manual/reference/database-references/#document-references
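Because the database does not perform the join, the application follows the stored reference itself. The sketch below models the two collections with plain Python structures to show both directions of the lookup; it does not use a MongoDB driver, and `uuid` is only a stand-in for `ObjectId()`.

```python
import uuid

employers = {}  # stand-in for the employer collection, keyed by _id
people = []     # stand-in for the people collection

def insert_employer(name, url):
    """Insert an employer document and return its generated _id."""
    _id = uuid.uuid4().hex  # stand-in for ObjectId()
    employers[_id] = {"_id": _id, "name": name, "url": url}
    return _id

original_id = insert_employer("Broadway Tech", "bc.example.net")
people.append({"name": "Erin", "employer_id": original_id,
               "url": "bc.example.net/Erin"})

def employer_of(person_name):
    """First query: find the person. Second query: follow employer_id."""
    person = next(p for p in people if p["name"] == person_name)
    return employers[person["employer_id"]]["name"]

def employees_of(employer_name):
    """Reverse direction: find the employer, then people that reference it."""
    ids = {e["_id"] for e in employers.values() if e["name"] == employer_name}
    return [p["name"] for p in people if p["employer_id"] in ids]

print(employer_of("Erin"))
print(employees_of("Broadway Tech"))
```

Nothing stops a person document from holding an `employer_id` that no longer exists; keeping references valid is the application's job.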
45. SQL CRUD
Create
INSERT INTO table (column1, column2) VALUES (9, 'string');
Read
SELECT column1, column2 FROM table;
Update
UPDATE table SET column2 = 'text' WHERE column1 = 9;
Delete
DELETE FROM table WHERE column2 = 'text';
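The four statements can be run end to end with Python's built-in `sqlite3` module. The table is named `demo` here because `table` is a reserved word in SQL; that name and the in-memory database are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE demo (column1 INTEGER, column2 TEXT)")

# Create
conn.execute("INSERT INTO demo (column1, column2) VALUES (?, ?)", (9, "string"))
after_insert = conn.execute("SELECT column1, column2 FROM demo").fetchall()

# Update
conn.execute("UPDATE demo SET column2 = ? WHERE column1 = ?", ("text", 9))
after_update = conn.execute("SELECT column1, column2 FROM demo").fetchall()

# Delete
conn.execute("DELETE FROM demo WHERE column2 = ?", ("text",))
after_delete = conn.execute("SELECT column1, column2 FROM demo").fetchall()

conn.close()
print(after_insert, after_update, after_delete)
```

Unlike the NoSQL examples, the table and its columns must exist before the first INSERT.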
46. Key-Value Stores
Code bucket
Key                 Value
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    11.341% Lowest rank on Dec 2007
code:C++            {"score": "6.892%", "low rank": "Feb 2008"}
Drink bucket
Key                 Value
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
48. Redis CRUD
Lists: one-dimensional arrays with insert, append, pop, and push
redis.lpush('users:employees', 'user:jim')
redis.mget(redis.lrange('users:employees', 0, 5))
Sets: unordered collections with no duplicate values (SADD = Set Add)
SADD users:employees jim
SADD users:employees krishna
SMEMBERS users:employees
Sorted Sets: sets with an added sorting value (score)
ZADD users:employees 125000 jim
ZADD users:employees 157000 krishna
ZRANGEBYSCORE users:employees 100000 180000
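The behavior of the set and sorted-set commands can be mimicked in plain Python. This is an in-memory emulation of the semantics only, not the Redis client library, and the key names `employees:set` and `employees:salary` are hypothetical (they differ because a Redis key holds a single data type).

```python
sets = {}         # key -> set of members
sorted_sets = {}  # key -> {member: score}

def sadd(key, member):
    """Like SADD: return 1 if the member was added, 0 if already present."""
    members = sets.setdefault(key, set())
    if member in members:
        return 0
    members.add(member)
    return 1

def zadd(key, score, member):
    """Like ZADD: set (or overwrite) the member's score."""
    sorted_sets.setdefault(key, {})[member] = score

def zrangebyscore(key, low, high):
    """Like ZRANGEBYSCORE: members with low <= score <= high, ordered by score."""
    items = sorted_sets.get(key, {}).items()
    ordered = sorted(items, key=lambda kv: (kv[1], kv[0]))
    return [member for member, score in ordered if low <= score <= high]

first = sadd("employees:set", "jim")
duplicate = sadd("employees:set", "jim")
zadd("employees:salary", 125000, "jim")
zadd("employees:salary", 157000, "krishna")
in_range = zrangebyscore("employees:salary", 100000, 180000)
print(first, duplicate, in_range)
```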
49. Riak CRUD
Easy to install and configure test cluster
REST Queries
Create/PUT a “course:CIS2120” row
curl -v -X PUT http://localhost:8091/riak/course/CIS2120 \
  -H "Content-Type: application/json" \
  -d '{"name":"Database Coding", "days":"MWF"}'
Read/GET the value for "course:CIS2120"
curl -X GET http://localhost:8091/riak/course/CIS2120
curl http://localhost:8091/riak/course/CIS2120
Key                 Value
course:CIS2120      {"name":"Database Coding", "days":"MWF"}
50. Riak Links
Riak can link one key:value pair to another with a tagged relationship
curl -v -X PUT http://localhost:8091/riak/student/sorensen \
  -H "Content-Type: application/json" \
  -H "Link: </riak/course/CIS2120>; riaktag=\"enrolled\"" \
  -d '{"firstname":"Conner"}'
Links are one-way: this does not automatically create a link from
"CIS2120" back to "sorensen"
52. Neo4j – Graph Database
http://www.neo4j.org/learn/try
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
53. Neo4j CRUD
Must try dragging nodes at: http://www.neo4j.org/learn/try
MATCH (user {name:"Bill"})-[:KNOWS]->(colleague)
WHERE colleague.employer="LinkedIn"
RETURN user,colleague
ORDER BY colleague.name LIMIT 10
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
MATCH (n)-[r]->(m) RETURN n,r,m
Matches any relationship between “n” and “m”
http://www.neo4j.org/learn/cypher
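What the two MATCH patterns select can be sketched over a small list of (start, relationship, end) triples. This is a plain Python illustration of the pattern, not the Neo4j driver, and the sample names and relationships are invented for the example.

```python
# Hypothetical graph: each tuple is (start node, relationship type, end node).
relationships = [
    ("Bill", "KNOWS", "Ann"),
    ("Bill", "LIKES", "Ann"),
    ("Ann", "LIKES", "Raj"),
]

def match(rel_type=None):
    """Return (n, r, m) triples; rel_type=None matches any relationship."""
    return [
        (n, r, m)
        for (n, r, m) in relationships
        if rel_type is None or r == rel_type
    ]

likes = match("LIKES")   # like MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
everything = match()     # like MATCH (n)-[r]->(m) RETURN n,r,m
print(likes)
print(len(everything))
```

The direction of each triple matters: ("Ann", "LIKES", "Raj") does not imply that Raj likes Ann.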
55. Google BigTable
• White Paper published in 2006
• Many databases based upon BigTable
• 13 pages, readable for many non-techies
• Insightful into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
56. HBase
Large-scale, column-oriented database
Consistency, Performance, Fault-Tolerance, ACID via Locking
Tables are created before initial data is added
Tables have
• row keys: indexed row-identifier strings
• column families: each contains one or more columns
• timestamps: used for version control
57. HBase
The row key unifies a row's column families.
If a row does not insert values in a column family, no disk space
is used within that column family.
Columns are identified by column_family:column_name
text:
revision:author
revision:comment
Write-Ahead Logging (WAL) is similar to file system journaling
58. HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki' (equivalent to SELECT * FROM wiki)
Seven Databases in Seven Weeks, Redmond & Wilson 2012
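The data model behind these shell commands (row key, then column_family:column_name, then timestamped versions) can be pictured with nested Python dictionaries. This is a hypothetical in-memory model for illustration; the sample values and integer timestamps are invented, and real access goes through the HBase shell or client APIs.

```python
FAMILIES = {"text", "revision"}  # column families are fixed when the table is created
wiki = {}                        # row key -> {column key -> [(timestamp, value), ...]}

def put(row, column, value, timestamp):
    """Add a new timestamped version of a cell."""
    family = column.split(":", 1)[0]
    if family not in FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    wiki.setdefault(row, {}).setdefault(column, []).append((timestamp, value))

def get(row, column):
    """Return the newest version of a cell, or None if it was never written."""
    versions = wiki.get(row, {}).get(column, [])
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]

def delete(row, column):
    """Remove every version of a cell."""
    wiki.get(row, {}).pop(column, None)

put("first page", "text:", "Hello", timestamp=1)
put("first page", "text:", "Hello, wiki", timestamp=2)
put("first page", "revision:author", "erin", timestamp=2)
latest_text = get("first page", "text:")
delete("first page", "revision:author")
print(latest_text, get("first page", "revision:author"))
```

Columns inside a family need no declaration, and a row that never writes to a family stores nothing for it.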
62. Cassandra Characteristics
Scalable, high-availability wide-column datastore
Peer-to-peer rather than master-slave clusters
Tunable consistency: reads and writes can target a single node,
a quorum of nodes, or all nodes
Recommends static and dynamic column families
Static column families contain predefined columns
Contact Info: phone, address, email, web
Dynamic column families have variable numbers of similar columns
Students enrolled in a course
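The trade-off behind tunable consistency comes down to simple arithmetic: if the nodes contacted for a read plus the nodes contacted for a write exceed the number of replicas, every read overlaps the latest write. The sketch below assumes the ONE, QUORUM, and ALL levels and a hypothetical replication factor of 3; the helper names are illustrative.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1

def nodes_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")

def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    r = nodes_required(read_level, replication_factor)
    w = nodes_required(write_level, replication_factor)
    return r + w > replication_factor

print(quorum(3))
print(read_sees_latest_write("QUORUM", "QUORUM", 3))
print(read_sees_latest_write("ONE", "ONE", 3))
```

Lower levels respond faster and tolerate more failed nodes, at the cost of possibly reading stale data.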