NoSQL Definition http://nosql-database.org/
NoSQL DEFINITION: Next-generation databases mostly
addressing some of these points: being non-relational,
distributed, open-source, and horizontally scalable. The original
intention was modern web-scale databases. The
movement began in early 2009 and is growing rapidly. Often
more characteristics apply, such as: schema-free, easy replication
support, simple API, eventually consistent/BASE (not ACID),
a huge amount of data, and more. So the misleading
term "nosql" (which the community now mostly translates as
"not only sql") should be seen as an alias for something like
the definition above.
Who Uses NoSQL?
• Twitter uses FlockDB/MySQL and Cassandra
• Cassandra is an open-source project that originated at Facebook
• Digg and Reddit use Cassandra
• bit.ly, foursquare, SourceForge, and the New York Times use
MongoDB
• Adobe, Alibaba, and eBay use Hadoop
Why SQL sucks..
• O/R mapping (also known as Impedance Mismatch)
• Data-Model changes are hard and expensive
• SQL databases are designed for high throughput, not
low latency
• SQL databases do not scale out well
• Microsoft, Oracle, and IBM charge big bucks for
databases
– And then you need to hire a database admin
• Consider it from the perspective of Google, Twitter, Facebook,
and Amazon.
– Your databases are among the biggest in the world, and
nobody pays you for that feature
– Wasted profit!
What has NoSQL done?
• Implemented the most common use cases
as a piece of software
• Designed for scalability and performance
Visual Guide To NoSQL
http://blog.nahurst.com/visual-guide-to-nosql-systems
NoSQL Data Model: Document-Oriented
• Data is stored as “documents”
• We are not talking about Word documents
• Comparable to Aggregates in DDD
• Mostly schema-free structured data
• Can be queried
• Is easily mapped to OO systems (Domain
Model, DDD)
• No joins; relationships between documents must be resolved in application code
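The points about aggregates and the missing join can be sketched in plain JavaScript. This is only an illustration with made-up data (the `order`, `customers`, and `orders` objects and both helper functions are hypothetical and not part of any MongoDB API): an embedded document needs no join, while data split across collections has to be combined by the application.

```javascript
// Hypothetical aggregate: one order document with its line items embedded.
const order = {
  _id: 1,
  customer: { name: "Ada", city: "Atlanta" },
  items: [
    { sku: "abc", qty: 2, price: 5 },
    { sku: "xyz", qty: 1, price: 10 },
  ],
};

// The document maps directly onto an object graph; no join is needed.
function orderTotal(o) {
  return o.items.reduce((sum, item) => sum + item.qty * item.price, 0);
}

// Hypothetical split layout: orders reference customers by id.
const customers = [{ _id: 7, name: "Ada" }];
const orders = [
  { _id: 1, customerId: 7 },
  { _id: 2, customerId: 7 },
];

// The "join" is done in application code: index one side, look up the other.
function ordersWithCustomer(orderList, customerList) {
  const byId = new Map(customerList.map((c) => [c._id, c]));
  return orderList.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}

const total = orderTotal(order);
const joined = ordersWithCustomer(orders, customers);
```

Embedding suits data that is read and written together; references plus an application-side lookup suit data shared by many documents.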
Network Communications
• REST/JSON
• TCP/BSON (ClientDriver)
BSON [bee · sahn], short for Binary JSON, is a binary-encoded
serialization of JSON-like documents. Like JSON,
BSON supports the embedding of documents and arrays
within other documents and arrays. BSON also contains
extensions that allow representation of data types that
are not part of the JSON spec. For example, BSON has a
Date type and a BinData type.
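Why a dedicated Date type is useful can be seen with plain JSON in JavaScript. The sketch below uses only the standard `JSON` and `Date` objects (no BSON library); the `doc` object is a made-up example. After a JSON round trip the date comes back as a string, so the type information is lost.

```javascript
// A document with a real Date value (hypothetical example data).
const doc = { name: "event", created: new Date(Date.UTC(2011, 0, 15)) };

// JSON has no date type: the Date is serialized as an ISO 8601 string.
const roundTripped = JSON.parse(JSON.stringify(doc));

const typeBefore = doc.created instanceof Date;  // a Date object before
const typeAfter = typeof roundTripped.created;   // only a string afterwards
```

A format with a native date type avoids having to agree on a string convention and re-parse values in every client.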
Client Drivers (Apache License)
• MongoDB currently has client support for the following
programming languages:
• C
• C++
• Erlang
• Haskell
• Java
• JavaScript
• .NET (C#, F#, PowerShell, etc.)
• Perl
• PHP
• Python
• Ruby
• Scala
Queries (Regular Expressions)
{field: /regular.*expression/i}
// count all cities that start with "atl"
// and end with "a" (e.g. atlanta)
db.cities.count({city: /^atl.*a$/i});
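Anchors matter for a query like the one above: without `^` and `$` a pattern matches anywhere inside the string, not only at its start and end. A small sketch with plain JavaScript regular expressions (the city list is made up for illustration):

```javascript
// Hypothetical sample data.
const cities = ["Atlanta", "Atlantic City", "Mazatlan", "atlanta"];

const unanchored = /atl.*a/i;  // "atl" followed later by "a", anywhere in the string
const anchored = /^atl.*a$/i;  // must start with "atl" and end with "a"

const looseMatches = cities.filter((c) => unanchored.test(c));
const strictMatches = cities.filter((c) => anchored.test(c));
```

Here `looseMatches` contains all four names, while `strictMatches` keeps only the two spellings of Atlanta.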
Queries (2) : LINQ
https://github.com/craiggwilson/fluent-mongo
Equals
x => x.Age == 21 will translate to {"Age": 21}
Greater Than, $gt:
x => x.Age > 18 will translate to {"Age": {$gt: 18}}
Greater Than Or Equal, $gte:
x => x.Age >= 18 will translate to {"Age": {$gte: 18}}
Less Than, $lt:
x => x.Age < 18 will translate to {"Age": {$lt: 18}}
Less Than Or Equal, $lte:
x => x.Age <= 18 will translate to {"Age": {$lte: 18}}
Not Equal, $ne:
x => x.Age != 18 will translate to {"Age": {$ne: 18}}
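To make the meaning of the translated query documents concrete, here is a minimal in-memory matcher in plain JavaScript. It is a sketch, not how MongoDB or fluent-mongo evaluate queries: the `matches` function, the `operators` table, and the `people` data are all hypothetical, and only the operators listed above are handled.

```javascript
// Comparison operators from the list above, as plain predicates.
const operators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b,
};

// Returns true if every field condition in `query` holds for `doc`.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. {Age: {$gt: 18}}; all operators must hold.
      return Object.entries(condition).every(([op, operand]) =>
        operators[op](doc[field], operand)
      );
    }
    // Plain value form, e.g. {Age: 21}, means equality.
    return doc[field] === condition;
  });
}

// Hypothetical sample data.
const people = [
  { Name: "Ann", Age: 17 },
  { Name: "Bob", Age: 18 },
  { Name: "Cid", Age: 21 },
];

const adults = people
  .filter((p) => matches(p, { Age: { $gte: 18 } }))
  .map((p) => p.Name);
```

Several operators can be combined in one condition, for example `{Age: {$gt: 17, $lt: 21}}` for an open range.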
Atomic Operations (Optimistic Locking)
• Update if current:
• Fetch the object.
• Modify the object locally.
• Send an update request that says "update the object
to this new value if it still matches its old value".
Atomic Operations: Sample
> t = db.inventory
> s = t.findOne({sku: 'abc'})
{"_id" : "49df4d3c9664d32c73ea865a" , "sku" : "abc" , "qty" : 1}
> // decrement qty only if the item is still in stock
> t.update({sku: "abc", qty: {$gt: 0}}, { $inc : { qty : -1 } });
> db.$cmd.findOne({getlasterror: 1})
{"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1} // one document matched and was updated
> t.update({sku: "abcz", qty: {$gt: 0}}, { $inc : { qty : -1 } });
> db.$cmd.findOne({getlasterror: 1})
{"err" : null , "updatedExisting" : false , "n" : 0 , "ok" : 1} // no document matched, so nothing was updated
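The "update if current" pattern can also be sketched without a database. The following plain JavaScript uses an in-memory `Map` as a stand-in for a collection; `updateIfCurrent` and the `inventory` map are hypothetical helpers for illustration, not MongoDB driver calls. The update is applied only if the stored document still matches the values that were fetched earlier.

```javascript
// In-memory stand-in for a collection (an assumption for this sketch).
const inventory = new Map([["abc", { sku: "abc", qty: 1 }]]);

// Apply `changes` only if the stored document still equals `expected`.
function updateIfCurrent(store, key, expected, changes) {
  const current = store.get(key);
  if (!current) return false; // nothing to update
  const unchanged = Object.keys(expected).every(
    (field) => current[field] === expected[field]
  );
  if (!unchanged) return false; // someone else modified the document
  store.set(key, { ...current, ...changes });
  return true;
}

// 1. Fetch the object (keep a copy of what was seen).
const seen = { ...inventory.get("abc") };
// 2. Modify locally and 3. send the conditional update.
const firstTry = updateIfCurrent(inventory, "abc", seen, { qty: seen.qty - 1 });
// A second writer still holding the stale copy is rejected.
const secondTry = updateIfCurrent(inventory, "abc", seen, { qty: seen.qty - 1 });
```

A rejected writer would typically fetch the document again and retry.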
Replica set (1)
• Automatic failover
• Automatic recovery of servers that were
offline
• Distribution over more than one
Datacenter
• Automatic election of a new master
server in case of a failure
• Up to 7 servers in one replica set
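Automatic failover depends on the surviving members being able to agree on a new master, which requires a majority of the voting members. The arithmetic can be sketched as follows; this is a simplified illustration (the function names are hypothetical, and real elections also consider priorities and how up to date each member is).

```javascript
// Smallest number of votes that forms a majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A new master can only be elected if a majority is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

const threeNodeSurvivesOneFailure = canElectPrimary(3, 2);
const threeNodeSurvivesTwoFailures = canElectPrimary(3, 1);
```

This is why an odd number of members is commonly chosen: adding a fourth member to a three-member set raises the required majority without tolerating any additional failure.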
Mongo Sharding
• Partitioning data across multiple physical servers to
provide application scale-out
• Can distribute databases, collections or objects in a
collection
• Choose how you partition data (shardkey)
• Balancing, migrations, management all automatic
• Range based
• Can convert from single master to sharded system with
0 downtime
• Often works in conjunction with object replication
(failover)
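Range-based partitioning on a shard key can be pictured as a lookup table from key ranges to shards. The sketch below is plain JavaScript with a made-up chunk table; the shard names, range boundaries, and the `routeByShardKey` function are assumptions for illustration and do not reflect MongoDB's internal metadata format.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Find the shard whose range contains the given shard key value.
function routeByShardKey(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

const shardForAtlanta = routeByShardKey(chunks, "atlanta");
const shardForHouston = routeByShardKey(chunks, "houston");
const shardForZurich = routeByShardKey(chunks, "zurich");
```

Balancing then amounts to splitting ranges and reassigning them to other shards, without the application having to change how it queries.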
Map Reduce using LINQ
https://github.com/craiggwilson/fluent-mongo/wiki/Map-Reduce
• LINQ is by far an easier way to compose map-reduce functions.
// Compose a map-reduce to get the sum of everyone's ages.
var sum = collection.AsQueryable().Sum(x => x.Age);
// Compose a map-reduce to get the average, minimum, and maximum age of everyone,
// grouped by the first letter of their last name.
var ageRanges =
    from p in collection.AsQueryable()
    group p by p.LastName[0] into g
    select new
    {
        FirstLetter = g.Key,
        AverageAge = g.Average(x => x.Age),
        MinAge = g.Min(x => x.Age),
        MaxAge = g.Max(x => x.Age)
    };
Store large Files: GridFS
• The database natively supports storing
binary data within BSON objects, but a single document is
limited in size (4 MB or 16 MB, depending on the version).
• GridFS is a specification for storing large
files in MongoDB
• Comparable to Amazon S3 online storage
service when using it in combination with
replication and sharding
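The core idea behind GridFS is to split a large file into chunks that each fit into a normal document. A minimal sketch in JavaScript (Node.js `Buffer`) follows; the field names `files_id`, `n`, and `data` follow the GridFS chunk layout, while the functions, the file id, and the tiny chunk size are made up for illustration.

```javascript
// Split binary data into numbered chunk documents of at most chunkSize bytes.
function splitIntoChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,                                // which file the chunk belongs to
      n,                                               // position of the chunk in the file
      data: data.subarray(offset, offset + chunkSize), // the chunk's bytes
    });
  }
  return chunks;
}

// Reassemble the file by ordering chunks by n and concatenating them.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

// Tiny chunk size so the effect is visible; real chunks are far larger.
const payload = Buffer.from("0123456789");
const parts = splitIntoChunks("file-1", payload, 4);
const restored = reassemble(parts.slice().reverse());
```

Because each chunk is an ordinary document, chunks can be replicated and sharded like any other data.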
Performance
On MySQL, SourceForge was reaching its performance
limits at its user load at the time. Using some of
the easy scale-out options in MongoDB, they fully
replaced MySQL and found that MongoDB could handle
that user load easily. In fact, after some
testing, they found that their site could now handle 100
times the number of users it supported at the time.
This means you can charge a lot less per user of
your application and get the same revenue. Think
about it.
Use Cases: Well suited
• Archiving and event logging
• Document and Content Management Systems
• E-Commerce
• Gaming: high-performance small reads/writes,
geospatial indexes
• High volume problems
• Mobile. Specifically, the server-side
infrastructure of mobile systems
• Projects using iterative/agile development
methodologies
• Real-time stats/analytics
Use Cases: Less Well Suited
• Systems with a heavy emphasis on
complex transactions such as banking
systems and accounting (multi-object
transactions)
• Traditional Non-Realtime Data
Warehousing
• Problems requiring SQL
We are entering an age where data is live and hardware is
cheap, and we need a new programming paradigm to
access and process the data.
The new theory is based on the idea that RAM is
the storage, the hard disk is a backup, and you keep
tens, hundreds, if not thousands of servers in a
LAN.
The end result is blazing-fast access times and
incredible uptime.
i: case insensitive
m: multiline
x: extended mode
No upsert. (Upsert: should the document be created if it was not found?)
Shards, usually consisting of replica sets
Config servers, which manage the cluster's metadata
mongos processes, which act as routers
use techday
db.things.insert( { _id: 1, tags: ['dog', 'cat'] } );
db.things.insert( { _id: 2, tags: ['cat'] } );
db.things.insert( { _id: 3, tags: ['mouse', 'dog', 'cat'] } );
db.things.insert( { _id: 4, tags: [] } );
// map function: emit one {count: 1} per tag of the current document
m = function(){
    this.tags.forEach(
        function(z){
            emit(z, {count: 1} );
        }
    );
};
// reduce function: sum the counts emitted for one tag
r = function(key, values){
    var total = 0;
    for (var i = 0; i < values.length; i++)
        total += values[i].count;
    return {count: total};
};
res = db.things.mapReduce(m, r, {out: {inline: 1}});
// with inline output the results are part of the returned document,
// so there is no output collection to find() in or drop()
res.results
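What the map and reduce functions above compute can be traced without a server. The sketch below re-implements the flow in plain JavaScript under simplifying assumptions: `runMapReduce` is a hypothetical helper, `emit` is passed to the map function as an argument instead of being a global, and reduce is called once per key with all of its values.

```javascript
// Same sample documents as in the shell session.
const things = [
  { _id: 1, tags: ["dog", "cat"] },
  { _id: 2, tags: ["cat"] },
  { _id: 3, tags: ["mouse", "dog", "cat"] },
  { _id: 4, tags: [] },
];

// Minimal in-memory map-reduce: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

// Map: one {count: 1} per tag. Reduce: sum the counts for a tag.
const mapTags = function (emit) {
  this.tags.forEach((tag) => emit(tag, { count: 1 }));
};
const reduceCounts = (key, values) => ({
  count: values.reduce((total, v) => total + v.count, 0),
});

const tagCounts = runMapReduce(things, mapTags, reduceCounts);
```

The document with an empty `tags` array emits nothing and therefore does not appear in the result.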
Like a poor man's Amazon S3
With MySQL, SourceForge had reached the limit of the required performance at its user load at the time.
They then replaced MySQL with MongoDB and, using the scale-out options, could easily handle the same workload.
After some tests they even found that they can now handle 100 times the number of users.
This means lower costs per user of the application at the same revenue!
Show import from SQL Server
Systems with a heavy emphasis on complex transactions