MongoDB Table of Contents Guide

Table of Contents

1. Structure:
   1. Markus
   2. Flavio
2. Who are we?
   1. Markus Gattol
   2. Flavio Percoco Premoli
3. Introduction Part 1
   1. What I am going to tell you
4. Integration with other Technologies
5. Frequently Asked Questions
   1. Basics
      1. Are there any Reasons not to use MongoDB?
      2. What are the supported Programming Languages?
      3. What is the Status of Python 3 Support?
      4. What is the difference in the main Building-blocks to RDBMSs?
   2. Administration
      1. Is there a Web GUI? What about a REST Interface/API?
      2. Can I rename a Database?
      3. How do I physically migrate a Database?
         1. Secure Copy .... as in scp
         2. Minimum Downtime
      4. How do I update to a new MongoDB version?
      5. What is the default listening Port and IP?
      6. Is there a Way to do automatic Backups?
      7. What is getSisterDB() good for?
      8. How can I make MongoDB automatically start/restart on Server boot/reboot?
   3. Resource Usage
      1. Why is my Database growing so fast?
      2. What Caching Algorithm does MongoDB use?
      3. Why does MongoDB use so much RAM?
      4. What is the so-called Working Set Size?
      5. How much RAM does MongoDB need?
         1. Speed Impact of not having enough RAM
      6. Can I limit MongoDB's RAM Usage?
      7. What can I do about Out Of Memory Errors?
         1. OpenVZ
      8. Does MongoDB use more than one CPU Core?
      9. How can I tell how many clients are connected?
      10. How many parallel Client Connections to MongoDB can there be?
      11. Does MongoDB do Connection Pooling?
      12. Is there a Size limit of how much Data can be stored inside MongoDB?
      13. Do embedded Documents count toward the 4 MiB BSON Document Size Limit?
      14. Does Document Size impact read/write Performance?
      15. Is there a Way to tell the Size of a specific Document?
      16. How can I tell the Size of a Collection and its Indexes?
   4. Collections / Namespaces
      1. What is a Capped Collection? Why use it?
      2. Can I rename a Collection?
      3. What is a Virtual Collection? Why use it?
      4. Can I use a larger Number of Collections/Namespaces?
      5. How about cloning a Collection?
      6. Can I merge two or more Collections into one?
      7. How can I get a list of Collections in my Database?
      8. How do I delete a Collection?
      9. What is a Namespace with regards to MongoDB?
      10. How can I get a list of Namespaces in Database?
   5. Statistics / Monitoring
      1. The Server Status, what does it tell?
   6. Schema / Configuration
   7. Indexes / Search / Metadata
   8. Map / Reduce
   9. GridFS / Data Size
      1. What is GridFS?
         1. What can we do with GridFS
      2. Why use GridFS over ordinary Filesystem Storage?
   10. Scalability / Fault Tolerance / Load Balancing
   11. Miscellaneous
6. Use Case
7. Summary Part 1
8. Introduction Part 2
9. Existing Technologies
10. SQL to MongoDB Query Translation
11. Keeping things lazy...
12. Keeping Relations or Embedding?
   1. Using References:
   2. Without references:
   3. Light and fast (For registered users):
   4. Heavy and slow (For any user):
   5. Lazy relations or mongodb like ones:
13. Taking Advantage from schema-less Databases for Web Development
14. Summary Part 2

Structure:

Markus

• 2min: tell the audience what I am going to tell them (a summary) and why I think it's worth mentioning
• 3min: I'll start with a big picture view (how MongoDB just integrates nicely with existing setups e.g. folks can continue on using
dm-crypt/luks) basic principles like
• 5min: pick a few FAQs items and elaborate on them eg "Why is MongoDB using so much RAM"

• 5min: I will then go on taking a use case as an example (a web application built with Django and MongoDB) from the financial
domain where we need transactions/locking/ACID and talk about the differences to e.g. MySQL/PostgreSQL
• 5min: also, with this use case, other things like: storing various precision numbers
• 5min: summarize what I've told them

You start after me and drill down on details (the stuff you mentioned in your email ~9 days ago) or whatever you/we see fit.

Flavio

• 2min: I'll tell the audience the topics I'll talk about and how they help us with mongodb and django integration
• 5min: Mappers & Stack, I'll list some of the current ODMs used to integrate mongodb and django and how django-mongodb-engine
integrates with django and mongodb.
• 5min: I'll talk about queries, what we have in sql that we don't have in mongodb and how we can obtain the same results using it
◦ perfect, nothing to add/change here
• 3min: I'll talk about embedding and referencing, when each is worth doing and why
• 5min: I'll talk about how it is possible to take advantage of schemaless databases in web programming (django oriented)
◦ ok sounds good, not sure I understand exactly; approach me today on #sunoano and give me an example
• 5min: Summarize and maybe some benchmark!!!

Who are we?

Still, with all the technology we have these days, at the end of the day it is all about the people ...

/me definitely not a

Markus Gattol

• grown up in Carinthia (southernmost Austrian state, bordering Italy), lives in the UK now
◦ http://sunoano.name/albums/places/austria/index.html
• technical background, MSc (Computer Science, Electrical Engineering)
• with Linux (Debian) since 1995, Contributor
• RDBMSs, the usual ...
• Open Source Developer/Contributor in general
• website http://sunoano.name
◦ http://sunoano.name/ws/mongodb.html
• works for Heart Internet Ltd., NSN before that
◦ http://www.heartinternet.co.uk

Flavio Percoco Premoli

• GNOME a11y Contributor (MouseTrap [http://live.gnome.org/MouseTrap])
• Open Source Developer/Contributor (Web and Desktop)
• R&D Developer at The Net Planet Europe
◦ NoSQL Technologies
◦ Cloud Computing
◦ Knowledge Management Systems
• Linux Lover/User and Mac user too
• website: http://www.flaper87.org
• Twitter: FlaPer87
• Github: FlaPer87
• Bitbucket: FlaPer87
• Everywhere else: FlaPer87

Introduction Part 1

The why ...

1. why are you here today?
2. why does some business want to know about new technology?
3. why are we looking to move away from RDBMSs to NoSQL DBMSs?
4. (translated from German) Hardware and software are good when they can be understood while being used, and not when
one could perhaps fly to Mars with them.

Part 1 is mainly about MongoDB itself and not about Django/Python .... Part 2? .... Django!

What I am going to tell you

Best listener experience possible ...

Introduction Part 1 ... Tell the audience what you're going to tell them
Tell them
Integration with other Technologies
Frequently Asked Questions
Use Case
Summary Part 1 ... Tell the audience what you told them

Integration with other Technologies

• How can I get MongoDB?
• Ok, have it! Now what?

1. full-disk encryption / filesystem-level encryption
2. backup technologies, Rsync/Unison, Bacula, Amanda
3. LVM
4. VPN, SSH
5. Virtualization, OpenVZ

Frequently Asked Questions

Well, just because ...

Basics

Before we start running we need to be able to walk ...

Are there any Reasons not to use MongoDB?

1. We need transactions (ACID (Atomicity, Consistency, Isolation, Durability)).
2. Our data is very relational.
3. Related to 2, we want to be able to do joins on the server (but can not do embedded objects / arrays).
4. We need triggers on our tables. There might be triggers available soon however.
5. We rely on triggers (or similar functionality) for cascading updates or deletes.
6. We need the database to enforce referential integrity (MongoDB has no notion of this at all).
7. We need 100% per-node durability.
8. Write ahead log. MongoDB does not have one simply because it does not need one.
9. Dynamic aggregation with ad-hoc queries; Crystal reports, reporting, business logic, ... RDBMSs heartland ...

What are the supported Programming Languages?

Right now (June 2010) we can use MongoDB from at least C, C++, C#, .NET, ColdFusion, Erlang, Factor, Java,
Javascript, PHP, Python, Ruby, Perl. Of course, there might be more languages available in the future.

What is the Status of Python 3 Support?

The current thought is to use Django as more or less a signal for when adding full support for Python 3 makes sense.
MongoDB can probably support it a bit earlier than Django does, but that is certainly not something the MongoDB community
wants to rush and then have to support two totally different code bases.

What is the difference in the main Building-blocks to RDBMSs?

We have RDBMSs such as MySQL, Oracle and PostgreSQL, and then there are NoSQL DBMSs such as MongoDB.
Below is a breakdown of how the main building blocks of each correspond to one another:

MySQL, PostgreSQL, Oracle
--------------------------------------------
Server:Port
- Database
- Table
- Row

MongoDB
--------------------------------------------
Server:Port
- Database
- Collection
- Document

Administration

The usual handicraft work ... get and keep it running ... if in doubt, automate!

Is there a Web GUI? What about a REST Interface/API?

• assuming a mongod process is running on localhost then we can access some statistics at http://localhost:28017/ and
http://localhost:28017/_status
• In order to have a REST interface to MongoDB, like the one CouchDB has, we have to start mongod with the --rest switch.
◦ Note however that this is just a read-only REST interface.
• For a read and/or write REST interface:
◦ http://www.mongodb.org/display/DOCS/Http+Interface
◦ http://github.com/kchodorow/sleepy.mongoose
◦ http://github.com/tdegrunt/mongodb-rest
• If we wanted real-time updates from the CLI, then we could also use mongostat.

Can I rename a Database?

Yes, but it is not as easy as renaming a collection. As of now, the recommended way to rename a database is to clone it
and thereby rename it. This will require enough additional free disk space to fit the current/old database at least twice.

How do I physically migrate a Database?

There is even a clone command for that. Note however that neither copyDatabase() nor cloneDatabase() actually performs a
point-in-time snapshot of the entire database -- what they basically do is query the source database and then
replicate to the target database. So if we use copyDatabase() or cloneDatabase() on a source database which is online
and has operations performed on it, then the target database will not be a point-in-time snapshot of the source as it
was at the exact time the command was issued. Rather, at some later point in time, the target will/might have the same
data/state as its source database.
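
Why a query-and-replicate copy of a live database is not a snapshot can be seen in a small toy model. The sketch below is plain Python and does not talk to MongoDB at all; the function name `copy_while_writing` and the sample data are made up for illustration, and the real commands work on whole collections rather than on dictionary keys.

```python
def copy_while_writing(source, writes):
    """Copy `source` key by key while other clients keep writing to it.

    After each copied key, the next pending write is applied to the source,
    which mimics a database that stays online during the copy.
    Returns (state of the source when the copy started, copied target).
    """
    start_state = dict(source)
    target = {}
    pending = list(writes)
    for key in sorted(start_state):
        if key in source:
            target[key] = source[key]   # read whatever is there right now
        if pending:
            k, v = pending.pop(0)       # a concurrent write sneaks in
            source[k] = v
    return start_state, target


if __name__ == "__main__":
    live = {"a": 1, "b": 1, "c": 1}
    start, copied = copy_while_writing(live, [("c", 2), ("a", 2)])
    print("at start:", start)
    print("copied:  ", copied)
    print("live now:", live)
```

In this run the copy matches neither the state at the start nor the state at the end, which is exactly the "will/might have the same state at some point" caveat from above.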

Secure Copy .... as in scp

A bit of downtime, but with the chance to resume a canceled transfer ....

• shut down mongod on the old machine
• copy/sync the database directory to the new machine
• start mongod on the new machine with dbpath set appropriately
◦ http://sunoano.name/ws/debian_notes_cheat_sheets.html#resume_an_scp_transfer

Minimum Downtime

Below is what we could do in order to have as little downtime as possible:

• stop and re-start the existing mongod as master (if it is not already running as master that is)
• install mongod on the new machine and configure it as slave using --slave and --source
• wait while the slave copies the database, re-indexes and then catches up with its master (this happens
automatically when we point a slave to its master). Once the slave has caught up, we continue as follows:
• disable writes to the master (clients can still read/query)
• once all outstanding writes have been committed on the master and the slave has caught up, we shut down the master
and restart the slave as the new master. The old master can now be removed entirely.
• now we point all traffic at the new master
• finally we enable writes on the new master again, ... Et voilà!

Of course, we might also use OpenVZ and its live-migration feature ...

How do I update to a new MongoDB version?

If it is a drop-in replacement we just need to shut down the older version and start the new one with the
appropriate dbpath. Otherwise, i.e. if it is not a drop-in replacement, we would use mongoexport followed by
mongoimport.

What is the default listening Port and IP?

We can use netstat to find out:

wks:/home/sa# netstat -tulpena | grep mongo
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 124 1474236 8822/mongod
tcp 0 0 0.0.0.0:28017 0.0.0.0:* LISTEN 124 1474237 8822/mongod
wks:/home/sa#

The default listening port for mongod is 27017. 28017 is where we can point our web browser in order to get some
statistics. By default mongod binds to 0.0.0.0, i.e. it listens on every IPv4 address configured on the local machine.

And yes, this includes the loopback address (from 127.0.0.0/8) as well as any address the machine has in the private
class A network 10.0.0.0/8, the private class B network 172.16.0.0/12 or the private class C network 192.168.0.0/16,
amongst others.

Both the listening port and the IP address can be changed, either by using the CLI switches --port and --bind_ip or via the
configuration file, which we can figure out by looking at the runtime configuration.
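
If netstat is not at hand, a quick check from Python works as well. The following is a minimal sketch using only the standard library; the helper name `is_listening` is made up, and the host and ports are simply the defaults mentioned above, so they need adjusting if --port or --bind_ip were used.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    # 27017 (mongod) and 28017 (statistics page) are the defaults from the text.
    for port in (27017, 28017):
        print(port, is_listening("127.0.0.1", port))
```

Note that this only tells us whether something accepts connections on that port, not that it is actually mongod.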

Is there a Way to do automatic Backups?

Yes, http://github.com/micahwedemeyer/automongobackup

What is getSisterDB() good for?

We can use it to get ourselves references to databases which not just saves a lot of typing but is, once we got used to
using it, a lot more intuitive:

1 sa@wks:~/mm/new$ mongo
2 MongoDB shell version: 1.5.2-pre-
3 url: test
4 connecting to: test
5 type "help" for help
6 > db.getCollectionNames();
7 [ "fs.chunks", "fs.files", "people", "system.indexes", "test" ]
8 > reference_to_test_db = db.getSisterDB('test');
9 test
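10 > reference_to_test_db.getCollectionNames();
11 [ "fs.chunks", "fs.files", "people", "system.indexes", "test" ]
12 > use admin
13 switched to db admin
14 > reference_to_test_db.getCollectionNames();
15 [ "fs.chunks", "fs.files", "people", "system.indexes", "test" ]
16 > bye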
17 sa@wks:~/mm/new$

Note how we get a reference to our test database in line 8 and how it is used in lines 10 and even line 14, after switching from
our test database to the admin database. getCollectionNames() has just been chosen as an example, it could have been any
other command as well of course.

How can I make MongoDB automatically start/restart on Server boot/reboot?

One way would be to use the @reboot directive with Cron. However, .deb and .rpm packages install init scripts (sysv or
upstart style, as appropriate) on Debian, Ubuntu, Fedora, and CentOS already so MongoDB will restart there without
further need from us to do anything special.

• For other constellations, http://gist.github.com/409301 is an init.d script for Unix-like systems based on
http://bitbucket.org/bwmcadams/toybox/src/3e84be941408/mongodb.init.rhel.
• For Mac OS X, people have reported that launchctl configurations like
http://github.com/AndreiRailean/MongoDB-OSX-Launchctl/blob/master/org.mongo.mongod.plist work.
• For Windows, we have http://www.mongodb.org/display/DOCS/Windows+Service documentation.

Resource Usage

Lots of confusion amongst beginners ...

Why is my Database growing so fast?

The first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64 MiB, dbname.1 128 MiB, ... up to 2 GiB.
Once the files reach 2 GiB in size, each successive file is also 2 GiB.

So, if we have say, database files up to dbname.n, then dbname.n-1 might be 90% unused but dbname.n has already been
allocated, because it was allocated as soon as we started using dbname.n-1. The reasoning here is simple: we do not want
to wait for new database files when we need them, so we always allocate the next one in the background as soon as we
start to use an empty one.
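
The arithmetic behind this growth pattern is easy to sketch. The Python below only models the numbers given above (64 MiB doubling up to 2 GiB, plus one preallocated file); the function names are made up, and it counts only the numbered data files, not any other file a database may have on disk.

```python
MIB = 1024 * 1024


def datafile_size(n, first=64 * MIB, cap=2048 * MIB):
    """Size in bytes of dbname.n: doubles from 64 MiB and is capped at 2 GiB."""
    return min(first * 2 ** n, cap)


def allocated_bytes(highest_used):
    """Disk space allocated once dbname.<highest_used> is in use.

    The file after it is already preallocated in the background,
    so it counts toward the total as well.
    """
    return sum(datafile_size(n) for n in range(highest_used + 2))


if __name__ == "__main__":
    for n in range(7):
        print(f"dbname.{n}: {datafile_size(n) // MIB} MiB, "
              f"allocated while it is the newest file in use: "
              f"{allocated_bytes(n) // MIB} MiB")
```

So even a database holding only a few documents in dbname.0 already occupies 64 MiB + 128 MiB on disk under this model.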

Note that deleting data and/or dropping a collection or index will not release already allocated disk space since it is
allocated per database. Disk space will only be released if a database is repaired or the database is dropped altogether. Go to
http://www.mongodb.org/display/DOCS/Developer+FAQ#DeveloperFAQ-Whyaremydatafilessolarge%3F for more information.

What Caching Algorithm does MongoDB use?

Actually, that is done by the OS using the LRU (Least Recently Used) caching pattern.

Why does MongoDB use so much RAM?

Well, it does not actually, it is just that most folks do not really understand memory management -- there is more to it than
just "is in RAM" or "is not in RAM".

The current default storage engine for MongoDB is called MongoMemMapped_RecStore. It uses memory-mapped files for
all disk I/O operations. Using this strategy, the operating system's virtual memory manager is in charge of caching.
This has several implications:

• There is no redundancy between file system cache and database cache, actually, they are one and the same.
• MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
• Virtual memory size and RSS (Resident Set Size) will appear to be very large for the mongod process. This is benign
however -- virtual memory space will be just larger than the size of the datafiles open and mapped; resident size will
vary depending on the amount of memory not used by other processes on the machine.
• Caching behavior (such as LRU'ing out of pages, and laziness of page writes) is controlled by the operating system. The
quality of the VMM (Virtual Memory Manager) implementation will vary by OS.

As of now, an alternative storage engine (CachedBasicRecStore), which does not use memory-mapped files, is under
development. This engine is more traditional in design with its own page cache. With this store the database has more control
over the exact timing of reads and writes, and of the cache LRU strategy.

Generally, the memory-mapped store (MongoMemMapped_RecStore) works quite well. The alternative store will be useful in
cases where an operating system's VMM behaves suboptimally.

What is the so-called Working Set Size?

Working set size can roughly be thought of as how much data we will need MongoDB (or any other DBMS, relational or
non-relational) to access in a period of time.

For example, YouTube has ridiculous amounts of data, but only 1% may be accessed at any given time. If, however, we are
in the rare case where all the data we store is accessed at the same rate at all times (LRU), then our working set size can be
defined as our entire data set stored in MongoDB.

How much RAM does MongoDB need?

We now know MongoDB's caching pattern, we also know what a working set size is. Therefore we can have the following rule
of thumb on how much RAM a machine needs in order to work properly.

It is the working set size plus MongoDB's indexes which should reside in RAM at all times i.e. the amount of available
RAM should be at least the working set size plus the size of indexes plus what the rest of the OS and other software running
on the same machine needs.
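
As a worked example of this rule of thumb, here is a small Python sketch. All figures in it are made up for illustration; in practice the working set size has to be estimated for the application at hand.

```python
GIB = 1024 ** 3


def min_ram_bytes(working_set, indexes, os_and_other):
    """Rule of thumb from above: working set + indexes + everything else."""
    return working_set + indexes + os_and_other


def fits_in_ram(installed, working_set, indexes, os_and_other):
    """True if the machine has at least the RAM the rule of thumb asks for."""
    return installed >= min_ram_bytes(working_set, indexes, os_and_other)


if __name__ == "__main__":
    # Hypothetical machine: 6 GiB working set, 1 GiB of indexes,
    # 2 GiB for the OS and other software.
    need = min_ram_bytes(6 * GIB, 1 * GIB, 2 * GIB)
    print("needed:", need // GIB, "GiB")
    print("8 GiB box ok? ", fits_in_ram(8 * GIB, 6 * GIB, 1 * GIB, 2 * GIB))
    print("16 GiB box ok?", fits_in_ram(16 * GIB, 6 * GIB, 1 * GIB, 2 * GIB))
```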

Speed Impact of not having enough RAM

Generally, when databases are too big to fit into RAM entirely, and if we are doing random access, we are in
trouble as HDDs are slow at that (roughly 100 operations per second per drive).

One solution is to have lots of HDDs (10, 100, ...). Another one is to use SSDs (Solid State Drives) or, even better,
add more RAM. Now that being said, the key factor here is random access. If we do sequential access to data
bigger than RAM, then that is fine.

So, it is ok if the database is huge (more than RAM size), but if we do a lot of random access to data, it is best if
the working set fits in RAM entirely.
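
To get a feeling for what "roughly 100 operations per second per drive" means, here is a back-of-the-envelope calculation in Python. It is a deliberately crude model: it assumes every read misses RAM and that reads spread evenly across all drives, and the function name is made up.

```python
def random_read_seconds(reads, drives=1, ops_per_drive=100):
    """Seconds needed for `reads` random reads at ~100 operations/s per HDD."""
    return reads / (drives * ops_per_drive)


if __name__ == "__main__":
    reads = 1_000_000
    for drives in (1, 10, 100):
        secs = random_read_seconds(reads, drives)
        print(f"{drives:3d} drive(s): {secs:8.0f} s (~{secs / 3600:.1f} h)")
```

One million random reads from a single HDD take hours under this model, which is why adding drives, SSDs or simply more RAM makes such a difference.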

However, there are some nuances around having indexes bigger than RAM with MongoDB. For example, we can
speed up inserts if the index keys have certain properties -- if inserts are an issue, then that would help.

Can I limit MongoDB's RAM Usage?

No, it is not designed to do that, it is designed for speed and scalability.

If we wanted to run MongoDB on the same physical machine alongside some web server and for example some application
server like Django, then we could ensure memory limits on each one by simply using virtualization and putting each one in
its own VE (Virtual Environment). In the end we would thus have a web application made of MongoDB, Django and for
example Cherokee, all running on the same physical machine but being limited to whatever limits we set on each VE they run
in.

What can I do about Out Of Memory Errors?

If we are getting something like this Fri May 21 08:29:52 JS Error: out of memory (or something similar) in our logs, then we
have hit a memory limit.

As we already know, MongoDB takes all RAM it can get i.e. RAM, or more precisely RSS (Resident Set Size), itself part of
virtual memory, will appear to be very large for the mongod process.

The important point here is how it is handled by the OS. If the OS just blocks any attempt to get more virtual
memory or, even worse, kills the process (e.g. mongod) which tries to get more virtual memory, then we have got a
problem. What can be done is to raise/alter a few settings:

1 sa@wks:~$ ulimit -a | egrep 'virtual|open'
2 open files (-n) 1024
3 virtual memory (kbytes, -v) unlimited
4 sa@wks:~$ lsb_release -irc
5 Distributor ID: Debian
6 Release: unstable
7 Codename: sid
8 sa@wks:~$ uname -a
9 Linux wks 2.6.32-trunk-amd64 #1 SMP Sun Jan 10 22:40:40 UTC 2010 x86_64 GNU/Linux
10 sa@wks:~$

As we can see from lines 5 to 9, I am on Debian sid (still in development) running the 2.6.32 Linux kernel.

The settings we are interested in are on lines 2 and 3. Virtual memory is unlimited by default so that is fine already --
this is actually what causes the most problems so we need to make sure virtual memory is either reasonably high or, even
better, set to unlimited as shown above. With regards to allowed open file descriptors -- by default we are limited to 1024
open files which, in some cases, might pose a problem -- simply raising it might be enough already and make memory
errors go away.

Note that we need to run these commands (e.g. ulimit -v unlimited) in the same user context as mongod i.e. we basically
want to script them as part of our mongod startup process.
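
The same two limits can also be read from inside a process, which is handy for checking what a startup script actually ends up with, since limits are per process and inherited by child processes. Below is a minimal sketch using Python's standard `resource` module (Unix only); the helper names are made up, and note that RLIMIT_AS is reported in bytes whereas ulimit -v prints kbytes.

```python
import resource


def fmt(limit):
    """Render a limit value the way ulimit does for the unlimited case."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_limits():
    """Soft limits of the current process for open files and virtual memory."""
    nofile_soft, _nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    as_soft, _as_hard = resource.getrlimit(resource.RLIMIT_AS)
    return {
        "open files": fmt(nofile_soft),
        "virtual memory (bytes)": fmt(as_soft),
    }


if __name__ == "__main__":
    for name, value in current_limits().items():
        print(f"{name}: {value}")
```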

OpenVZ

If we are running MongoDB with OpenVZ then there are some more settings we might want to tune in order to keep the
OOM (Out of memory) killer from kicking in, or to avoid simply hitting the virtual memory ceiling if it is not set to
unlimited. Special attention should be paid to the OpenVZ memory settings i.e. they should be set to reflect MongoDB's
memory usage.

Does MongoDB use more than one CPU Core?

For write operations MongoDB makes use of one CPU core. For read operations however, which tend to be the
majority of operations, MongoDB uses all CPU cores available to it.

In short: one will notice a speed increase going from a single-core CPU to dual-core or even higher e.g. quad-core
or maybe even octo-core since the speed increase is roughly proportional to the available CPU cores.

How can I tell how many clients are connected?

We can look at the connections field (current) with the server status:

sa@wks:~$ mongo --quiet
type "help" for help
> db.serverStatus();
{

[skipping a lot of lines ...]

"connections" : {
"current" : 2,
"available" : 19998
},

[skipping a lot of lines ...]

}
> bye
sa@wks:~$

How many parallel Client Connections to MongoDB can there be?

Have a look at the connections field (available) with the server status.
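
As a small worked example, the two fields can be combined into a connection ceiling and a usage percentage. This is only
a sketch over a plain dictionary shaped like the output above; it assumes that current plus available adds up to the
configured maximum, and the function name connection_summary is hypothetical.

```python
def connection_summary(server_status):
    """Derive the connection ceiling and usage from a serverStatus-like dict."""
    conn = server_status["connections"]
    current, available = conn["current"], conn["available"]
    limit = current + available  # assumption: both fields add up to the maximum
    return {
        "current": current,
        "available": available,
        "limit": limit,
        "used_pct": 100.0 * current / limit,
    }


# Values copied from the serverStatus output shown earlier
status = {"connections": {"current": 2, "available": 19998}}
summary = connection_summary(status)
print(summary)
```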

Does MongoDB do Connection Pooling?

Yes, we can do connection pooling for performance reasons and overall resource usage optimization -- without it things
would be a lot slower and more resource intensive. As of now (June 2010) most of the client drivers do connection
pooling; how exactly it is done varies by driver, e.g. PyMongo.

Is there a Size limit of how much Data can be stored inside MongoDB?

4 MiB is the limit on individual documents, but GridFS spreads a file across many documents, so there is no limit,
technically/practically speaking.

While the above is true for x86-64, it is not entirely true for x86 (32 bit). There, because of how memory mapped files
work, the limit is 2 GiB per database.

Do embedded Documents count toward the 4 MiB BSON Document Size Limit?

Yes, the entire BSON (Binary JSON) document (including all embedded documents, etc.) cannot be more than 4 MiB in size.

Does Document Size impact read/write Performance?

Yes, but this is mostly due to network limitations e.g. one will max out a GigE link with inserts before document size starts
to slow down MongoDB itself.

Is there a Way to tell the Size of a specific Document?

Yes, one can use Object.bsonsize(db.whatever.findOne()) in the shell like this:

sa@wks:~$ mongo
MongoDB shell version: 1.5.1-pre-
url: test
connecting to: test
> db.test.save({ name : "katze" });
> Object.bsonsize(db.test.findOne({ name : "katze"}))
38
> bye
sa@wks:~$
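
The 38 bytes reported above can be reproduced by hand from the BSON layout. The sketch below is Python and deliberately
tiny: it assumes the saved document consists of the automatically added _id (a 12-byte ObjectId, represented here by 12
raw bytes) plus the name string, and it handles only these two value types. The function name bson_size is made up; real
code should rely on the driver's BSON encoder instead.

```python
def bson_size(doc):
    """Size in bytes of a flat document whose values are str or 12-byte ObjectIds."""
    size = 4  # int32 holding the total document length
    for key, value in doc.items():
        size += 1                               # element type byte
        size += len(key.encode("utf-8")) + 1    # key as NUL-terminated cstring
        if isinstance(value, bytes) and len(value) == 12:
            size += 12                          # ObjectId payload
        elif isinstance(value, str):
            # int32 string length + UTF-8 bytes + trailing NUL
            size += 4 + len(value.encode("utf-8")) + 1
        else:
            raise TypeError("this sketch only handles str and 12-byte ids")
    return size + 1  # trailing NUL that terminates the document


doc = {"_id": bytes(12), "name": "katze"}
print(bson_size(doc))
```

Embedded documents would add their own full size to the parent in the same way, which is why they count toward the
per-document limit.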

How can I tell the Size of a Collection and its Indexes?

> db.getCollectionNames();
[ "fs.chunks", "fs.files", "people", "system.indexes", "test" ]
> db.test.dataSize();
160
> db.test.storageSize();
2304
> db.test.totalIndexSize();
8192
> db.test.totalSize();
10496

We are using the test collection here. dataSize() is self-explanatory. storageSize() includes our data and all the still free
but already allocated disk space to this collection. totalIndexSize() is the size in bytes of all the indexes in this
collection and totalSize() is all the storage allocated for all data and indexes in this collection. If we need/want a
more detailed view we could also have a look at

> db.test.validate();
{
"ns" : "test.test",
"result" : "
validate
firstExtent:2:2b00 ns:test.test
lastExtent:2:2b00 ns:test.test
# extents:1
datasize?:160 nrecords?:4 lastExtentSize:2304

padding:1
first extent:
loc:2:2b00 xnext:null xprev:null
nsdiag:test.test
size:2304 firstRecord:2:2be8 lastRecord:2:2c58
4 objects found, nobj:4
224 bytes data w/headers
160 bytes data wout/headers
deletedList: 0000001000000000000
deleted: n: 1 size: 1904
nIndexes:1
test.test.$_id_ keys:4
",
"ok" : 1,
"valid" : true,
"lastExtentSize" : 2304
}
> bye
sa@wks:~$
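
The numbers above relate to each other by simple arithmetic, which the following Python sketch spells out using the values
from the test collection. The helper name allocated_beyond_data is hypothetical, and what it returns is a rough figure
that lumps together free space and record/extent overhead.

```python
# Values copied from the shell session above (all in bytes)
sizes = {
    "dataSize": 160,
    "storageSize": 2304,
    "totalIndexSize": 8192,
    "totalSize": 10496,
}


def allocated_beyond_data(sizes):
    """Bytes allocated to the collection that do not hold raw document data."""
    return sizes["storageSize"] - sizes["dataSize"]


# totalSize is the storage for the data plus the storage for all indexes
total = sizes["storageSize"] + sizes["totalIndexSize"]
print(total == sizes["totalSize"], allocated_beyond_data(sizes))
```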

Note that while MongoDB generally does a lot of pre-allocation, we can remedy this by starting mongod with --noprealloc
and --smallfiles.

Collections / Namespaces

Needs to be known, plain and simple ...

What is a Capped Collection? Why use it?

• Size: http://www.mongodb.org/display/DOCS/Capped+Collections
• Time (TTL Collections): http://jira.mongodb.org/browse/SERVER-211

Can I rename a Collection?

Yes. Using help(); from MongoDB's interactive shell we get, amongst others, db.test.renameCollection( newName ,
<dropTarget> ) which renames the collection. So yes, we could do db.foo.renameCollection('bar'); and have the collection foo
renamed to bar. Renaming a collection is an atomic operation by the way.

What is a Virtual Collection? Why use it?

It refers to the ability to reference embedded documents as if they were a first-class collection of top level
documents, querying on them and returning them as stand-alone entities, etc.

Can I use a larger Number of Collections/Namespaces?

There is a limit to how many collections/namespaces we can have within a single MongoDB database. It is ~24000
namespaces per database. This is essentially the number of collections plus the number of indexes.

How about cloning a Collection?

Yes, possible. Have a look at mongoexport and mongoimport.

Can I merge two or more Collections into one?

Yes, we read from all collections we want to merge and use insert() to write it into our single target collection. This
can be done on the server (using MongoDB's interactive shell) or from a client.
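
From a client, the read-then-insert loop could look roughly like the Python sketch below. It is written against any
object offering find() and insert_many(), which are the method names current PyMongo collections use; FakeCollection is
a hypothetical in-memory stand-in so the sketch runs without a server, and merge_collections is a made-up name. In a real
collection, documents from different sources that share the same _id would collide, so that case needs handling.

```python
def merge_collections(sources, target):
    """Copy every document of each source collection into target."""
    copied = 0
    for source in sources:
        docs = [dict(doc) for doc in source.find()]
        if docs:  # avoid handing an empty batch to the target
            target.insert_many(docs)
            copied += len(docs)
    return copied


class FakeCollection:
    """Hypothetical in-memory stand-in for a driver collection object."""

    def __init__(self, docs=()):
        self.docs = list(docs)

    def find(self):
        return iter(self.docs)

    def insert_many(self, docs):
        self.docs.extend(docs)


posts_2009 = FakeCollection([{"_id": 1, "title": "a"}])
posts_2010 = FakeCollection([{"_id": 2, "title": "b"}, {"_id": 3, "title": "c"}])
all_posts = FakeCollection()
print(merge_collections([posts_2009, posts_2010], all_posts))
```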

How can I get a list of Collections in my Database?

We can use getCollectionNames() as shown below in lines 8 and 9. Yet another possibility is shown in lines 23 to 28. Of
course, since every collection is also a namespace, we can find them alongside the indexes in lines 11 to 21:

1 sa@wks:~$ mongo
2 MongoDB shell version: 1.2.4
3 url: test
4 connecting to: test
5 type "help" for help
6 > db
7 test
8 > db.getCollectionNames();
9 [ "fs.chunks", "fs.files", "mycollection", "system.indexes", "things" ]
10 > db.system.namespaces.find();
11 { "name" : "test.system.indexes" }
12 { "name" : "test.fs.files" }
13 { "name" : "test.fs.files.$_id_" }
14 { "name" : "test.fs.files.$filename_1" }
15 { "name" : "test.fs.chunks" }
16 { "name" : "test.fs.chunks.$_id_" }
17 { "name" : "test.fs.chunks.$files_id_1_n_1" }
18 { "name" : "test.things" }
19 { "name" : "test.things.$_id_" }
20 { "name" : "test.mycollection" }
21 { "name" : "test.mycollection.$_id_" }
23 > show collections
24 fs.chunks
25 fs.files
26 mycollection
27 system.indexes
28 things
29 > bye
30 sa@wks:~$

How do I delete a Collection?

db.collection.drop() but there is no undo so beware.

What is a Namespace with regards to MongoDB?

Collections can be organized in namespaces. These are named groups of collections defined using a dot notation. For
example, we could define collections blog.posts and blog.authors, both reside under the namespace blog but are two separate
collections.

Namespaces can then be used to access these collections using the dot notation e.g. db.blog.posts.find(); will return all
documents from the collection blog.posts but nothing from the collection blog.authors.

Namespaces simply provide an organizational mechanism for the user i.e. the collection namespace is flat from the
database point of view which means that blog.authors really just is a collection on its own and not some collection authors
grouped under some namespace blog. Again, the collection namespace is flat from the database point of view i.e. technically
speaking blog.authors is no different than foo or foo.bar.baz -- grouping just helps the humans keep track ...

How can I get a list of Namespaces in Database?

One way to list all namespaces for a particular database would be to enter MongoDB's interactive shell:

sa@wks:~$ mongo
MongoDB shell version: 1.2.4
url: test
connecting to: test
> db.system.namespaces.find();
{ "name" : "test.system.indexes" }
{ "name" : "test.fs.files" }
{ "name" : "test.fs.files.$_id_" }
{ "name" : "test.fs.files.$filename_1" }
{ "name" : "test.fs.chunks" }
{ "name" : "test.fs.chunks.$_id_" }
{ "name" : "test.fs.chunks.$files_id_1_n_1" }
{ "name" : "test.things" }
{ "name" : "test.things.$_id_" }
{ "name" : "test.mycollection" }
{ "name" : "test.mycollection.$_id_" }
> db.system.namespaces.count();
11
> bye
sa@wks:~$

The system namespace in MongoDB is special since it contains database system information (read metadata). There are
several collections, for example system.namespaces, which can be used to get information about all the namespaces
within some database.
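
Since the namespace count is essentially collections plus indexes, the listing above can be split accordingly. The Python
sketch below does this on the eleven names from the shell session; it assumes that index namespaces are exactly those
containing ".$", and split_namespaces is a hypothetical helper name.

```python
# Namespace names copied from the db.system.namespaces.find() output above
namespaces = [
    "test.system.indexes",
    "test.fs.files",
    "test.fs.files.$_id_",
    "test.fs.files.$filename_1",
    "test.fs.chunks",
    "test.fs.chunks.$_id_",
    "test.fs.chunks.$files_id_1_n_1",
    "test.things",
    "test.things.$_id_",
    "test.mycollection",
    "test.mycollection.$_id_",
]


def split_namespaces(names):
    """Separate collection namespaces from index namespaces."""
    collections = [n for n in names if ".$" not in n]
    indexes = [n for n in names if ".$" in n]
    return collections, indexes


collections, indexes = split_namespaces(namespaces)
print(len(collections), "collections +", len(indexes), "indexes =", len(namespaces))
```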

Statistics / Monitoring

Because pilots need to know ...

The Server Status, what does it tell?

> db.serverStatus();
{
"uptime" : 6695,
"localTime" : "Sun Apr 11 2010 11:22:19 GMT+0200 (CEST)",
"globalLock" : {
"totalTime" : 6694193239,
"lockTime" : 45048,
"ratio" : 0.000006729414343397326
},
"mem" : {
"resident" : 3,
"virtual" : 138,
"supported" : true,
"mapped" : 0
},

Most of it is obvious, like for example uptime. The globalLock part is interesting. totalTime is the same as uptime but in
microseconds. lockTime is the amount of time the global lock has been held i.e. the total time spent waiting for write
queries until a lock has been assigned and thus a write could be made.

One may ask what is the point of having both uptime and totalTime? Well, totalTime will roll over faster since it is in
microseconds, so at some point they diverge. The rollover is coordinated between totalTime and lockTime.
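
Judging from the numbers shown, the ratio field is simply lockTime divided by totalTime (this is an inference from the
output, not something the output states). A short Python sketch with the values from above, using the hypothetical helper
name lock_ratio:

```python
def lock_ratio(global_lock):
    """Fraction of total time during which the global lock was held."""
    return global_lock["lockTime"] / global_lock["totalTime"]


# Values copied from the globalLock section above (microseconds)
global_lock = {"totalTime": 6694193239, "lockTime": 45048}
ratio = lock_ratio(global_lock)
print(f"{ratio:.3e} ({ratio * 100:.5f} % of the time)")
```

A ratio this small means writers were holding the lock for only a tiny fraction of the server's lifetime.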

All mem units are in MiB. resident is what is in physical memory (also known as RAM), virtual is the virtual
address space, mapped is the memory-mapped space, and supported tells whether memory info is supported on our platform.

"connections" : {
"current" : 2,
"available" : 19998
},
"extra_info" : {
"note" : "fields vary by platform",
"heap_usage_bytes" : 146048,
"page_faults" : 57
},
"indexCounters" : {
"btree" : {
"accesses" : 0,
"hits" : 0,
"misses" : 0,
"resets" : 0,
"missRatio" : 0
}
},
"backgroundFlushing" : {
"flushes" : 111,
"total_ms" : 2,
"average_ms" : 0.018018018018018018,
"last_ms" : 0,
"last_finished" : "Sun Apr 11 2010 11:21:45 GMT+0200 (CEST)"
},

connections tells us how many client connections we can open against mongod. More precisely, current tells us how
many client connections to mongod exist right now and available shows us how many we have left.

Within the extra_info part we have heap_usage_bytes which is the main memory needed by the database.

"opcounters" : {
"insert" : 16513,
"query" : 1482263,
"update" : 141594,
"delete" : 38,
"getmore" : 246889,
"command" : 1247316
},
"asserts" : {
"regular" : 0,
"warning" : 0,
"msg" : 0,
"user" : 0,
"rollovers" : 0
},
"ok" : 1
}
> bye
sa@wks:~$

The opcounters part is also pretty interesting. insert, query, update, and delete are self-explanatory but getmore and
command are probably not. When we do a query, we get results in batches. The first batch is counted in query, all
subsequent in getmore. commands are things like count, group, distinct, etc.

And yes, taking those numbers and dividing them by time (delta or total) will give us operations/time e.g. operations
per second or operations since mongod got started. In fact, there is a Munin plugin (http://github.com/erh/mongo-munin)
which does use this.
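
The delta calculation can be sketched in a few lines of Python. The two samples below are hypothetical (the first reuses
two counters from the output above, the second is invented), the helper name ops_per_second is made up, and counter
rollovers are ignored for simplicity.

```python
def ops_per_second(previous, current, elapsed_seconds):
    """Per-operation rates between two opcounters samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        op: (count - previous.get(op, 0)) / elapsed_seconds
        for op, count in current.items()
    }


# Two hypothetical samples taken 60 seconds apart
before = {"insert": 16513, "query": 1482263}
after = {"insert": 16573, "query": 1482863}
rates = ops_per_second(before, after, 60)
print(rates)
```

Dividing a single sample by uptime instead gives the average rate since mongod was started.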

Schema / Configuration

Sorry folks, no can do, lack of time ... go to http://sunoano.name/ws/mongodb.html#faqs_schema_configuration

Indexes / Search / Metadata

Sorry folks, no can do, lack of time ... go to http://sunoano.name/ws/mongodb.html#faqs_indexes_search_metadata

Map / Reduce

Sorry folks, no can do, lack of time ... go to http://sunoano.name/ws/mongodb.html#faqs_map_reduce

GridFS / Data Size

Store tons of data reliably and smartly ...

What is GridFS?

Basically a collection of normal documents. We have two collections, one for metadata (fs.files) and one consisting of
chunks of data (fs.chunks).

The GridFS spec provides a mechanism for transparently dividing a large file among multiple documents. This allows
us to efficiently store large objects, and in the case of especially large files, such as videos, permits range operations
(e.g., fetching only the first n bytes of a file).
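
To make the chunking idea concrete, here is a minimal Python sketch that splits a byte string into chunk documents and
serves a byte range while touching only the chunks that overlap it. The field names files_id, n and data follow the
fs.chunks layout (compare the files_id_1_n_1 index seen earlier); the function names, the file id and the tiny chunk size
are assumptions chosen for illustration, and a real application would use the driver's GridFS API.

```python
def split_into_chunks(files_id, data, chunk_size):
    """Divide data into numbered chunk documents of at most chunk_size bytes."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def read_range(chunks, chunk_size, start, length):
    """Return data[start:start + length] using only the overlapping chunks."""
    if length <= 0:
        return b""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    needed = sorted(
        (c for c in chunks if first <= c["n"] <= last), key=lambda c: c["n"]
    )
    blob = b"".join(c["data"] for c in needed)
    skip = start - first * chunk_size  # offset of start inside the first chunk
    return blob[skip:skip + length]


payload = bytes(range(256)) * 4  # 1024 bytes of sample data
chunks = split_into_chunks("file-1", payload, chunk_size=100)
print(len(chunks), read_range(chunks, 100, 250, 120) == payload[250:370])
```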

What can we do with GridFS?

Store ridiculous amounts of data in a smart way.

Why use GridFS over ordinary Filesystem Storage?

If we use the filesystem we would have to handle backup/replication/scaling ourselves. We would also have to come up
with some sort of hashing scheme ourselves, plus we would need to take care of cleanup/sorting/moving because
filesystems do not love lots of small files.

With GridFS, we can use MongoDB's built-in replication/backup/scaling e.g. scale reads by adding more read-only
slaves and writes by using sharding. We also get out of the box hashing (read UUID (Universally Unique Identifier)) for
stored content plus we do not suffer from filesystem performance degradation because of a myriad of small files.

Also, we can easily access information from random sections of large files, another thing traditional tools working with
data right off the filesystem are not good at. Last but not least, we can keep information associated with the file (who has
edited it, download count, description, etc.) right with the file itself.

Scalability / Fault Tolerance / Load Balancing

Sorry folks, no can do, lack of time ... go to
http://sunoano.name/ws/mongodb.html#faqs_scalability_fault_tolerance_load_balancing

Miscellaneous

Sorry folks, no can do, lack of time ... go to http://sunoano.name/ws/mongodb.html#faqs_miscellaneous

Use Case

This should have been my major part
◦ locking (read transactions)
◦ asynchronous as opposed to synchronous operations
◦ numbers (double precision)

Again, lack of time ... go to http://sunoano.name/ws/mongodb.html

Summary Part 1

Tell them what you told them ... simple as that ...

Introduction Part 2

Before starting with MongoDB-specific topics it's important to know that we don't dislike relational databases. We know
they are good for many things, but we also know that a web application's success is mainly based on its performance and
speed, so that's what we're after and that's why we're all here.

Existing Technologies

• MongoKit (Nicolas Clairon):
◦ Great for completely unstructured model programming. It has structure validation but I've never used it; I prefer
to use MongoKit on models that may be constantly changing their structure.

• mongoengine (Harry Marr):
◦ It allows you to define schemas for documents and query collections using django-like syntax.

• django-mongodb-engine (Alberto Paro and myself):
◦ This is a real Django backend based on django-mongodb and mongoengine, adapted to work with
django-nonrel and MongoDB without changing anything in the code.

SQL to MongoDB Query Translation....

"What matters is who adapts faster to the changing conditions"
- Charles Darwin

The first thing we should remember when moving from SQL databases to NoSQL ones is that models were made to model data,
but models can be modeled too. What I mean is that people tend to adapt database features to their models instead of
adapting their models to the database. I'll try to mention some of the common questions found on the mailing list:

• Let's start with JOINs. Why JOINs? Because we don't have those in MongoDB and we might need them, so we have to
figure out the best workaround. The best thing you can do here is forget about JOINs: you won't have them. We are not
talking about highly relational databases but about non-relational ones, so there can't be joins between 2 collections
if there's no relation between them. One of the things we did was remodel the way we stored data. We embedded what
could be embedded and did 2 or more queries where embedding was not possible.

• What about ForeignKeys, do we have those? Yes, or kind of. We have DBRef, which is a kind of ForeignKey, but I
personally wouldn't use refs in MongoDB. As I said, MongoDB is not about references and relations between collections;
it is about performance based on dynamism.

• If MongoDB barely has references, you could guess that many-to-many is insignificant. Instead, I would start
thinking in terms of dictionaries/maps and lists/arrays.

• And last but not least, if you really need to do a query that joins 2 collections based on a field reference that
should handle a many-to-many relation, then you have map/reduce.
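
The "2 or more queries" workaround mentioned in the first point can be sketched in plain Python. The lists below stand in
for the results of two separate queries; all data, the author_id field and the helper name attach_authors are
hypothetical and only illustrate joining on the client side.

```python
# Result of "query 1": the posts we want to display (hypothetical data)
posts = [
    {"_id": 1, "title": "Hello", "author_id": 10},
    {"_id": 2, "title": "World", "author_id": 11},
    {"_id": 3, "title": "Again", "author_id": 10},
]

# Stand-in for the authors collection (hypothetical data)
authors = [
    {"_id": 10, "nickname": "FlaPer87"},
    {"_id": 11, "nickname": "markus"},
]


def attach_authors(posts, authors):
    """Join posts with their authors on the client instead of in the database."""
    wanted = {post["author_id"] for post in posts}
    # "Query 2": fetch only the referenced authors, keyed by _id
    by_id = {a["_id"]: a for a in authors if a["_id"] in wanted}
    return [dict(post, author=by_id.get(post["author_id"])) for post in posts]


joined = attach_authors(posts, authors)
print(joined[0]["author"]["nickname"])
```

Each author is fetched once no matter how many posts refer to it, which keeps the second query small.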

Keeping things lazy...

Yes, because we’re lazy people so we do lazy things ...

It is important when getting ORMs to work with MongoDB that we keep things lazy to avoid bottlenecks in our web
applications. MongoDB doesn't have many-to-many relations but it can store lists and dictionaries. For example:

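from django.db import models
from djangotoolbox.fields import ListField  # assumed import path (django-nonrel's djangotoolbox)

class User(models.Model):
    nickname = models.CharField(max_length=255)
    full_name = models.CharField(max_length=255)
    friends = ListField()  # ids/names of the user's friends
    groups = ListField()   # groups the user is related to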

In the User model we have 2 ListFields that may cause some slowdowns in our web application. The first one is a list
containing ids/names of the user's friends and the second one contains the groups the user is related to. Think of a
(popular) user who has many friends and is related to many groups: that's a lot of data transfer and many instantiations
for our code, because each object/id in the ListField has to be instantiated. This might sound obvious but trust me,
nothing is obvious when doing web programming.
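
One way to picture "keeping things lazy" is a list that stores only the raw ids and turns an id into an object when
that element is actually accessed. The Python sketch below is an illustration under that assumption, not how any of the
libraries above implement it; LazyList and load_user are made-up names, and only integer indexes are supported.

```python
class LazyList:
    """Holds raw ids and instantiates an item only when it is accessed."""

    def __init__(self, ids, loader):
        self._ids = list(ids)
        self._loader = loader
        self._cache = {}

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, index):
        raw_id = self._ids[index]
        if raw_id not in self._cache:
            # The expensive instantiation happens here, once per id
            self._cache[raw_id] = self._loader(raw_id)
        return self._cache[raw_id]


loads = []


def load_user(user_id):
    """Hypothetical loader; records each call so we can see what was loaded."""
    loads.append(user_id)
    return {"id": user_id}


friends = LazyList(range(10000), load_user)
first = friends[0]
print(len(friends), "friends,", len(loads), "instantiated")
```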

Keeping Relations or Embedding?

This is a common question when moving from relational databases to non-relational ones: should we keep our models related,
or embed the smaller ones into the bigger ones? The answer is to embed; you shouldn't keep them related. For example, a
common situation (or one commonly used to show how MongoDB works) is a blog engine with posts and comments. Let's see how
we could handle comments (not threaded) in our blog engine:

Using References:

class Comment(models.Model):
    post = models.ForeignKey(Post)
    user = models.ForeignKey(User)
    text = models.CharField(max_length=255)

# Fetch the matching comment, or create it if it does not exist yet
my_comment, created = Comment.objects.get_or_create(post=my_post, user=my_user, text=my_text,
                                                    defaults={})

Without references:

class Post(models.Model):
    ....
    comments = ListField()

# Embed the comment in the post document itself
post.comments.append({'user': user, 'text': text})
post.save()

The first example is the most used because it is the way we're used to thinking when we write our models, but the second
one is the right one when talking about NoSQL databases because references make things slower.

The bad thing about embedding our comments like that is that we have to worry about our 4 MiB document limit, so if we are
really popular on the net and many people come to our blog and comment on our posts, that might be a problem for us. Even
so, this is great: we have removed a model from our app, so it should be easier to maintain, shouldn't it? But what is
user supposed to be? Is it an embedded user object? Is it a ForeignKey? How should we handle users there?

It again depends on how you'd like to do things. For example, it is possible to save the username as it should be shown
and then, when the comments are loaded, just show the username. Those wanting to know more about this user can click on
the username, which loads the user's personal info. Here are some examples:

Light and fast (For registered users):

post.comments.append({'user' : 'FlaPer87', 'text' : 'My Comment'})
post.save()

Heavy and slow (For any user):

post.comments.append({'user' : {'username' : 'FlaPer87',
'email' : 'flaper87@flaper87.org',
'url' : 'http://blog.flaper87.org'},
'text' : 'My Comment'})
post.save()

Lazy relations or mongodb like ones:

#Automatic serialization done in django-mongodb-engine
post.comments.append({'user' : {'_app': model._meta.app_label,
'_model': model._meta.module_name,
'pk': model.pk,
'_type': "django"},
'text' : 'My Comment'})
post.save()

Taking Advantage of schema-less Databases for Web Development

One of the things I like most about MongoDB is that it is schema-less. People tend to think of schema-less databases as a
mess, which they're not. Schema-less databases do have a structure; the difference between them and schema-based ones is
that schema-less structures are dynamic, which means that they can be modified at any time and they're not typed. You can
think of schema-less databases (just like MongoDB does) as JSON-based maps.

This kind of structure can be really helpful when doing web programming. In our case they let us save any kind of data in
our collections and have generic structures that change over time. For example, let's try to improve our Comment model (in
case we decided to have some relations).

class Comment(models.Model):
    post = models.ForeignKey(Post)
    user = GenericField()
    text = models.CharField(max_length=255)

my_user = "FlaPer87" #Known User

my_comment, created = Comment.objects.get_or_create(post=my_post,
user=my_user,
text=my_text, defaults={})

my_user = {'nickname' : 'FlaPer87',
'full_name' : 'Flavio Percoco Premoli',
'email' : 'flaper87@flaper87.org',
'url' : 'http://blog.flaper87.org'} #Anonymous User

my_comment2, created = Comment.objects.get_or_create(post=my_post,
user=my_user,
text=my_text,
defaults={})

Using a GenericField we'll be able to save anything into that attr and we'll have to do our checks and controls code side. In this
case the Schema-less collection helped us to get/save the anonymous users information without having to create a record in our
Users table or without forcing the user to register.
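
The code-side checks mentioned above could look roughly like the Python sketch below. It assumes, as in the example, that
a known user is stored as a plain string and an anonymous user as a dictionary; the helper name display_name and the
choice of which keys to accept are hypothetical.

```python
def display_name(user):
    """Return the name to show for a comment's user, whatever shape it has."""
    if isinstance(user, str):
        # Known user: only the nickname was stored
        return user
    if isinstance(user, dict):
        # Anonymous user: details were embedded with the comment
        name = user.get("nickname") or user.get("full_name")
        if name:
            return name
        raise ValueError("anonymous user needs a nickname or full_name")
    raise TypeError("unsupported user value: %r" % (user,))


known = "FlaPer87"
anonymous = {
    "nickname": "FlaPer87",
    "full_name": "Flavio Percoco Premoli",
    "email": "flaper87@flaper87.org",
    "url": "http://blog.flaper87.org",
}
print(display_name(known), display_name(anonymous))
```

Keeping such checks in one helper means that templates and views never have to care which of the two shapes a comment
carries.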

Summary Part 2

• Re-model your models
• Be Lazy to be faster
• Forget about relations, they will slow you down
• Remember that dynamism is better than restrictions
