Round pegs and square holes
1. Round Pegs and Square Holes
Django and MongoDB
by Daniel Greenfeld and Audrey Roy
2. Audrey / Danny
• Principals at Cartwheel Web
• Co-founders of consumer.io
• Met at PyCon 2010
• Now affianced
Photo credit: Christopher Neugebauer
@pydanny / @audreyr cartwheelweb.com
3. What is MongoDB?
• NoSQL
• Fast, Scalable, Indexable, Shardable
• Schema-less
4. What is MongoDB?
• Written in C++
• Stores data in BSON (kinda like JSON)
• Uses JavaScript internally for scripting
• Has Python, Ruby, PHP, C, other drivers
• Huge community
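To make "kinda like JSON" concrete: BSON has native types, such as dates, that plain JSON lacks. The sketch below uses only the standard library; the `document` contents and the `joined` field are made up for illustration. It shows JSON refusing a `datetime` until it is converted to a string.

```python
import json
from datetime import datetime

# Hypothetical document; BSON could store the datetime natively.
document = {"username": "pydanny", "joined": datetime(2012, 4, 10)}

try:
    json.dumps(document)
    json_accepts_datetime = True
except TypeError:
    # The standard json module has no encoding for datetime objects.
    json_accepts_datetime = False

# Plain JSON needs an explicit conversion, here to an ISO 8601 string.
encoded = json.dumps(document, default=lambda value: value.isoformat())
print(json_accepts_datetime)
print(encoded)
```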
5. MongoDB: SQL Analogy
• Collections are like tables
• Documents are like records (rows)
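One way to see the analogy is to put the same record on both sides. This is a rough sketch using the standard-library `sqlite3` module for the relational half; the `reviews` table and its columns are assumptions chosen to match the review examples used elsewhere in the talk.

```python
import sqlite3

# Relational side: a table with a fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE reviews (title TEXT, rating INTEGER)")
conn.execute("INSERT INTO reviews VALUES (?, ?)", ("I like ice cream", 3))
row = conn.execute("SELECT title, rating FROM reviews").fetchone()
row_as_dict = dict(row)

# Document side: the same record as a plain dictionary, which is
# roughly what a MongoDB document looks like from Python.
document = {"title": "I like ice cream", "rating": 3}

print(row_as_dict == document)
```

The difference is that the table fixes its columns up front, while each document carries its own keys.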
7. What is a MongoDB queryset?
As served by pymongo
Minimalist view: a list of dictionaries
10. What it looks like
Minimalist view
from bson.objectid import ObjectId

collection = []  # a list
document = {  # a dictionary
    '_id': ObjectId('4f844e916c97c1000c000003'),
    'username': 'pydanny',
    'fiancee': {
        'username': 'audreyr',
        'location': 'Zurich'
    }
}
collection = [document, ]
list of dictionaries!
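The "list of dictionaries" view also covers nested data. MongoDB queries reach into embedded documents with dotted keys such as 'fiancee.location'; the pure-Python sketch below imitates that on a plain list. `get_path` and the second document are hypothetical, and nothing here talks to a database.

```python
def get_path(document, dotted_key):
    """Follow a dotted key such as 'fiancee.location' through nested dicts."""
    value = document
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


collection = [
    {"username": "pydanny",
     "fiancee": {"username": "audreyr", "location": "Zurich"}},
    {"username": "someone_else"},  # hypothetical document with no nested data
]

# Keep only the documents whose nested location is Zurich.
in_zurich = [doc for doc in collection
             if get_path(doc, "fiancee.location") == "Zurich"]
print([doc["username"] for doc in in_zurich])
```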
16. pymongo
http://api.mongodb.org/python/current/
Official Python binding for MongoDB
>>> from pymongo import MongoClient  # replaces the older Connection class
>>> connection = MongoClient()
>>> db = connection.db
>>> my_data = {'rating': 3, 'title': 'I like ice cream'}  # my_data with title
>>> db.reviews.insert_one(my_data)
>>> your_data = {'rating': 3, 'subject': 'You like ice cream'}  # your_data with subject
>>> db.reviews.insert_one(your_data)
20. pymongo
Documents in the reviews collection:
[
    {'rating': 3, 'title': 'I like ice cream'},
    {'rating': 3, 'subject': 'You like ice cream'}
]
>>> import pymongo
>>> connection = pymongo.MongoClient()
>>> db = connection.db
>>> # Finds all reviews with a rating of 3
>>> for review in db.reviews.find({'rating': 3}):
...     review.get('title')  # .get(): the second review has no 'title'
>>> # Only finds the document with 'title' in it
>>> for review in db.reviews.find(
...     {"title": {"$regex": "ice cream"}}
... ):
...     review['title']
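The matching behaviour of the two queries can be imitated without a server. The sketch below is a toy subset (plain equality and `$regex` only) written for illustration; `matches` is a hypothetical helper, not part of pymongo, and it does not reproduce MongoDB's full query semantics.

```python
import re


def matches(document, query):
    """Return True if the document satisfies every condition in the query.

    Supports only plain equality and {"$regex": pattern}.
    """
    for field, condition in query.items():
        if field not in document:
            return False  # in this toy version a missing field never matches
        value = document[field]
        if isinstance(condition, dict) and "$regex" in condition:
            if not isinstance(value, str):
                return False
            if not re.search(condition["$regex"], value):
                return False
        elif value != condition:
            return False
    return True


reviews = [
    {"rating": 3, "title": "I like ice cream"},
    {"rating": 3, "subject": "You like ice cream"},
]

by_rating = [r for r in reviews if matches(r, {"rating": 3})]
by_title = [r for r in reviews
            if matches(r, {"title": {"$regex": "ice cream"}})]
print(len(by_rating), len(by_title))
```

Both documents share a rating, so the first filter keeps both; only one has a 'title' key, so the regex filter keeps just that one.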
21. pymongo
Pros
• Really fast, bare metal
• Lets you go schema-crazy
• Supported directly by 10gen
• They say “PyMongo is the recommended way to work with MongoDB from Python.”
22. pymongo
Cons
• “PyMongo introspection” is an oxymoron
• Very low-level
• Lose out on ModelForms, Auth, Admin, etc
• Syntax not as clean as with an object mapper
24. MongoEngine
http://mongoengine.org/
Doesn’t this look like the Django ORM?
from datetime import datetime

import mongoengine as me

class Review(me.Document):
    title = me.StringField()
    body = me.StringField()
    author = me.StringField()
    created = me.DateTimeField(default=datetime.utcnow)
    rating = me.IntField()
25. MongoEngine
Doesn’t this look like a Django query?
>>> from reviews.models import Review
>>> for review in Review.objects.all():
...     review.title
26. MongoEngine
Pros
• Looks similar to Django ORM code
• You can develop SUPER-QUICKLY
• Can use with django-mongonaut for introspection
• Light schema, unenforced by the db
• Supports some inter-document connections
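"Light schema, unenforced by the db" means the checks live in Python model code while the datastore accepts anything. The sketch below illustrates that idea with plain Python; `IntField`, `Review`, and `storage` are hypothetical stand-ins and do not show how MongoEngine itself is implemented.

```python
class IntField:
    """Hypothetical field type that checks values in Python, not in the db."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an int")
        instance.__dict__[self.name] = value


class Review:
    rating = IntField()

    def __init__(self, rating):
        self.rating = rating

    def to_document(self):
        return {"rating": self.rating}


storage = []  # stand-in for a collection; it accepts any dictionary
storage.append(Review(rating=3).to_document())
storage.append({"rating": "three"})  # bypassing the model: nothing stops this

try:
    Review(rating="three")
    model_rejected = False
except TypeError:
    model_rejected = True
print(model_rejected, storage)
```

The model refuses the bad value, but the storage layer happily holds it when the model is bypassed.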
27. MongoEngine
Cons
• Some feel there’s too much structure
• Validation messages sometimes unclear
• Using it with Django, you lose out on the Django Admin’s introspection* and ModelForms
* django-mongonaut addresses this.
28. mongoengine and django-mongonaut
http://bit.ly/django-mongonaut
33. MongoKit
http://namlook.github.com/mongokit/
import datetime

from mongokit import Document, Connection

# Connect to reviews collection
connection = Connection()

# Review model representing MongoDB collection
@connection.register
class Review(Document):
    # Expected fields
    structure = {
        'title': unicode,  # unicode is the Python 2 text type
        'body': unicode,
        'author': unicode,
        'created': datetime.datetime,
        'rating': int
    }
    required_fields = ['title', 'author', 'created']
    default_values = {'rating': 0,
                      'created': datetime.datetime.utcnow}
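The three class attributes above describe a small contract: declared types, required keys, and defaults. The sketch below shows one way such a contract could be applied to a plain dictionary. `prepare` is a hypothetical helper, `str` and `datetime.now` stand in for the slide's `unicode` and `utcnow`, and none of this is MongoKit's actual code.

```python
from datetime import datetime

structure = {"title": str, "author": str, "created": datetime, "rating": int}
required_fields = ["title", "author", "created"]
default_values = {"rating": 0, "created": datetime.now}


def prepare(document):
    """Apply defaults, then check required fields and declared types."""
    prepared = dict(document)
    for field, default in default_values.items():
        if field not in prepared:
            # Callable defaults are evaluated when the document is prepared.
            prepared[field] = default() if callable(default) else default
    missing = [f for f in required_fields if f not in prepared]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for field, expected in structure.items():
        if field in prepared and not isinstance(prepared[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return prepared


review = prepare({"title": "I like ice cream", "author": "pydanny"})
print(review["rating"], type(review["created"]).__name__)
```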
34. MongoKit
http://namlook.github.com/mongokit/
>>> from mongokit import Connection
>>> connection = Connection()
>>> for review in connection.Review.find({'rating': 3}):
...     review['title']
Identical to pymongo queries
35. MongoKit
Pros
• Light schema, unenforced by the db
• Or can go all out schemaless!
• Speed
• Types are a mix of Python & MongoDB
• Uses pymongo-style queries
36. MongoKit
Cons
• Using it with Django, you lose out on the Django Admin’s introspection, ModelForms, auth, etc
• Introspection is hard
38. Django-nonrel + mongodb-engine
http://docs.django-nonrel.org
• A patch to Django that adds NoSQL support to the Django ORM
• Works with GAE, MongoDB, even SQL DBs
39. Django-nonrel + mongodb-engine
Pros
• Can use Django as you normally would
• Mirrors the ORM functionality
• Introspection via djangotoolbox
40. Django-nonrel + mongodb-engine
Cons
• Fork of the whole Django project.
• Dependent on others to maintain parity with Django core (still on Django 1.3).
• Multi-db usage is confusing
41. Summary
• pymongo is low-level and well-supported by 10gen.
• MongoEngine is like schemaless Django models.
• MongoKit is like pymongo with extra structure.
• Django-nonrel is a fork of Django 1.3.
43. Danny’s Thoughts
Can we build a “simple” bridge?
What about a single third-party app that lets you
combine critical Django apps and MongoDB?
• django.contrib.auth
• django.forms
• django-social-auth / registration
• others...
44. Danny’s Thoughts
I wonder, why add schemas to schema-less when:
Relational Databases
South
High level Caching tools*
allow you to develop fast moving datastores with transactions and built-in Django support?
* cache-machine or johnny-cache
45. Danny’s Thoughts
Introspection Tool Idea
Treat introspection like MongoDB Queries
Immediate introspection tool
(no ‘title’ then don’t show title)
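A rough sketch of the "immediate introspection" idea: build the display from whatever keys each document actually has, rather than from a declared model. `visible_fields` and `render` are hypothetical names, and the sample data reuses the two reviews from the pymongo slides.

```python
def visible_fields(document):
    """List the field names this particular document actually has."""
    return sorted(document.keys())


def render(document):
    """Render one document, showing only the fields that are present."""
    return ", ".join(
        f"{name}={document[name]!r}" for name in visible_fields(document)
    )


reviews = [
    {"rating": 3, "title": "I like ice cream"},
    {"rating": 3, "subject": "You like ice cream"},
]

# The second review has no 'title', so no title column is shown for it.
for review in reviews:
    print(render(review))
```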
46. Audrey’s Thoughts
• Schemaless dbs promise performance advantages
• Especially for distributed systems
• Tradeoff: ACID compliance
http://stackoverflow.com/questions/3856222/whats-the-attraction-of-schemaless-database-systems
47. Audrey’s Thoughts
“Schemaless database”
==
ACID-noncompliant database
OK when performance is more important than
being consistent 100% of the time.
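For a sense of what is being traded away, here is a small sketch of atomicity (the "A" in ACID) using the standard-library `sqlite3` module. The `accounts` table and the simulated crash are assumptions made up for illustration; the point is only that a half-finished transaction leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'a'"
        )
        # The matching credit to account 'b' would go here, but we
        # simulate a failure between the two updates instead.
        raise RuntimeError("simulated crash between the two updates")
except RuntimeError:
    pass

# The debit was rolled back, so the data is still consistent.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```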
48. Audrey’s Thoughts
Schemaless Python models
!=
Schemaless MongoDB collections
I think MongoEngine is best unless your use case calls for schema anarchy.
50. Using Django With MongoDB
• Big hurdles: ORM, Admin, ModelForms, Auth
• They were built for relational data
• But the situation is improving rapidly
51. What needs to be done
• New introspection tools (working on it)
• django.forms bridge.
• django.contrib.admin bridge.
• Drop-in replacement for Django Auth (Django 1.5?)
• Creation of best practices document for use of MongoDB with Django.
52. Django Mongonaut
• Introspection tool for MongoEngine
• Works well so far
• Integrate graphing tools
• Make independent from mongoengine
• Contributors wanted:
• https://github.com/pydanny/django-mongonaut
54. Django Mongonaut
Based on immediate introspection, not definition
Mockup: SVG w/links for examining nested structures
55. Final Summary
Python/MongoDB tools to consider:
• pymongo
• mongoengine
• mongokit
• django-nonrel
• django-mongodb
• django-mongonaut
Editor's notes
A: We have a specific time we have to present in, so please hold your questions and comments to the end.
A, D, A
D
D
A
D
D
D
A: Which is why we’re presenting here today rather than at LA Django - this is useful info for any Python developer.
A
D: Using a review system...
D
D
D: MongoDB queries are meant for schema-less data.
D: You can do crazy things like have every document in a collection have a completely different schema.
D: No introspection tools, just the shell. Certainly no admin.
A
A
A
A
A: If you use ReferenceFields all the time, you might as well be using SQL.
A
D
D
D
D
D
D
D
A
A
A
A: According to posts on the mailing list, you should be able to use multiple databases. But I pored over the docs/mailing list and couldn’t find an example of, say, using Postgres for auth and MongoDB for a custom app.
D, A, D, A
D: I’m in discussion with some other Python/MongoDB aficionados.
D: I love normalization. I love what you can do with a schema. I love what you can do without a schema.
D
A: Atomicity, consistency, isolation, durability. Properties that guarantee that a db transaction has been processed reliably. There’s the very small risk that data you retrieve could be in a temporary inconsistent state.
A: NYT, Heroku, Intuit, IGN, Posterous, and others use Ruby’s MongoMapper. In the Ruby world, Mongo object mappers are much more prevalent.