Modern-day application development demands persistence of complex and dynamic shapes of data to match the highly flexible and powerful languages used in today's software landscape. Traditional approaches to solutions development with an RDBMS increasingly expose the gap between the ease of use of modern development languages and the relational data model. Development time is wasted as the bulk of the work shifts from adding business features to struggling with the RDBMS. MongoDB, the leading NoSQL database, offers a flexible and scalable solution.
In this webinar, we will provide a medium-to-deep exploration of the MongoDB programming model and APIs, and how they transform the way developers interact with a database, leading to:
• Faster time to market for both initial deployment and subsequent change
• Lower development costs
• More choices in coupling features of a language to the database
We will also review the advantages of MongoDB technology in the rapid application development (RAD) space for popular scripting languages such as JavaScript, Python, Perl, and Ruby.
2. Who is your Presenter?
• Yes, I use "Buzz" on my business cards
• Former Investment Bank Chief Architect at JPMorganChase, and at Bear Stearns before that
• Over 25 years of designing and building systems
  • Big and small
  • Super-specialized to broadly useful in any vertical
  • "Traditional" to completely disruptive
• Advocate of language leverage and strong factoring
• Still programming – using emacs, of course
3. What Are Your Developers Doing All Day?
Adding and testing business features
OR
"Integrating with other components, tools, and systems":
• Database(s)
• ETL and other data transfer operations
• Messaging
• Services (web & other)
• Other open source frameworks
4. Why Can't We Just Save and Fetch Data?
Because the way we think about data at the business use case level…
…is different than the way it is implemented at the application/code level…
…which traditionally is VERY different than the way it is implemented at the database level
5. This Problem Isn't New…
…but for the past 40 years, innovation at the business & application layers has outpaced innovation at the database layer.

Business Data Goals
  1974: Capture my company's transactions daily at 5:30PM EST, add them up on a nightly basis, and print a big stack of paper
  2014: Capture my company's global transactions in realtime plus everything that is happening in the world (customers, competitors, business/regulatory, weather), producing any number of computed results, and passing this all in realtime to predictive analytics with model feedback; results in realtime to 10000s of mobile devices, multiple GUIs, and b2b and b2c channels

Release Schedule
  1974: Quarterly
  2014: Yesterday

Application/Code
  1974: COBOL, Fortran, Algol, PL/1, assembler, proprietary tools
  2014: COBOL, Fortran, C, C++, VB, C#, Java, JavaScript, Groovy, Ruby, Perl, Python, Obj-C, SmallTalk, Clojure, ActionScript, Flex, DSLs, Spring, AOP, CORBA, ORM, third-party software ecosystem, open source movement

Database
  1974: I/VSAM, early RDBMS
  2014: Mature RDBMS, legacy I/VSAM, column & key/value stores, and… mongoDB
6. Exactly How Does mongoDB Change Things?
• mongoDB is designed from the ground up to address rich structure (maps of maps of lists of…), not rectangles
• Standard RDBMS interfaces (e.g. JDBC) do not exploit features of contemporary languages
• Rapid Application Development (RAD) and scripting in JavaScript, Python, Perl, Ruby, and Scala is impedance-matched to mongoDB
• In mongoDB, the data is the schema
• Shapes of data go in the same way they come out
7. Rectangles are 1974. Maps and Lists are 2014

{
    customer_id : 1,
    first_name : "Mark",
    last_name : "Smith",
    city : "San Francisco",
    phones : [
        {
            type : "work",
            number : "1-800-555-1212"
        },
        {
            type : "home",
            number : "1-800-555-1313",
            DNC : true
        },
        {
            type : "home",
            number : "1-800-555-1414",
            DNC : true
        }
    ]
}
8. An Actual Code Example (Finally!)
Let's compare and contrast RDBMS/SQL to mongoDB development using Java over the course of a few weeks. Some ground rules:
1. Observe rules of Software Engineering 101: assume separation of application, Data Access Layer (DAL), and persistor implementation
2. The Data Access Layer must be able to:
   a. Expose simple, functional, data-only interfaces to the application (no ORM, frameworks, compile-time bindings, or special tools)
   b. Exploit high-performance features of the persistor
3. Focus on core data-handling code and avoid distractions that require the same amount of work in both technologies:
   a. No exception or error handling
   b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not actual time to complete the indicated task
9. The Task: Saving and Fetching Contact Data
Start with this simple, flat shape in the Data Access Layer:

Map m = new HashMap();
m.put("name", "buzz");
m.put("id", "K1");

And assume we save it in this way:

save(Map m)

And assume we fetch one by primary key in this way:

Map m = fetch(String id)

Brace yourself…
10. Day 1: Initial efforts for both technologies

SQL
DDL: create table contact ( … )

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name ) values ( ?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name from contact where id = ?");
}

save(Map m)
{
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.execute();
}

Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.execute();
    if(rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
    }
    return m;
}

mongoDB
DDL: none

save(Map m)
{
    collection.insert(m);
}

Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}

Let's assume for argument's sake that both approaches take the same amount of time.
11. Day 2: Add simple fields

m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));

• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code

Brace yourself (again)…
12. SQL Day 2 (changes in bold)

DDL:
alter table contact add title varchar(8);
alter table contact add hireDate date;

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name, title, hiredate from contact where id = ?");
}

save(Map m)
{
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    contactInsertStmt.execute();
}

Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.execute();
    if(rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
        m.put("title", rs.getString(3));
        m.put("hireDate", rs.getDate(4));
    }
    return m;
}

Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMS are case-insensitive for column names, but code is case-sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt
13. mongoDB Day 2

save(Map m)
{
    collection.insert(m);
}

Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}

✔ NO CHANGE

Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in the database precisely match key names in the code layer, matching directly on name rather than indirectly via positional offset
5. No technical debt is created
14. Day 3: Add list of phone numbers

m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));

List list = new ArrayList();
Map n1 = new HashMap();
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
list.add(n1);
Map n2 = new HashMap();
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);

• It was still pretty easy to add this data to the structure
• …but meanwhile, in the persistence code…

REALLY brace yourself…
15. SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number

DDL:
alter table contact add work_phone varchar(16);
alter table contact add home_phone varchar(16);

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name, title, hiredate, work_phone, home_phone from contact where id = ?");
}

save(Map m)
{
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    for(Map onePhone : m.get("phones")) {
        String t = onePhone.get("type");
        String n = onePhone.get("number");
        if(t.equals("work")) {
            contactInsertStmt.setString(5, n);
        } else if(t.equals("home")) {
            contactInsertStmt.setString(6, n);
        }
    }
    contactInsertStmt.execute();
}

Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.execute();
    if(rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
        m.put("title", rs.getString(3));
        m.put("hireDate", rs.getDate(4));
        List list = new ArrayList();
        Map onePhone;
        onePhone = new HashMap();
        onePhone.put("type", "work");
        onePhone.put("number", rs.getString(5));
        list.add(onePhone);
        onePhone = new HashMap();
        onePhone.put("type", "home");
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
        m.put("phones", list);
    }
    return m;
}

This is just plain bad…
16. SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers

DDL:
create table phones ( … )

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement
        ("insert into phones ( id, type, number ) values ( ?,?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}

save(Map m)
{
    startTrans();
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    for(Map onePhone : m.get("phones")) {
        c2stmt.setString(1, m.get("id"));
        c2stmt.setString(2, onePhone.get("type"));
        c2stmt.setString(3, onePhone.get("number"));
        c2stmt.execute();
    }
    contactInsertStmt.execute();
    endTrans();
}

Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.execute();
    int i = 0;
    List list = new ArrayList();
    while (rs.next()) {
        if(i == 0) {
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
            m.put("title", rs.getString(3));
            m.put("hireDate", rs.getDate(4));
            m.put("phones", list);
        }
        Map onePhone = new HashMap();
        onePhone.put("type", rs.getString(5));
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
        i++;
    }
    return m;
}

This took time and money.
17. SQL Day 5: Zombies! (zero or more between entities)

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement
        ("insert into phones ( id, type, number ) values ( ?,?,? )");
    fetchStmt = connection.prepareStatement
        ("select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}

while (rs.next()) {
    if(i == 0) {
        // …
    }
    String s = rs.getString(5);
    if(s != null) {
        Map onePhone = new HashMap();
        onePhone.put("type", s);
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
    }
}

Whoops! And it's also wrong! We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join. But this ALSO means we have to change the unwind logic.

This took more time and money! …but at least we have a DAL… right?
18. mongoDB Day 3

save(Map m)
{
    collection.insert(m);
}

Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}

✔ NO CHANGE

Advantages:
1. Zero time and money spent on overhead code
2. No need to fear fields that are "naturally occurring" lists containing data specific to the parent structure, which thus do not benefit from normalization and referential integrity
19. By Day 14, our structure looks like this:

m.put("name", "buzz");
m.put("id", "K1");
//…
n4.put("startupApps", new String[] { "app1", "app2", "app3" } );
n4.put("geo", "US-EAST");
list2.add(n4);
n5.put("startupApps", new String[] { "app6" } );
n5.put("geo", "EMEA");
n5.put("useLocalNumberFormats", false);
list2.add(n5);
m.put("preferences", list2);
n6.put("optOut", true);
n6.put("assertDate", someDate);
seclist.add(n6);
m.put("attestations", seclist);
m.put("security", anotherMapOfData);

• It was still pretty easy to add this data to the structure
• Want to guess what the SQL persistence code looks like?
• How about the mongoDB persistence code?
20. SQL Day 14

Error: Could not fit all the code into this space.
…actually, I didn't want to spend 2 hours putting the code together.

But very likely, among other things:
• n4.put("startupApps", new String[]{"app1","app2","app3"});
  was implemented as a single semicolon-delimited string
• m.put("security", anotherMapOfData);
  was implemented by flattening it out and storing a subset of fields
21. mongoDB Day 14 – and every other day

save(Map m)
{
    collection.insert(m);
}

Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}

✔ NO CHANGE

Advantages:
1. Zero time and money spent on overhead code
2. Persistence is so easy, flexible, and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
22. But what about "real" queries?
• The mongoDB query language is a physical map-of-map based structure, not a String
• Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps
• No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation
• The same paradigm used to manipulate data is used to manipulate query expressions
• …which is also, by the way, the same paradigm for working with mongoDB metadata and explain()
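To make the "queries are just data" point concrete, here is a minimal Python sketch (field names such as phones and hireDate follow the contact examples in this deck; the helper function and the cutoff value are illustrative, not the webinar's actual code):

```python
# A compound filter is built the same way a document is built: as plain
# maps and lists. No string grammar, no quoting, no parsing.
def contacts_missing_phone_or_hired_after(cutoff):
    # Equivalent CLI form:
    #   db.contact.find({"$or": [ {"phones": {"$exists": false}},
    #                             {"hireDate": {"$gt": cutoff}} ]})
    return {
        "$or": [
            {"phones": {"$exists": False}},
            {"hireDate": {"$gt": cutoff}},
        ]
    }

expr = contacts_missing_phone_or_hired_after("2011-11-01")

# Because the expression is ordinary data, we can extend it with the same
# tools we use on documents -- here, appending another OR branch:
expr["$or"].append({"title": "Mr."})
```

The same walk/inspect/modify techniques apply to query expressions, documents, and explain() output alike, which is the symmetry the slide describes.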
23. mongoDB Query Examples

Objective: Find all contacts with at least one mobile phone
Code:
    Map expr = new HashMap();
    expr.put("phones.type", "mobile");
CLI:
    db.contact.find({"phones.type": "mobile"});

Objective: Find contacts with NO phones
Code:
    Map expr = new HashMap();
    Map q1 = new HashMap();
    q1.put("$exists", false);
    expr.put("phones", q1);
CLI:
    db.contact.find({"phones": {"$exists": false}});

List fetchGeneral(Map expr)
{
    List l = new ArrayList();
    DBObject dbo = new BasicDBObject(expr);
    Cursor c = collection.find(dbo);
    while (c.hasNext()) {
        l.add((Map)c.next());
    }
    return l;
}

Advantages:
1. Far less time required to set up complex parameterized filters
2. No need for SQL rewrite logic or creating new PreparedStatements
3. The map-of-maps query structure is easily walked and processed without parsing
24. …and before you ask…
Yes, mongoDB query expressions support:
1. Sorting
2. Cursor size limit
3. Aggregation functions
4. Projection (asking for only parts of the rich shape to be returned)
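These options are also expressed as plain data rather than SQL text. A hedged Python sketch (assuming the standard pymongo find/sort/limit calls; no live collection is shown, so the snippet only builds and inspects the argument structures):

```python
# Filter and projection are ordinary dicts; sort and limit are ordinary values.
query      = {"phones.type": "mobile"}          # match at least one mobile phone
projection = {"name": 1, "phones": 1, "_id": 0} # return only parts of the shape

# Against a live pymongo collection this would read:
#   coll.find(query, projection).sort("name", 1).limit(10)
# Everything passed to the driver is inspectable data, so a DAL can log,
# modify, or compose the request before execution:
request = {
    "filter": query,
    "projection": projection,
    "sort": [("name", 1)],
    "limit": 10,
}
```

Contrast with SQL, where adding a sort or projection means rewriting the statement string and re-preparing it.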
25. Day 30: RAD on mongoDB with Python

import pymongo

def save(data):
    coll.insert(data)

def fetch(id):
    return coll.find_one({"id": id})

myData = {
    "name": "jane",
    "id": "K2",
    # no title? No problem
    "hireDate": datetime.date(2011, 11, 1),
    "phones": [
        { "type": "work",
          "number": "1-800-555-1212"
        },
        { "type": "home",
          "number": "1-866-444-3131"
        }
    ]
}
save(myData)
print fetch("K2")

expr = { "$or": [ {"phones": { "$exists": False }}, {"name": "jane"} ] }
for c in coll.find(expr):
    print [ k.upper() for k in sorted(c.keys()) ]

Advantages:
1. Far easier and faster to create scripts due to "fidelity-parity" of mongoDB map data and Python (and Perl, Ruby, and JavaScript) structures
2. Data types and structure in scripts are exactly the same as those read and written in Java and C++
26. Day 30: Polymorphic RAD on mongoDB with Python

import pymongo

item = fetch("K8")
# item is:
{
    "name": "bob",
    "id": "K8",
    "personalData": {
        "preferedAirports": [ "LGA", "JFK" ],
        "travelTimeThreshold": { "value": 3, "units": "HRS" }
    }
}

item = fetch("K9")
# item is:
{
    "name": "steve",
    "id": "K9",
    "personalData": {
        "lastAccountVisited": {
            "name": "mongoDB",
            "when": datetime.date(2013, 11, 4)
        },
        "favoriteNumber": 3.14159
    }
}

Advantages:
1. Scripting languages easily digest shapes with common fields and dissimilar fields
2. Easy to create an information architecture where placeholder fields like personalData are "known" in the software logic to be dynamic
27. Day 30: (Not) RAD on top of SQL with Python

init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement
        ("insert into phones ( id, type, number ) values ( ?,?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}

save(Map m)
{
    startTrans();
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    for(Map onePhone : m.get("phones")) {
        c2stmt.setString(1, m.get("id"));
        c2stmt.setString(2, onePhone.get("type"));
        c2stmt.setString(3, onePhone.get("number"));
        c2stmt.execute();
    }
    contactInsertStmt.execute();
    endTrans();
}

Consequences:
1. All logic coded in the Java interface layer (splitting up contact, phones, preferences, etc.) needs to be rewritten in Python (unless Jython is used) and/or Perl, C++, Scala, etc.
2. No robust way to handle polymorphic data other than BLOBing it
3. …and that will take real time and money!
28. The Fundamental Change with mongoDB

RDBMS were designed in an era when:
• CPU and disk were slow & expensive
• Memory was VERY expensive
• Network? What network?
• Languages had limited means to dynamically reflect on their types
• Languages had poor support for richly structured types

Thus, the database had to:
• Act as combiner-coordinator of simpler types
• Define a rigid schema
• (Together with the code) optimize at compile-time, not run-time

In mongoDB, the data is the schema!
29. mongoDB and the Rich Map Ecosystem

Generic comparison of two records:

Map expr = new HashMap();
expr.put("myKey", "K1");
DBObject a = collection.findOne(expr);
expr.put("myKey", "K2");
DBObject b = collection.findOne(expr);
List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b);

Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run):

Map expr = new HashMap();
expr.put("myKey", "DEFAULT");
expr.put("createDate", new Date(2013, 11, 1));
DBObject a = collection.findOne(expr);
expr.clear();
expr.put("myKey", "user1");
DBObject b = otherCollectionPerhaps.findOne(expr);
MapStack s = new MapStack();
s.push((Map)a);
s.push((Map)b);
Map merged = s.project();

Runtime reflection of Maps and Lists enables powerful generic utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money.
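MapDiff and MapStack are the presenter's own utilities, not a public library. As a minimal sketch of the two ideas (a recursive diff and a last-push-wins overlay), in Python and with hypothetical field names (calc, horizon, geo):

```python
def map_diff(a, b, path=""):
    """Recursively compare two map shapes; return a list of (path, a_val, b_val)."""
    diffs = []
    for k in sorted(set(a) | set(b)):
        pa, va, vb = path + "/" + k, a.get(k), b.get(k)
        if isinstance(va, dict) and isinstance(vb, dict):
            diffs += map_diff(va, vb, pa)      # descend into sub-maps
        elif va != vb:
            diffs.append((pa, va, vb))
    return diffs

def map_project(*layers):
    """Overlay maps in push order: later layers win, recursing into sub-maps."""
    merged = {}
    for layer in layers:
        for k, v in layer.items():
            if isinstance(v, dict) and isinstance(merged.get(k), dict):
                merged[k] = map_project(merged[k], v)
            else:
                merged[k] = v
    return merged

defaults  = {"calc": {"horizon": 30, "curve": "LIBOR"}, "geo": "US-EAST"}
userprefs = {"calc": {"horizon": 10}}
merged = map_project(defaults, userprefs)
# merged -> {"calc": {"horizon": 10, "curve": "LIBOR"}, "geo": "US-EAST"}

diffs = map_diff(defaults, merged)
# diffs -> [("/calc/horizon", 30, 10)]
```

Because documents come back from mongoDB as ordinary maps and lists, utilities like these work unchanged on any shape in any collection; that is the "write once, use for all shapes" point of the slide.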
30. Lastly: A CLI with teeth

Try a query and show the diagnostics:

> db.contact.find({"SeqNum": {"$gt": 10000}}).explain();
{
    "cursor" : "BasicCursor",
    "n" : 200000,
    //...
    "millis" : 223
}

Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size, time):

> for(v=[],i=0;i<3;i++) {
…   n = i*50000;
…   expr = {"SeqNum": {"$gt": n}};
…   v.push( [n, db.contact.find(expr).explain().millis] ); }

Let's see that vector:

> v
[ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ]

Use any other javascript you want inside the shell:

> load("jStat.js")
> jStat.stdev(v.map(function(p){return p[1];}))
2.0548046676563256

Party trick: save the explain() output back into a collection!

> for(i=0;i<3;i++) {
…   expr = {"SeqNum": {"$gt": i*1000}};
…   db.foo.insert(db.contact.find(expr).explain()); }
Hello, this is Buzz Moschetti; welcome to the webinar entitled "drama…". Today I'm going to highlight the progressive and powerful programming model in mongoDB and how it not only reduces time-to-market but also increases flexibility and capability. The content applies to any industry, but those with questions about specific use cases in financial services, please feel free to reach out to me at the email address buzz.moschetti@mongodb.com. Some quick logistics: The presentation audio & slides will be recorded and made available to you in about 24 hours. We have an hour set up, but I'll use about 40 minutes of that for the presentation, with some time for questions. You can use the webex Q&A box to ask those questions at any time. If a significant number of similar questions show up in the middle, I will answer them; otherwise, I'll try to answer as many as possible at the end. If you have technical issues, please send a webex message to the participant ID'd as mongoDB webinar team; otherwise keep your Qs focused on the content.
The way we describe concepts like trades, products, scenarios, workflows, locations, and all the ways these things can work together is difficult to translate to software. It would be nice if we could record a use case, turn that into an MP3, and pass it to a runtime engine that would literally do what it is told, but we're not quite there yet.
For several decades, database innovation has lagged behind other areas. Data create/consume goals and release schedules are VERY different. App/code environment choices are significantly broader, moving from compile-time/vendor-oriented to dynamic runtime open standards. Why do databases lag? It's hard to build a good, robust database. Until recently, we did not have the power/flexibility of platform (incl. cloud) plus new channels and interactive scale like smart mobile devices to push us over the hurdle into a new database model to satisfy these needs.
How does it do this? A number of ways. BUT today I won't get into the infrastructure side of things: low-cost horizontal scalability, no-downtime version upgrades, multisite DR, and indeed a coherent distributed scaling/HA/DR strategy. That's all great. Today I want to talk about the topside experience, around these 3 points. Third: there is symmetry between read and write operations on collections; this will become important as the complexity of shapes increases.
To summarize (and we can't resist putting in a huge ER diagram): rectangles and the technology to support them are circa 1974. Management of rich shape is now.
2: The role of the data access layer is to hide implementation-specific details of raw data access from the consumer. The implementation inside the DAL can be as DB-specific as possible to maximize performance or other I/O goals. 2: The DAL contains as many functions as appropriate to vend data to applications; there could be dozens. 2: The topside of the DAL exposes only data to a consumer, not necessarily bespoke objects. This is because some data operations are insufficient or inappropriate to populate a true object. Logic on top of the DAL is required to take the raw data and construct the appropriate object. Even in the mongoDB rich data world, data does not necessarily equal object! 2: ORM (notably Hibernate) and annotation-based frameworks (like Morphia) have a different set of dependency & design considerations. 2: From a practical basis, it may be necessary to perform high-performance data-only operations independent of objects, and the DAL permits this to occur without exposing the entire implementation of persistence. 3: "Nearly-compilable" code.
Why a Map? Why not? No compile-time dependencies; we are restricting the types at the data access layer to pure data, not complex objects (e.g. no m.put("contact", Contact object)). The response can carry additional information. Maps are very easy to work with and have a lot of tooling around them, especially if you constrain your types. Brace yourself.
Remember: this is happening in the logic of the DAL, not the application!
Consequences: code/schema coupling. This took only a bit of time, yes, but it is pure overhead: no business feature value. Beginning of "technical debt": here, manifest as a disconnect between the schema in the DB, the logic in the DAL, and the Map. SQL nuances: positional params are here for convenience but don't really change things, because: A. In a ResultSet, a "select column AS foo" changes the resultset column name to foo instead of the real column name; thus what you see in the DB vs. the code could differ. Interesting fact: "select fld1 as foo, fld2 as foo" is LEGAL! ResultSet.findColumn() and getString(String name) return the first one they find! B. The column names are still in a different semantic domain (e.g. case insensitivity) than our Map names, so you have to provide a "column" -> "mapKey" mapping anyway (no direct storage like mongoDB). C. PreparedStatements can't use column names for substitution anyway; they can ONLY use positional parameters; more evidence of input/output mismatch. D. This is not going to be the dominant sweat/stability issue; it's the relational composition and decomposition, which we'll see later.
No change! Values we set into the map are preserved; period.
Add phone numbers. There could be several; we don't know how many. There could be other attributes like do-not-call and smartphone type, but we'll leave that for later.
The attempt to duck the "listiness" of multiple phones has yielded a bad information architecture practically from the start of development. This is just plain bad, setting the stage for pain when more phones are added in the future. More technical debt!
This is actually a "friendly" case: phones uses the same ID as contact; in other cases, a new foreign key would have to be managed. We are sidestepping cross-table integrity; other functions besides save() would come into play. The JOIN to fetch all information produces a ResultSet unwind issue: the id, name, title, and hiredate are repeated in the cartesian product and have to be managed via logic. The problem is magnified when more than one id is requested, because the ordering is not guaranteed unless ORDER BY is specified in the query, which impacts performance. This took real time and money! All previous consequences (esp. code/schema release coupling) are still present, and save vs. fetch logic is clearly starting to become asymmetric. The extra work is only starting to begin, because in a few days…
The Zombies emerge. They're prevalent in pop culture… and in traditional RDBMS programming. Between day 3 and day 5 we loaded up some test data into the DB. This exposes a common challenge in SQL/RDBMS: as the data model grows, and particularly as "always set / sometimes set" dynamics come into play, one must increasingly be very careful about query construction. Good thing we have a DAL, though: imagine if a bunch of those older queries escaped into the application space!
No drama. We simply saved the list of phone numbers (or none). No zombies.
anotherMapOfData might even come from a different place altogether…
But likely: because we have lived through at least 1 schema upgrade, we are gun-shy about list-y data, or we are under pressure, so we create a semicolon-delimited string of app names and store it as a single string. The otherMapOfData may or may not be all the fields. We store what we deem to be appropriate, even if that structure changes over time.
The tail does not wag the dog. Often, the time, effort, and coordination involved in proper modeling in the RDBMS world incentivizes developers to take shortcuts.
So we have saved and fetched a single item. What about real queries? This is where the mongoDB programming model starts to really shine. Number 1: operators; no grammar. For simple queries, it is slightly more "involved" than SQL, but how many users type raw SQL into a screen for execution? Do you really want to do that? For complex queries, it ends up being no more difficult. The same way you build and manipulate data can be applied to manipulating queries. And while we're at it, it is the same paradigm for consuming responses from the server, both data (in a cursor) and diagnostic and other operations results. Results can be processed, logged, visualized, formatted, etc. the same way for all operations, without parsing or losing fidelity.
We often start with the CLI to show how things are done. But here, we show the actual map-of-map setup in code. Below, we've generalized fetch() into fetchGeneral(), and instead of taking a String id, we now take a Map of expressions. This is the REALLY general form of fetch; more specialized versions might take Map fragments or scalar values which are inserted into pre-defined map-of-map structures. You don't have to worry about "parsable syntax". It is operators and operands that cooperate in a very strongly but easily defined way.
Recall on slide 22 we said the data and query paradigm is the same; note that myData and expr are the same! The same tools, tricks, and techniques can be applied to both. Very powerful but compact and clear scripts can easily be written that leverage investments made in modules for that particular language.
Polymorphism: a field can have different types (object shapes) from doc to doc within the same collection, e.g. K8 and K9. It is very easy to craft a system where software relies on a few "well-known" fields like name and id in this example to manage information in the large, but still saves and extracts custom data with high fidelity for the parts of the software stack that understand it, WITHOUT the persistor getting in the way.
You'll have to recode the split-up in Python unless you use Jython. But even then, there's no easy solution for polymorphic data (you'll have to develop your own rich data store/query filter/fetch subsystem).
In short, there was no malloc() in the old days. mongoDB takes advantage of the higher type fidelity of today's popular and powerful languages.
Traditional RDBMS CLIs (psql, isql) are interpreters for that particular flavor of SQL plus some extra commands. SQL is not a general-purpose procedural language. The mongoDB shell, however, can be viewed this way: it is a javascript interpreter that happens to load up some mongoDB interface libraries. Party trick: no DDL or special setup needed! Results can be stored back immediately into the DB! All of this (rich shapes of data, dynamic use of types, and symmetry of operation semantics) leads to faster, easier development.