2. Introduction
• MapMyFitness was founded in 2007
• Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 13 million registered users
• ~80 million geo-data routes (runs, rides, walks, hikes, etc.)
• Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyFitness)
3. MMF Platform Overview
• Python (Django) & PHP (legacy API)
• Although MySQL is the core backing db for Django, the majority of MMF data lives in various MongoDB datastores.
• Routes datastore has ~120 million objects, currently 7TB+ of data (3-member replica set backed by an EMC SAN, 48GB RAM each)
• Django sessions converted to using MongoDB (functional scaling example, 600M sessions stored)
• Live Tracking system utilizes elastic replica set membership to handle load scaling for events
• Granular API access/error logging via JSON to MongoDB
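As a rough illustration of the JSON access/error logging bullet, the sketch below builds one log document per API request. The field names (`ts`, `method`, `path`, `status`, `duration_ms`, `error`) and the sample path are assumptions made for this example, not the actual MMF schema.

```python
import json
from datetime import datetime, timezone


def build_api_log_entry(method, path, status, duration_ms, error=None):
    """Return a JSON-serializable dict describing one API request.

    All field names here are hypothetical; pick whatever your queries need.
    """
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": duration_ms,
    }
    # Only error responses carry the extra "error" field.
    if error is not None:
        entry["error"] = str(error)
    return entry


# Hypothetical request; the resulting dict could be inserted into a
# logging collection as-is by any MongoDB driver.
entry = build_api_log_entry("GET", "/routes/123", 500, 84.2, error="timeout")
print(json.dumps(entry, sort_keys=True))
```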
5. Implementation Patterns
• Standard Datastore - 3-member replica set (small to medium implementations)
• Big Data implementation - sharded cluster (TB+)
• Buffering Layer - high memory (load all data and index files into RAM)
• Write Heavy - utilize sharding to optimize for writes
• Read Heavy - 3+n replica set configuration for rapid read scaling (up to 12 nodes)
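For the read-heavy pattern, extra replica set members only help if clients actually read from them. A minimal sketch, assuming the application connects with a standard MongoDB connection string: the `replicaSet` and `readPreference` URI options are real, while the host names and set name below are placeholders.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that spreads reads across secondaries."""
    if not hosts:
        raise ValueError("at least one host is required")
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Placeholder hosts and replica set name.
uri = build_replica_set_uri(
    ["db01:27017", "db02:27017", "db03:27017"], "rs0"
)
print(uri)
```

Reads served by secondaries can lag behind the primary, so this suits data where slightly stale results are acceptable.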
6. Implementation Patterns
• In the cloud, tune the instance type to the mongo implementation
• On iron, plan carefully and dedicate servers completely to mongo to avoid memory map contention
• For DR, spin up a delayed, hidden replica node (preferably in a different datacenter)
• Aggregation framework can be used in myriad ways, including bridging the gap to SQL data warehousing via ETL.
• Automate install patterns for rapid development, prototyping, and infrastructure scaling.
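The delayed, hidden DR node comes down to one member document in the replica set configuration. The sketch below builds such a document, assuming the `slaveDelay` field name used by MongoDB releases of this era (newer releases renamed it to `secondaryDelaySecs`, hence the parameter). The host name and delay are placeholders.

```python
def delayed_hidden_member(member_id, host, delay_seconds, delay_field="slaveDelay"):
    """Return a replica set member document for a delayed, hidden DR node.

    Hidden and delayed members must have priority 0 so they can never
    be elected primary.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,
        "hidden": True,
        delay_field: delay_seconds,
    }


# Placeholder host in a second datacenter, one hour behind the primary.
dr_member = delayed_hidden_member(5, "dr-db01.example.com:27017", 3600)
print(dr_member)
```

The delay gives a window to stop replication before a bad write or accidental drop reaches the DR copy.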
8. Replica Set Expansion
• MongoDB is "replication made elegant"
• Ridiculously simple to add additional members
• Be sure to run initialSync from a secondary!
    rs.add({ "host" : "livetrack_db09", "initialSync" : { "state" : 2 } })
• Both rs.add() and rs.remove() can be scripted and connected to monitoring systems for autoscaling
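Scripting membership changes outside the shell means doing what `rs.add()` and `rs.remove()` do internally: take the current configuration, change the member list, bump the version, and submit the result as a reconfiguration. A minimal sketch of the document manipulation, assuming the standard config layout (`_id`, `version`, `members`); the host names are placeholders and the step that sends the new config to the server is left out.

```python
import copy


def add_member(config, host):
    """Return a new config with `host` appended and the version bumped."""
    new_config = copy.deepcopy(config)
    next_id = max((m["_id"] for m in new_config["members"]), default=-1) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] += 1
    return new_config


def remove_member(config, host):
    """Return a new config without `host` and with the version bumped."""
    new_config = copy.deepcopy(config)
    remaining = [m for m in new_config["members"] if m["host"] != host]
    if len(remaining) == len(new_config["members"]):
        raise ValueError(f"{host} is not a member of this replica set")
    new_config["members"] = remaining
    new_config["version"] += 1
    return new_config


# Placeholder config; a real one would come from the replSetGetConfig
# command or from rs.conf() in the shell.
current = {
    "_id": "livetrack",
    "version": 3,
    "members": [
        {"_id": 0, "host": "livetrack_db01:27017"},
        {"_id": 1, "host": "livetrack_db02:27017"},
    ],
}
scaled_up = add_member(current, "livetrack_db09:27017")
scaled_down = remove_member(scaled_up, "livetrack_db09:27017")
```

A monitoring trigger could call these helpers and then apply the result with the `replSetReconfig` admin command. That last step is an assumption about how the pieces would be wired together, not something described in the slides.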
9. Monitoring and Introspection
• MMS, 10gen's cloud-based monitoring service (best available)
• Supported by Zabbix, Nagios, Munin, Server Density, etc.
• mongostat, mongotop, REST interface, database profiler
• Monitoring system triggers can initiate node additions, removals, service restarts, etc.
• In addition to service-level monitoring, use more advanced tests to check for and alert on query latency spikes
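One simple way to alert on query latency spikes is to compare the latest measurement against a recent baseline. The sketch below is an assumed approach (mean plus a multiple of the standard deviation), not the check used at MMF. The thresholds and sample values are illustrative.

```python
from statistics import mean, pstdev


def is_latency_spike(baseline_ms, current_ms, sigmas=3.0, min_spread_ms=1.0):
    """Return True if current_ms is far above the recent baseline.

    min_spread_ms keeps a perfectly flat baseline from turning tiny
    fluctuations into alerts.
    """
    if not baseline_ms:
        return False  # no history yet, nothing to compare against
    average = mean(baseline_ms)
    spread = max(pstdev(baseline_ms), min_spread_ms)
    return current_ms > average + sigmas * spread


# Illustrative timings (ms) for a canary query run by the monitoring system.
recent = [10, 12, 11, 9, 10]
print(is_latency_spike(recent, 50))
print(is_latency_spike(recent, 12))
```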
14. Security Considerations
• MongoDB provides authentication support and basic permissions
• Auth is turned off by default to allow for optimal performance
• Always run databases in a trusted network environment
• Lock down host-based firewalls to limit access to required clients
• Automate iptables with Puppet or Chef; in EC2 use security groups
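To make the host-based firewall bullet concrete, the sketch below generates the iptables rules a Puppet or Chef template might render: accept the listed clients on the MongoDB port, then drop everything else on that port. It assumes IPv4 clients and the default port 27017. The client addresses are placeholders.

```python
import ipaddress


def mongo_iptables_rules(allowed_clients, port=27017):
    """Return iptables commands allowing only the given clients on `port`."""
    rules = []
    for client in allowed_clients:
        # Validates the address and normalizes single hosts to /32.
        network = ipaddress.ip_network(client, strict=False)
        rules.append(
            f"iptables -A INPUT -p tcp -s {network} --dport {port} -j ACCEPT"
        )
    # Everything not explicitly allowed above is dropped.
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


# Placeholder app server and subnet.
for rule in mongo_iptables_rules(["10.0.1.15", "10.0.2.0/24"]):
    print(rule)
```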
16. Security Considerations
• Use the rule of least privilege to allow access to environments
• Data sensitivity should determine the extent of security measures
• For non-sensitive data, good network security can be sufficient
• In open environments, be sure experience matches access level
• Lack of granular perms allows for full admin access, use discretion
17. Maintenance
• Far less maintenance required than traditional RDBMS systems
• Regularly perform query profile analysis and index auditing
• Rebuild databases to reclaim space lost due to fragmentation
• Automate checks of log files for known red flags
• Regularly review data throughput rate, storage growth rate, and overall business growth graphs to inform capacity planning.
• For HA testing, periodically step down the primary to force failover
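The capacity planning bullet can be turned into a simple projection: given current usage and an observed growth rate, estimate how long the remaining storage will last. This is a linear sketch with made-up numbers. Real growth is rarely linear, so treat the result as a rough lower bound on urgency rather than a forecast.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days of headroom left, assuming constant daily growth.

    Returns None when the data set is not growing.
    """
    if growth_gb_per_day <= 0:
        return None
    free_gb = capacity_gb - used_gb
    if free_gb <= 0:
        return 0.0
    return free_gb / growth_gb_per_day


# Hypothetical volume: 10 TB provisioned, 7 TB used, growing 25 GB per day.
print(days_until_full(10000, 7000, 25))
```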
18. Indexing Patterns or “Know Your App”
• Proper indexing is critical to performance at scale (monitor slow queries to catch non-performant requests)
• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself, choose wisely)
• Avoid un-indexed queries at all costs (it's the quickest way to crater your app... consider --notablescan)
• Onus on DevOps to match application to indexes (know your query profile, never assume)
• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
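A covered query needs every filtered field and every returned field to be in the index, and `_id` is returned by default unless the projection excludes it. The helper below is a simplified check of that rule for auditing a query profile. It ignores finer points such as multikey indexes and embedded documents, and the field names are hypothetical.

```python
def is_covered(index_fields, query_fields, projection):
    """Simplified test of whether a query can be answered from one index.

    projection uses MongoDB's style: {"field": 1} includes, {"_id": 0} excludes.
    """
    indexed = set(index_fields)
    if not set(query_fields) <= indexed:
        return False  # filtering on a field the index does not contain
    explicit = {field for field, flag in projection.items() if flag}
    if not explicit:
        return False  # no inclusion list means whole documents are returned
    returned = set(explicit)
    if projection.get("_id", 1):
        returned.add("_id")  # _id comes back unless explicitly excluded
    return returned <= indexed


# Hypothetical compound index on (user_id, created).
index = ["user_id", "created"]
print(is_covered(index, ["user_id"], {"user_id": 1, "created": 1, "_id": 0}))
print(is_covered(index, ["user_id"], {"user_id": 1, "created": 1}))
```

The second call fails only because `_id` was not excluded, which is the most common reason an otherwise well-indexed query is not covered.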
19. Capped Collections
• Use standard capped collections for retaining a fixed amount of data. Uses a FIFO strategy for pruning (based on data size, not number of rows).
• TTL Collections (2.2) age out data based on a retention time configuration (great for data retention requirements of all types).
Gotcha! Explicitly create the capped collection before any data is put into the system to avoid auto-creation of the collection
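The gotcha is easiest to avoid by creating the collection in deployment code rather than relying on the first insert. The sketch below builds the `create` command document for a capped collection and an index specification carrying `expireAfterSeconds` for a TTL collection. The collection and field names are placeholders, and the spec shape follows the `createIndexes` command of later MongoDB releases. With pymongo the equivalent calls would be `create_collection(name, capped=True, size=...)` and `create_index(field, expireAfterSeconds=...)`.

```python
def create_capped_command(name, size_bytes, max_docs=None):
    """Return the `create` command document for a capped collection."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    command = {"create": name, "capped": True, "size": size_bytes}
    if max_docs is not None:
        command["max"] = max_docs  # optional cap on document count
    return command


def ttl_index_spec(field, expire_after_seconds):
    """Return an index spec that ages documents out by a date field."""
    return {
        "key": {field: 1},
        "name": f"{field}_ttl",
        "expireAfterSeconds": expire_after_seconds,
    }


# Placeholders: a 1 GiB capped log and a 30-day retention window.
capped = create_capped_command("api_log", 1024 ** 3)
ttl = ttl_index_spec("created_at", 30 * 24 * 3600)
print(capped)
print(ttl)
```

The TTL field has to hold a date value for documents to expire.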
20. Lessons Learned
• Upgraded to Mongo 2.2 with a capped collection that had been created in 1.8.4. This severely impacted replication (RC: no "_id" index, FIX: add "_id" index)
• Never start mongo when a mount point is missing or incorrectly configured. Mongo may decide to take matters into its own hands and resync itself with the replica set. Make sure your devops and your hosting provider admins are aware of this
• Some drivers that use connection pooling can freak the freaky freak when the primary member changes (older pymongo). Kicking the application can fix it; also: upgrade drivers
• High locked % is a big red flag, and can be caused by a large number of simultaneous DML actions (high insert rate, high update rate). Consider this in the design phase.
• Be wary of automation that can change the state of a node during maintenance mode. Disable automation agents for reduced risk during critical administrative operations (filesystem maint, etc.)
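The mount point lesson can be enforced with a guard in whatever wrapper or init script starts the database: refuse to launch unless the data directory exists and is really a mounted filesystem. A minimal sketch, assuming the data path sits at the root of its own mount. The function name is hypothetical, and the demo uses a temporary directory so it can run anywhere.

```python
import os
import tempfile


def safe_to_start(data_mount, is_mount=os.path.ismount):
    """Return True only if data_mount exists and is a mounted filesystem.

    is_mount is injectable so the check can be exercised without real mounts.
    """
    return os.path.isdir(data_mount) and is_mount(data_mount)


# A plain temporary directory stands in for an unmounted data path:
# it exists, but nothing is mounted on it, so startup should be refused.
unmounted = tempfile.mkdtemp()
print(safe_to_start(unmounted))
```

Starting against an empty directory where the volume should be is what makes the node look like a fresh member and triggers the full resync described above.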