Using Open Source and Cloud Computing principles, these slides walk through the architectural patterns for building scalable cloud services. The second part of the presentation focuses on profiling common geolocation tasks like importing large datasets and rendering map tiles.
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastructure
1. FOSS4G in the Cloud
Mohamed Sayed
mohamed@fossworx.org
Version 092013
License: CC-BY-SA
2. Agenda
• Disclaimers
• Goals/Motives
• The historical path to ‘Cloud Computing’
• ‘Definition’ of cloud computing
• FOSS4G in Cloud Use cases
• AWS: Components and Services
• Building for the cloud
– Architectural patterns for Cloud Services
– Cultural changes
– Process changes
– Things to remember
• Common FOSS4G tasks in AWS
– Importing OSM data into PostGIS
– Mod_tile/Mapnik
– GWC/GeoServer
• Questions?
3. Disclaimers
• The work presented was funded personally and done during my vacation. All opinions are my own and not my employer's.
• I am not affiliated with AWS in any way other than being a customer. I choose them when that choice makes sense and would use others where applicable.
• This is still a work in progress. YMMV.
4. Goals/Motives
• Goals
– We will learn or validate some ideas.
– Get some feedback on what to do next.
– Help save someone time/money/frustration.
– Raise awareness about some risks.
• Motives
– The new disruption is in data and the services around it. We (Open Source people) should not miss out on that, and I believe I can help.
5. Path to Cloud Computing
[Diagram. Tracks: Hardware Changes, Virtualization, Mobile Computing. Labels: Multicore Support, NPT/EPT, I/O Offloading, Storage/Network Virtualization, KVM/Xen, Solaris Zones, VMWare/Parallels, Smart Phones, Tablets, MultiScreen.]
6. Cloud Computing definition (IMHO)
• Cloud computing is a computing paradigm composed of abstractions, a set of primitives, and a set of interfaces and tools to drive those abstractions and primitives. The abstractions and primitives need not be new in themselves, but their combination and impact are what create ‘The Cloud’ culture.
11. FOSS4G Use Cases
• Disaster Recovery/Backup
• Static, logic-free web publishing
• Online FOSS4G as a Service
• Data transformation jobs
• Content curation and batch processes
18. The Centrist
• Pros:
– Scales at the component level.
– Moderate complexity up to middle-range load.
– Faster/easier fault isolation/detection.
– Master/Slave for data stores is a well-studied concept.
• Cons:
– The central data store becomes more critical and a bottleneck.
– Multi-region deployments suffer from latency.
– Vertical scaling characteristics are pronounced on the data store.
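A minimal sketch of the Master/Slave idea behind this pattern, assuming a single master that takes all writes and replicas that share the reads. The host names, the `ReadWriteRouter` class, and the "anything that is not a SELECT is a write" rule are illustrative assumptions, not something the slides specify.

```python
from itertools import cycle


class ReadWriteRouter:
    """Toy router: writes go to the single master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._readers = cycle(replicas or [master])

    def route(self, sql):
        # Treat anything that is not a plain SELECT as a write.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
targets = [router.route(q) for q in (
    "SELECT name FROM planet_osm_point",
    "INSERT INTO jobs VALUES (1)",
    "SELECT 1",
)]
print(targets)
```

Every write still lands on one node, which is why the central data store shows up as the bottleneck in the cons list.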
22. CAP – Master of Colonies
• Pros:
– Improved write performance.
– Decompose large data sets into smaller ones.
– Faster data iterations.
– Good disaster recovery strategy.
• Cons:
– Complex!
– Weak/varying support by various data stores.
– High maintenance overhead.
23. Cultural Changes
• Get stakeholders' buy-in early.
• Build a full ownership culture.
• Adopt an agile approach.
• Encourage prototyping and experimentation.
• Automation as a way of life.
24. Process Changes
• Software Architecture:
– Know the floor, and the ceiling.
– Be as stateless as possible.
– Graceful failure response.
– Good logging as a way of life.
• Release Engineering:
– The VM as an artifact
– Automation
– Versioning
– Snapshot
• Automation:
– Configuration management
– Orchestration
– Auto-scaling
25. Things to remember
• Review any legal implications.
• Use the cloud primitives.
• Pay attention to security: security groups, encrypted data at rest, etc.
• Clean up old stuff.
• Things fail: don't fight it, just handle it.
• You will not get it right the first time, but things should look good on the 3rd iteration. (Read The Mythical Man-Month.)
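One common way to "just handle it" is to retry a failing call with an increasing delay. The sketch below is a generic illustration under that assumption; `call_with_retries` and the `flaky` operation are hypothetical names, and the slides do not prescribe this particular technique.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(); on failure wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)


# Demo with a hypothetical flaky operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant and makes the backoff schedule easy to inspect.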
26. FOSS4G in AWS: Performance/Architecture Evaluation
• Tools used:
– Siege
– Sar
– OProfile
– R/AWK/Python/Ruby
• PostgreSQL query log.
• Test client -> target server as separate nodes.
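As an illustration of the kind of post-processing the scripting tools in this list are suited to, here is a small sketch that summarizes response times. The sample values are made up for the example and are not measurements from the talk; `summarize` is a hypothetical helper.

```python
import math
import statistics


def summarize(latencies_ms):
    """Return mean, median and nearest-rank 95th percentile of response times."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method, 1-based
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Made-up sample: mostly fast responses plus two slow ones.
samples = [8, 9, 7, 10, 600, 9, 8, 11, 9, 620]
summary = summarize(samples)
print(summary)
```

Reporting the median and a high percentile next to the mean keeps a few slow requests from hiding behind the average.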
27. OSM Data into AWS
• Setup 1
– M1.Large (2 cores)
– Standard EBS
– EU-West region
• Setup 2
– M1.Large
– Provisioned EBS: 8000 IOPS
– EU-West region
• Setup 3
– Hi1.4xlarge
– SSD drive
– EU-West region
• Setup 4
– M2.2xlarge
– Ephemeral drives
– EU-West region
30. Enough Water Testing: Importing Planet to SSD
• Guess how long it took to finish?
31. Importing Planet into AWS Using SSD
• It only took 35 hours!
• Disk utilization: ~250 GB
• Guess what was the first thing I did when it finished?
32. Importing Planet into AWS
• I made a copy of course :)
• Create a RAID 0 set
• Create LVM on top of RAID 0
• Kick off data copy
• Guess how long it took?
36. Data import notes
• Create the DB on SSD and clone to EBS:
– Use case: quickly import the data but make it persistent.
– Full planet volume takes 2-2.5 hours.
• Create Provisioned EBS and clone to SSD:
– Use case: need very fast runtime access.
– Full planet volume takes 5.4 hours.
• Can we get an OSM primitives summary per dump and full planet as part of the PBF?
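To put these durations side by side, the sketch below converts them into average transfer rates. It assumes the full planet volume is the ~250 GB reported for the SSD import and uses decimal units; both are assumptions, and the import figure counts only the final database size, not every byte written during the import.

```python
def rate_mb_per_s(size_gb, hours):
    """Average transfer rate in MB/s (decimal units: 1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


PLANET_GB = 250  # assumption: the ~250 GB disk usage reported for the SSD import

for label, hours in [
    ("initial import to SSD", 35),
    ("clone SSD -> EBS (fast end)", 2),
    ("clone SSD -> EBS (slow end)", 2.5),
    ("clone Provisioned EBS -> SSD", 5.4),
]:
    print(f"{label}: {rate_mb_per_s(PLANET_GB, hours):.1f} MB/s")
```

Under these assumptions the import averages about 2 MB/s while the clones run at roughly 13 to 35 MB/s, which suggests raw disk throughput was not what limited the import.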
37. Data Import in AWS: Lessons learned
• It is not only the disk.
• Risk on multiple levels:
– Dev teams can't possibly be testing to their full potential (in the data context).
– Evident in outdated/incorrect documentation for bootstrapping.
38. Rendering – Mod_tile/Mapnik
• Apache module + a Unix daemon.
• The Apache module uses a process model; renderd is multithreaded.
• The Apache module sends a command to renderd over a Unix socket.
• The renderer will fetch the data and write it out.
• Non-cached data will:
– Fail on the first attempt (return 404)
– Pass on the second attempt (~600 msec)
• Cached data is served in < 10 msec.
• Very SQL chatty.
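The request flow described on this slide can be modelled in a few lines. This is a toy model of the observed behaviour (miss returns 404 and queues a render, a later request is served from cache), not mod_tile's or renderd's actual implementation; the class name, the in-memory queue standing in for the Unix socket, and the tile coordinates are all illustrative assumptions.

```python
from collections import deque


class ToyTileServer:
    """Toy model of the flow above; not mod_tile's actual implementation."""

    def __init__(self):
        self.cache = {}              # rendered tiles keyed by (z, x, y)
        self.render_queue = deque()  # stands in for the unix socket to renderd

    def get(self, z, x, y):
        key = (z, x, y)
        if key in self.cache:
            return 200, self.cache[key]      # cached: served immediately
        if key not in self.render_queue:
            self.render_queue.append(key)    # ask the renderer for the tile
        return 404, None                     # first request for a cold tile fails

    def run_renderer(self):
        # The renderer drains the queue and writes tiles into the cache.
        while self.render_queue:
            z, x, y = self.render_queue.popleft()
            self.cache[(z, x, y)] = f"tile-{z}-{x}-{y}".encode()


server = ToyTileServer()
first = server.get(15, 17024, 10792)   # cold tile
server.run_renderer()
second = server.get(15, 17024, 10792)  # now cached
print(first[0], second[0])
```

A client that sees this pattern would typically retry a 404 once after a short pause before treating the tile as missing.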
44. Rendering – GeoServer/GWC
• Single layer, ZL 15, RAM disk: 100 tiles/sec.
• Truncation is very slow. Please version your published layers.
• Standalone GWC offers a much better scalability model.
• Possible race conditions in threads writing tiles.
• Didn't hit the getAlphaTile() issue.
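For a sense of scale, the quoted rate can be turned into a rough seeding time for a whole zoom level. The sketch assumes the common web-mercator tile grid with one world tile at zoom 0 (so 2^z by 2^z tiles at zoom z) and a constant render rate; a different gridset or a regional extent changes the numbers.

```python
def tiles_at_zoom(zoom):
    """Number of tiles covering the world at one zoom level (2^z by 2^z grid)."""
    return 4 ** zoom


def seeding_days(zoom, tiles_per_sec):
    """Days needed to render every tile of a single zoom level at a fixed rate."""
    return tiles_at_zoom(zoom) / tiles_per_sec / 86400


print(tiles_at_zoom(15))                # 1073741824
print(round(seeding_days(15, 100), 1))  # 124.3
```

At 100 tiles/sec a full world at zoom 15 would take about four months on one node, so re-rendering after a truncate is costly and versioned layers are the cheaper way to publish changes.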
47. Released artifacts
Snapshots of OSM data in flat PGSQL
• 2 drives:
– snap-f9affde6
– snap-ffaffde0
• To use:
– Create a volume based on each snapshot
– mdadm activate (raid0, 2 drives)
– pvscan, vgscan, vgchange, lvscan
– Installing mdadm and rebooting should work on most machines to do this for you automagically.
– Mount the volume on your PGDATA path
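The steps above can be sketched as a command list. The block only builds and prints the commands and does not run them. Using the AWS CLI for volume creation is an assumption (the slide names no tool), and the availability zone, the logical volume path `/dev/VG_NAME/LV_NAME`, and the PGDATA path are placeholders to replace with what `lvscan` and your own setup report.

```python
import shlex

SNAPSHOTS = ["snap-f9affde6", "snap-ffaffde0"]  # IDs from the slide


def restore_commands(zone, lv_device, pgdata):
    """Build (not run) the shell commands for the restore steps."""
    cmds = [
        ["aws", "ec2", "create-volume", "--snapshot-id", snap,
         "--availability-zone", zone]
        for snap in SNAPSHOTS
    ]
    # Attach both volumes to the instance before continuing (not shown).
    cmds += [
        ["mdadm", "--assemble", "--scan"],  # activate the 2-drive RAID 0
        ["pvscan"],
        ["vgscan"],
        ["vgchange", "-ay"],                # activate the LVM volume group
        ["lvscan"],                         # lists the logical volume path
        ["mount", lv_device, pgdata],
    ]
    return [shlex.join(c) for c in cmds]


# Placeholder zone, device and mount point; substitute real values.
commands = restore_commands("eu-west-1a", "/dev/VG_NAME/LV_NAME",
                            "/var/lib/postgresql/data")
for line in commands:
    print(line)
```

Keeping this as a dry run makes it easy to review the sequence before executing anything as root.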
48. Backlog
• Geocoding testing with Twofishes and Gisgraphy
• OSRM profiling
• Suggestions?
49. Many thanks to
• Geofabrik for compiling all those sets/formats.
• FOSS4G2013 for this opportunity.
• And THANK YOU