Big Data Science in the Cloud from Big Data World Conference 2013
1. „Big Data Science in the Cloud“
Markus Schmidberger
Big Data Analyst & Cloud Engineer
@cloudHPC
markus@mongosoup.de
2. Big Data gets Political
● New coalition agreement in Germany:
– “We want to further develop the information and communication
strategy (ICT strategy) for the digital economy. ...
– ... We will focus research and innovation funding for “Big Data”
on the development of methods and tools for data analysis ... ”
3. “We change the rules!”
Curious, playful, agile, experienced, goal-oriented, love of
detail, thinking differently ...
Continuous software delivery
Big data & polyglot persistence
Lean & agile
3. December 2013 - 3
6. Big Data Science
● Data science seeks to use all available and
relevant data to effectively tell a story that can
be easily understood by non-practitioners.
7. Cloud Computing
● Wikipedia: “... describes a variety of
computing concepts that involve a large
number of computers connected through
a real-time communication network such
as the Internet. ...”
8. 1) Put Apps & Data in the best Place
9. AWS Zones at the right Place
10. Example: R and RStudio Server
● R: open-source statistical software
– www.r-project.org
● RStudio IDE
– www.rstudio.org
– IDE + web / server version
11. 2) Choose Cloud Resources carefully
● Instance type
● EBS optimized
● EBS provisioned IOPS
● Load Balancer
● Availability Zones
http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
12. MongoSoup is the first German-based MongoDB cloud
hosting solution!
Supported by a team of experts from comSysto, the first German
partner of MongoDB Inc. You can have a running MongoDB
database in virtually no time.
● MongoDB hosting on Amazon EC2 (eu-west-1) and in Munich
● 24x7 monitoring and support
● Dedicated instances and shared hosting available
● Replica Sets and Sharding available
● SSL-enabled MongoDB
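To make the replica set and SSL options above more concrete, here is a minimal sketch of how a client might assemble a standard MongoDB connection URI for such a deployment. The host names, database name, credentials and replica set name are hypothetical placeholders, not values provided by MongoSoup; only the general `mongodb://` URI layout and the `replicaSet` and `ssl` options come from MongoDB's connection string format.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_uri(hosts, database, user, password, replica_set):
    """Build a MongoDB connection URI for an SSL-enabled replica set."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # A replica set URI lists several members as a comma-separated seed list.
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    options = urlencode({"replicaSet": replica_set, "ssl": "true"})
    return f"mongodb://{credentials}@{host_list}/{database}?{options}"


# Hypothetical hosts and credentials, for illustration only.
uri = build_mongodb_uri(
    hosts=[("db1.example.com", 27017), ("db2.example.com", 27017)],
    database="analytics",
    user="app_user",
    password="p@ss/word",
    replica_set="rs0",
)
print(uri)
```

A driver would typically accept the resulting string directly; the actual hosts and options depend on what the hosting provider hands out.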
13. Performance <-> Costs
● scale up & out
● scale down ?
● monitor your resources from the beginning
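The performance versus cost trade-off above can be sketched as a small calculation: for a given load, compare a fleet of many small instances (scale out) with a fleet of fewer large ones (scale up) and pick the cheaper one. The instance names, prices and capacities below are made-up assumptions for illustration, not real AWS figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    hourly_price_cents: int  # hypothetical price, in cents per hour
    capacity: int            # hypothetical throughput, in requests per second


def cheapest_fleet(instance_types, required_capacity):
    """Return (instance_type, count, hourly_cost_cents) for the cheapest
    fleet made of a single instance type that covers the required capacity."""
    best = None
    for itype in instance_types:
        count = math.ceil(required_capacity / itype.capacity)
        cost = count * itype.hourly_price_cents
        if best is None or cost < best[2]:
            best = (itype, count, cost)
    return best


# Hypothetical catalog: one small and one large instance type.
catalog = [
    InstanceType("small", 10, 100),
    InstanceType("large", 45, 500),
]

for load in (80, 450, 1200):
    itype, count, cost = cheapest_fleet(catalog, load)
    print(f"load={load}: {count} x {itype.name} for {cost} cents/hour")
```

With these assumed numbers the cheapest choice flips between the small and the large type as the load changes, which is one reason to monitor resource usage from the start and to revisit the sizing when load drops again.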
14. 3) Use the full Cloud Technology Stack
15. Example: AWS EMR with mapR
● Speed
● Compression
– reduces disk and network I/O and increases performance
● Snapshots
– data protection
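The compression point above can be illustrated with a small sketch. It uses Python's standard `zlib` module on synthetic, repetitive records; this is not the compression mechanism used by MapR or EMR, and the record layout is an invented example. It only shows that fewer bytes need to be written to disk or sent over the network, and that the data is restored unchanged.

```python
import zlib

# Synthetic, repetitive log-like records (made up for illustration).
records = "\n".join(
    f"2013-12-03,sensor-{i % 10},status=OK,value={i % 7}"
    for i in range(10_000)
).encode("utf-8")

# Compress with zlib at level 6, a common speed/size compromise.
compressed = zlib.compress(records, 6)
restored = zlib.decompress(compressed)

ratio = len(compressed) / len(records)
print(f"raw bytes:        {len(records)}")
print(f"compressed bytes: {len(compressed)}")
print(f"compressed size as a fraction of raw: {ratio:.3f}")
print(f"round trip intact: {restored == records}")
```

How much is saved depends heavily on the data; highly repetitive records like these compress far better than already compressed or random data.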
16. 4) Data Protection
● talk to the experts (e.g. Bitkom)
● use available mechanisms & services
– EMR in VPC
– Mongosoup.de
● be aware of the topic
17. More Big Data Events
● “Map-Reducing Everywhere”
– https://hadoopsummit.uservoice.com
● Forum Big Data und Verantwortung (Forum on Big Data and
Responsibility), with Frank Schirrmacher among others
– Tue, 03.12., 19:00; Große Aula, LMU
18. „Big Data Science in the Cloud“
- Yes We Can @cloudHPC
markus@mongosoup.de
http://comsysto.com/events