Cloudgene - an execution platform for
MapReduce programs in public and
private clouds

Lukas Forer, Sebastian Schönherr, Hansi Weißensteiner

University of Innsbruck, Austria
Medical University Innsbruck, Austria


                                              BOSC 2012
MapReduce
    [Figure: serial approach vs. parallel approach; cluster; cloud (private / public)]

       How to support scientists when using (our) MapReduce
       programs?
           Simplify the execution of MapReduce programs including
           data management
           Simplify access to a working MapReduce cluster
           Maintain data sensitivity




    MapReduce: Simplified Data Processing on Large Clusters; Dean & Ghemawat, 2004

MapReduce in Genetics
    CloudBurst
           highly sensitive read mapping with MapReduce; Schatz, 2009
    Crossbow
           Searching for SNPs with cloud computing; Langmead et al., 2009
    Myrna
           Cloud-scale RNA-sequencing differential expression analysis with Myrna; Langmead et al.,
           2010
    Seal
           a Distributed Short Read Mapping and Duplicate Removal Tool; Pireddu et al., 2012
    Hadoop BAM
           directly manipulating next generation sequencing data in the cloud; Matti Niemenmaa et al.,
           2012
    CloudBioLinux
           CloudBioLinux: pre-configured and on-demand bioinformatics computing for the
           genomics community; Krampis et al., 2012

Difficulties with MapReduce


                    Additional steps, when setting up a
                    cluster in a public environment




                    Required steps when cluster is up and
                    running, Hadoop installed




Approaches
    Possible approaches
      Program specific approach
         Implement a GUI for every program
         Redundant work for the developer
         Heterogeneity

      Workflow systems
         Galaxy, Taverna, Mobyle
         Possible, but no HDFS support; black box

    Our approach for Hadoop MapReduce
         One GUI for different programs
         Feedback, Standardized Import/Export
         Integration of programs via a plugin interface

What is Cloudgene?
    Open-source platform to improve the usability of Hadoop
    MapReduce jobs
       Provides a graphical web interface for their execution
       Programs can be integrated by writing a simple configuration file
       Public cloud & private cloud
          Sets up a cluster in the cloud and installs all data on it
       History of executed jobs with defined input/output parameters


    Runs in your browser
    [Figure: Myrna, CloudBurst, Seal, Crossbow and CloudBioLinux, with Cloudgene beneath them]

Cloudgene




Features
    Integration of programs easily possible
       standard MapReduce programs (Java -> CloudBurst)
       streaming jobs (e.g. Mapper and Reducer using Perl -> Myrna)
       command line programs (e.g. using Pydoop -> Seal)


    Data can be imported from different sources
       S3 / HTTP / FTP
       Import of huge datasets
       Export results to S3 (public cloud)


    Connect different MapReduce programs to a pipeline
    Install additional programs via a web repository
Features

    Cloudgene can be used on private and public clusters

       private cloud: sensitive data, local data
       public cloud: data on S3, no in-house cluster available

    Open source

Summary




Cloudgene in Action




     How to integrate a new program in Cloudgene
       1. Implement the program (or use existing)
       2. Write plugin configuration file




Cloudgene in Action



     Step 1 - Implement a program, executable via the command line


     e.g. FastQ pre-processing with MapReduce
          base quality / sequence quality / duplication levels / length distribution


          hadoop jar exomePreprocessing.jar -input exomeData
          -step baseJob -encoding 0 -output resultsOutput
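
To make the kind of computation behind such a job concrete, here is a minimal sketch of one of the listed metrics, the read length distribution, written as a map step and a reduce step. It runs in a single Python process and only simulates the shuffle phase; it is not the code inside exomePreprocessing.jar, and all function names (`parse_fastq`, `map_read_length`, `reduce_counts`, `run_job`) and the sample reads are made up for illustration.

```python
from itertools import groupby


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        # Line 0: "@id", line 1: sequence, line 2: "+", line 3: qualities.
        yield records[i][1:], records[i + 1], records[i + 3]


def map_read_length(record):
    """Map step: emit (read length, 1) for one read."""
    _read_id, sequence, _quality = record
    yield len(sequence), 1


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Simulate map -> sort/shuffle -> reduce in a single process."""
    pairs = [kv for rec in parse_fastq(lines) for kv in map_read_length(rec)]
    pairs.sort(key=lambda kv: kv[0])  # stands in for Hadoop's shuffle/sort
    return dict(
        reduce_counts(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    )


# Hypothetical sample input: three reads of length 4, 6 and 4.
sample = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGTAC", "+", "IIIIII",
    "@read3", "TTGA", "+", "IIII",
]
print(run_job(sample))  # {4: 2, 6: 1}
```

On a real cluster the map and reduce functions would run on different nodes and Hadoop would handle the sorting and grouping between them; the structure of the two steps stays the same.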




Cloudgene in Action



     Step 2 - Write configuration file including 3 parts


     Part 1 – General information
     Part 2 – Public cloud information
     Part 3 – MapReduce information

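
The configuration files themselves are not reproduced in this transcript. As a rough illustration of the three-part idea only, the sketch below models a plugin description as a Python dictionary and turns its MapReduce part into the command line from Step 1. Every key name, the version string, the node count and the `build_command` helper are hypothetical and do not reflect Cloudgene's actual file format.

```python
import shlex

# Hypothetical plugin description with the three parts named on the slides.
plugin = {
    # Part 1: general information (values are placeholders)
    "general": {"name": "Exome Preprocessing", "version": "0.1"},
    # Part 2: public cloud information (placeholder cluster settings)
    "cloud": {"nodes": 4},
    # Part 3: MapReduce information; {name} marks a value the user supplies
    "mapred": {
        "jar": "exomePreprocessing.jar",
        "params": [
            "-input", "{input}",
            "-step", "{step}",
            "-encoding", "{encoding}",
            "-output", "{output}",
        ],
    },
}


def build_command(plugin, values):
    """Fill the parameter templates and return the argument list."""
    mapred = plugin["mapred"]
    args = [param.format(**values) for param in mapred["params"]]
    return ["hadoop", "jar", mapred["jar"], *args]


# Values a user might enter in a web form generated from the description.
command = build_command(
    plugin,
    {"input": "exomeData", "step": "baseJob", "encoding": 0, "output": "resultsOutput"},
)
print(shlex.join(command))
```

Keeping the parameter list declarative is what would let one generic GUI render a different input form for each program, which is the design goal stated earlier in the talk.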
Cloudgene in Action

     Different application – different GUI




Technologies
     Apache Hadoop
          http://hadoop.apache.org
     Apache Whirr
          http://whirr.apache.org
     Restlet
          http://www.restlet.org
     ExtJS
          http://www.sencha.com
     H2
          http://www.h2database.com



Evaluation

     Amazon Elastic MapReduce (EMR)
       Graphical execution for MapReduce programs
       Excellent solution for public clouds
           Combination with S3
     but
           Data sensitivity
           Reproducibility
           Additional costs

     [Figure: bar chart of runtime in seconds (0 to 4000) for Cloudgene and Amazon EMR, split into Setup, Import, Calculation and Export]

Integrated programs


 Wordcount, Grep, etc.
 [Image: cloudburst-bio project logo]
 Exome Preprocessing
 Finding SNPs

Acknowledgements



Sebastian Schönherr, Lukas Forer, Hansi Weissensteiner
Anita Kloss-Brandstätter, Florian Kronenberg, Günther Specht
Thanks to the Open Source Community

Project-Website: http://cloudgene.uibk.ac.at
Source Code: http://github.com/genepi