SlideShare una empresa de Scribd logo
1 de 63
Descargar para leer sin conexión
GALAXY
       BIOINFORMATICS WORKFLOW
                    ENVIRONMENT

Rutger Vos, 3 April 2012
Overview
Informatics in the post-genomic era
The past (?)

      Analyses glued together using
       scripting languages, directly on
       the CLI or in GUI
      Sanger sequencing
      Smaller data volumes
      Fewer remote data resources
      Hypothesis-driven
The present

    Graphical or text-
     based workflow tools
    “Next generation”
     sequencing
    Large data sets
    Many remote data
     resources
    “Data-driven”
NGS – Roche 454 pyrosequencing

Pyrosequencing                     Genome Sequencer FLX

    “Emulsion PCR”
    Bead with primer in each
     droplet
    Each bead is placed in a
     well with luciferase
    Plate is analyzed by fiber-
     optic chip
NGS – Illumina/Solexa

Reversible dye-terminator seq    HiSeq2000 (BGI)

    DNA attaches to primer on
     slide and is amplified
    4 RT-bases are added
    Camera detects labeled
     nucleotides
    Next 1-base cycle
NGS – IonTorrent

Ion semiconductor sequencing     Ion Torrent PGM

    “sequencing by synthesis”
    Not light-based, sensor
     detects H+ ions during
     synthesis
    Longer reads
NGS – SOLiD Sequencing

Oligonucleotide ligation and
                                    SOLiD 5500 Genetic Analyzer
detection
    Beads with DNA fragments
    Universal adapter attached
     to fragments
    PCR product attaches to
     slide
    Fluorescent probes ligate to
     the primer
The future


     “Next-next generation”
      sequencing (MinION?)
     Smaller data sets?
     Semantic web?
     Back to hypotheses?
Workflows
Tools for automating bioinformatics analyses
Examples of workflow tools


 Taverna        taverna.org.uk

 Galaxy         usegalaxy.org

 eHive          ensembl.org/info/docs/eHive

 Mobyle         mobyle.pasteur.fr

 Cipres         phylo.org

 Yahoo! Pipes   pipes.yahoo.com

 “make”         gnu.org/software/make
Galaxy
Provenance, histories

    Where do the
     data come from?
    How were they
     altered?
    Galaxy tracks the
     history of data.
    Histories can be
     converted to
     workflows
Creating a workflow
Workflow editor
Reproducibility

  Good science requires that results be reproducible
  Some analyses are run many times
Sharing

    “Standing on the
     shoulders of giants”
    Not re-inventing the
     wheel
    “Executable
     papers” (doi:
     10.1101/gr.
     094508.109)
Data
What data types, expressed in which formats,
does Galaxy operate on? How do I get data
into and out of Galaxy?
Data types
•    Sequences
•    Alignments
•    Intervals
•    Tabular data
File format conversion


    Built-in converters
     between related
     file formats are
     provided
    Additional
     converters can be
     added
Galaxy sequence formats

     FASTA – def line, sequence
     FASTQ – FASTA + quality - SOLEXA:
     ABI/SCF – binary sequence trace (see below)
     SFF (454) – binary flowgram format
Galaxy alignment formats

      MAF – multiple alignment format (see below)
      (S|B)AM – text or binary format for reads and ref
      AXT – pairwise alignment, from LAV
      LAV – BLASTZ output, pairwise alignment
Galaxy interval/feature formats

    BED
         chrom, start, end, …
    INTERVAL
         BED with headers
    GFF (GFF3)
         like BED and
          interval, but 1-based
          inclusive
    WIG(GLE)
         dense, continuous-
          valued tracks
Other data formats


 In addition to interval/feature tabular data as listed
    previously, other files with similar properties can be
    processed by some tools:
        *.txt tab-separated values (e.g. tabular FASTA)
        HTML (for additional prose)
        LPED/PBED (to describe SNPs, really two files, one for
         coordinates, other for alleles)
Data I/O
•    Upload via FTP
•    Fetch by URL
•    Import from data library
•    Provided by interoperable web service
Upload data using FTP
Fetch data from a URL
Import from data library
Fetch data from proxy service

    User submits proxy request to Galaxy
    Galaxy forwards request to remote service
    Service returns data
    Galaxy infers data type and presents results
“Big Data”


     NGS has led to massive
      data sets
     Data formats are simple,
      binary, and/or compressed
     Still, people drive around
      with USB hard disks
Data sharing/publishing


      The Galaxy platform
       allows users to
       publish and share
       their data, for
       example as
       supplemental
       materials to a
       publication*


         * example: http://genome.cshlp.org/content/19/11/2144
Tools
Which operations can I run on Galaxy?
Galaxy tools


  Send and Get data – Upload, fetch, send, submit
  Data manipulation – Join, sort, filter

  Format conversion – FASTA, other format operations

  Statistics – Regressions, simulations, model tests

  NGS – BAM, FASTQ, SOLiD, 454 file operations

  RNA analysis – cufflinks, tophat

  Evolution – branch lengths, NJ, HyPhy
Visualization
What data can I view in Galaxy?
Galaxy visualization - histograms
Galaxy visualization - scatterplots
Galaxy visualization – XY plot
Galaxy visualization – box plot
Galaxy visualization - trackster
Deployment
How to deploy your analyses on Galaxy, on
public servers, in the cloud or locally
Public servers
•    “Main” at UseGalaxy.org
•    NBIC
•    Others listed on Galaxy wiki
Local server

Pro:                          Con:

    Data close by                Complicated install
    Can add your own tools       Many dependencies
    Can develop Galaxy           UNIX-based
     further
    UNIX-based
Cloud Galaxy

  Galaxy can be installed in the (Amazon EC2 cloud)
  Private data without the hardware hassle

  Uploading and storing data can be costly, however
Implementation
How does it work under the hood?
Galaxy under the hood


             Issues
           command




 1. Parses HTTP request
 2. Identifies which tool to use
 3. Reads tool description
 4. Queues tool
 5. Parses result
 6. Returns HTML representation of result
Web server


    Simple Galaxy installs use
     a built-in, python-based
     HTTP server
    More robust installs
     typically use the Apache
     httpd server
Code base


      Most of the framework
       and the wrapper code is
       written in python
      Some wrappers in other
       languages, e.g. perl
Interface language


    Under the hood, Galaxy executes command-line programs and
     scripts
    Their interfaces and tool tips are described in XML files
Queuing

              Jobs are executed
               asynchronously
              Progress is shown in the
               data browser
              On big servers (e.g.
               “Main”), queuing is
               managed by the
               dedicated “Torque”
               system
UNIX


    Galaxy (simply)executes
     command-line programs
     within a UNIX-like
     environment
    Galaxy doesn’t have to
     “know” how to run those
     programs, it finds out from
     their descriptions at runtime
Database



      Analysis metadata is stored
       in a database, by default
       this is SQLite
      More robust installs use
       PostgreSQL
Configuration


     Galaxy has many moving
      parts that can be
      configured
     Configuration is done using
      simple text-based INI files
Version control


    Revision control (or version
     control) provides unlimited
     undo and detailed tracking
     of changes
    Galaxy uses Mercurial
    Popular now are svn, git
     and hg
Community
How to get in touch with the world-wide
community to get the most out of Galaxy?
Wiki
Mailing lists



      @lists.bx.psu.edu:
           galaxy-user
           galaxy-dev
           galaxy-announce
           galaxy-commits
Tutorials


     Galaxy 101:
          Getting data from UCSC
          Performing simple data
           manipulation
          Understanding Galaxy's
           History system
          Creating and editing
           workflows
          Applying workflows to
           your data
Screencasts
Events

      Galaxy Community
       Conference
      ISMB
      ECCB
      BOSC
      PAG
      Bio-IT World
      GMOD Meetings
“Tool shed”



                  Easy sharing of
                   new tools
                  Based on Mercurial
                  Turns Galaxy into a
                   modular ecosystem
CiteULike group




         Social citation manager
         There is a Galaxy group:
              citeulike.org/group/16008

           Articles are tagged by:
              PROJECT,
                     ISGALAXY, SHARED,
              HOWTO, METHODS,
              REPRODUCIBILITY, WORKFLOW
Organizations


     Development:
          Penn State University
          Emory University
     Support:
          NSF
          NHGRI
     Power user:
          NBIC
Links

    UseGalaxy.org – the Main server
    GetGalaxy.org – for local installs
    UseGalaxy.org/galaxy101 – intro tutorial
    Galaxy.nbic.nl – Sombrero, the NBIC server
    CiteULike.org/group/16008 – references
    genome.cshlp.org/content/19/11/2144 – windshield paper
    SlideShare.net/rvosa – these slides

Más contenido relacionado

La actualidad más candente (20)

Data analysis pipelines for NGS applications
Data analysis pipelines for NGS applicationsData analysis pipelines for NGS applications
Data analysis pipelines for NGS applications
 
Autodock review ppt
Autodock review pptAutodock review ppt
Autodock review ppt
 
Sequence file formats
Sequence file formatsSequence file formats
Sequence file formats
 
Biological networks
Biological networksBiological networks
Biological networks
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
BTIS
BTISBTIS
BTIS
 
Fasta
FastaFasta
Fasta
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)
 
second generation of DNA Sequencing
second generation of DNA Sequencingsecond generation of DNA Sequencing
second generation of DNA Sequencing
 
The Smith Waterman algorithm
The Smith Waterman algorithmThe Smith Waterman algorithm
The Smith Waterman algorithm
 
Genome annotation 2013
Genome annotation 2013Genome annotation 2013
Genome annotation 2013
 
Sequencedatabases
SequencedatabasesSequencedatabases
Sequencedatabases
 
Rna seq
Rna seqRna seq
Rna seq
 
Sequence Assembly
Sequence AssemblySequence Assembly
Sequence Assembly
 
Clustal
ClustalClustal
Clustal
 
Similarity
SimilaritySimilarity
Similarity
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomics
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its tools
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Prosite
PrositeProsite
Prosite
 

Destacado

Galaxy presentation
Galaxy presentationGalaxy presentation
Galaxy presentationBabubij
 
Geothermal energy
Geothermal energyGeothermal energy
Geothermal energyefabra
 
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsHuman/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsNC Military Business Center
 
Development and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelDevelopment and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelThermo Fisher Scientific
 
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Thermo Fisher Scientific
 
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemFully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemThermo Fisher Scientific
 
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...QIAGEN
 
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Thermo Fisher Scientific
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingIntegrated DNA Technologies
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Integrated DNA Technologies
 
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Integrated DNA Technologies
 
Sun, Moon and Planets Slideshow
Sun, Moon and Planets SlideshowSun, Moon and Planets Slideshow
Sun, Moon and Planets Slideshowsanstanton
 
DATA STRUCTURES
DATA STRUCTURESDATA STRUCTURES
DATA STRUCTURESbca2010
 
Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Scissor Lee
 
Special quadrilaterals proofs ans constructions
Special quadrilaterals  proofs ans constructions Special quadrilaterals  proofs ans constructions
Special quadrilaterals proofs ans constructions cristufer
 

Destacado (20)

Galaxy presentation
Galaxy presentationGalaxy presentation
Galaxy presentation
 
Earth Moon And Sun
Earth Moon And SunEarth Moon And Sun
Earth Moon And Sun
 
Geothermal energy
Geothermal energyGeothermal energy
Geothermal energy
 
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced AnalyticsHuman/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
Human/Social Sciences/Cultural & Behavioral Dynamics and Advanced Analytics
 
Example site map
Example site mapExample site map
Example site map
 
Development and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 PanelDevelopment and verification of an Ion AmpliSeq TP53 Panel
Development and verification of an Ion AmpliSeq TP53 Panel
 
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
 
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef SystemFully automated Ion AmpliSeq™ library preparation using the Ion Chef System
Fully automated Ion AmpliSeq™ library preparation using the Ion Chef System
 
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
Application Note: A Simple One-Step Library Prep Method To Enable AmpliSeq Pa...
 
xGen® Lockdown® Probes
xGen® Lockdown® ProbesxGen® Lockdown® Probes
xGen® Lockdown® Probes
 
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
Pharmacogenomics Research Ion AmpliSeq Assay | ESHG 2015 Poster PM15.10
 
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation SequencingImproved Reagents & Methods for Target Enrichment in Next Generation Sequencing
Improved Reagents & Methods for Target Enrichment in Next Generation Sequencing
 
Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...Target capture of DNA from FFPE samples— recommendations for generating robus...
Target capture of DNA from FFPE samples— recommendations for generating robus...
 
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
Ribonucleoprotein delivery of CRISPR-Cas9 reagents for increased gene editing...
 
Galaxies
GalaxiesGalaxies
Galaxies
 
Sun, Moon and Planets Slideshow
Sun, Moon and Planets SlideshowSun, Moon and Planets Slideshow
Sun, Moon and Planets Slideshow
 
DATA STRUCTURES
DATA STRUCTURESDATA STRUCTURES
DATA STRUCTURES
 
Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談Spine X Live2D 百萬智多星製作經驗談
Spine X Live2D 百萬智多星製作經驗談
 
Unit 1.my school
Unit 1.my schoolUnit 1.my school
Unit 1.my school
 
Special quadrilaterals proofs ans constructions
Special quadrilaterals  proofs ans constructions Special quadrilaterals  proofs ans constructions
Special quadrilaterals proofs ans constructions
 

Similar a The Galaxy bioinformatics workflow environment

Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Shaojun Xie
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Horizontal scaling with Galaxy
Horizontal scaling with GalaxyHorizontal scaling with Galaxy
Horizontal scaling with GalaxyEnis Afgan
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Amazon Web Services
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubeFAO
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneIan Foster
 
GeoChronos
GeoChronosGeoChronos
GeoChronoscurryr
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Globus
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloudthetfoot
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 

Similar a The Galaxy bioinformatics workflow environment (20)

Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - Introduction
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Horizontal scaling with Galaxy
Horizontal scaling with GalaxyHorizontal scaling with Galaxy
Horizontal scaling with Galaxy
 
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) ...
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
e-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCubee-Infrastructure Integration-with gCube
e-Infrastructure Integration-with gCube
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
 
GeoChronos
GeoChronosGeoChronos
GeoChronos
 
Big data
Big dataBig data
Big data
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
Grid computing
Grid computingGrid computing
Grid computing
 
IMPACT/myGrid Hackathon - Taverna Roadmap
IMPACT/myGrid Hackathon - Taverna RoadmapIMPACT/myGrid Hackathon - Taverna Roadmap
IMPACT/myGrid Hackathon - Taverna Roadmap
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 

Más de Rutger Vos

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Rutger Vos
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over EvolutieRutger Vos
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course BiodiversiteitRutger Vos
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data scienceRutger Vos
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionRutger Vos
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Rutger Vos
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterflyRutger Vos
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningRutger Vos
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Rutger Vos
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataRutger Vos
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Rutger Vos
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveRutger Vos
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenRutger Vos
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline introRutger Vos
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Rutger Vos
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRutger Vos
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLRutger Vos
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB NaturalisRutger Vos
 

Más de Rutger Vos (20)

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course Biodiversiteit
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data science
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolution
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterfly
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learning
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspective
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proeven
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collections
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XML
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB Naturalis
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

The Galaxy bioinformatics workflow environment

  • 1. GALAXY BIOINFORMATICS WORKFLOW ENVIRONMENT Rutger Vos, 3 April 2012
  • 2. Overview Informatics in the post-genomic era
  • 3. The past (?)   Analyses glued together using scripting languages, directly on the CLI or in GUI   Sanger sequencing   Smaller data volumes   Fewer remote data resources   Hypothesis-driven
  • 4. The present   Graphical or text- based workflow tools   “Next generation” sequencing   Large data sets   Many remote data resources   “Data-driven”
  • 5. NGS – Roche 454 pyrosequencing Pyrosequencing Genome Sequencer FLX   “Emulsion PCR”   Bead with primer in each droplet   Each bead is placed in a well with luciferase   Plate is analyzed by fiber- optic chip
  • 6. NGS – Illumina/Solexa Reversible dye-terminator seq HiSeq2000 (BGI)   DNA attaches to primer on slide and is amplified   4 RT-bases are added   Camera detects labeled nucleotides   Next 1-base cycle
  • 7. NGS – IonTorrent Ion semiconductor sequencing Ion Torrent PGM   “sequencing by synthesis”   Not light-based, sensor detects H+ ions during synthesis   Longer reads
  • 8. NGS – SOLiD Sequencing Oligonucleotide ligation and SOLiD 5500 Genetic Analyzer detection   Beads with DNA fragments   Universal adapter attached to fragments   PCR product attaches to slide   Fluorescent probes ligate to the primer
  • 9. The future   “Next-next generation” sequencing (MinION?)   Smaller data sets?   Semantic web?   Back to hypotheses?
  • 10. Workflows Tools for automating bioinformatics analyses
  • 11. Examples of workflow tools Taverna taverna.org.uk Galaxy usegalaxy.org eHive ensembl.org/info/docs/eHive Mobyle mobyle.pasteur.fr Cipres phylo.org Yahoo! Pipes pipes.yahoo.com “make” gnu.org/software/make
  • 13. Provenance, histories   Where do the data come from?   How were they altered?   Galaxy tracks the history of data.   Histories can be converted to workflows
  • 16. Reproducibility   Good science requires that results be reproducible   Some analyses are run many times
  • 17. Sharing   “Standing on the shoulders of giants”   Not re-inventing the wheel   “Executable papers” (doi: 10.1101/gr. 094508.109)
  • 18. Data What data types, expressed in which formats, does Galaxy operate on? How do I get data into and out of Galaxy?
  • 19. Data types •  Sequences •  Alignments •  Intervals •  Tabular data
  • 20. File format conversion   Built-in converters between related file formats are provided   Additional converters can be added
  • 21. Galaxy sequence formats   FASTA – def line, sequence   FASTQ – FASTA + quality - SOLEXA:   ABI/SCF – binary sequence trace (see below)   SFF (454) – binary flowgram format
  • 22. Galaxy alignment formats   MAF – multiple alignment format (see below)   (S|B)AM – text or binary format for reads and ref   AXT – pairwise alignment, from LAV   LAV – BLASTZ output, pairwise alignment
  • 23. Galaxy interval/feature formats   BED   chrom, start, end, …   INTERVAL   BED with headers   GFF (GFF3)   like BED and interval, but 1-based inclusive   WIG(GLE)   dense, continuous- valued tracks
  • 24. Other data formats In addition to interval/feature tabular data as listed previously, other files with similar properties can be processed by some tools:   *.txt tab-separated values (e.g. tabular FASTA)   HTML (for additional prose)   LPED/PBED (to describe SNPs, really two files, one for coordinates, other for alleles)
  • 25. Data I/O •  Upload via FTP •  Fetch by URL •  Import from data library •  Provided by interoperable web service
  • 28. Import from data library
  • 29. Fetch data from proxy service   User submits proxy request to Galaxy   Galaxy forwards request to remote service   Service returns data   Galaxy infers data type and presents results
  • 30. “Big Data”   NGS has led to massive data sets   Data formats are simple, binary, and/or compressed   Still, people drive around with USB hard disks
  • 31. Data sharing/publishing   The Galaxy platform allows users to publish and share their data, for example as supplemental materials to a publication* * example: http://genome.cshlp.org/content/19/11/2144
  • 32. Tools Which operations can I run on Galaxy?
  • 33. Galaxy tools   Send and Get data – Upload, fetch, send, submit   Data manipulation – Join, sort, filter   Format conversion – FASTA, other format operations   Statistics – Regressions, simulations, model tests   NGS – BAM, FASTQ, SOLiD, 454 file operations   RNA analysis – cufflinks, tophat   Evolution – branch lengths, NJ, HyPhy
  • 34. Visualization What data can I view in Galaxy?
  • 36. Galaxy visualization - scatterplots
  • 40. Deployment How to deploy your analyses on Galaxy, on public servers, in the cloud or locally
  • 41. Public servers •  “Main” at UseGalaxy.org •  NBIC •  Others listed on Galaxy wiki
  • 42. Local server Pro: Con:   Data close by   Complicated install   Can add your own tools   Many dependencies   Can develop Galaxy   UNIX-based further   UNIX-based
  • 43. Cloud Galaxy   Galaxy can be installed in the (Amazon EC2 cloud)   Private data without the hardware hassle   Uploading and storing data can be costly, however
  • 44. Implementation How does it work under the hood?
  • 45. Galaxy under the hood Issues command 1. Parses HTTP request 2. Identifies which tool to use 3. Reads tool description 4. Queues tool 5. Parses result 6. Returns HTML representation of result
  • 46. Web server   Simple Galaxy installs use a built-in, python-based HTTP server   More robust installs typically use the Apache httpd server
  • 47. Code base   Most of the framework and the wrapper code is written in python   Some wrappers in other languages, e.g. perl
  • 48. Interface language   Under the hood, Galaxy executes command-line programs and scripts   Their interfaces and tool tips are described in XML files
  • 49. Queuing   Jobs are executed asynchronously   Progress is shown in the data browser   On big servers (e.g. “Main”), queuing is managed by the dedicated “Torque” system
  • 50. UNIX   Galaxy (simply)executes command-line programs within a UNIX-like environment   Galaxy doesn’t have to “know” how to run those programs, it finds out from their descriptions at runtime
  • 51. Database   Analysis metadata is stored in a database, by default this is SQLite   More robust installs use PostgreSQL
  • 52. Configuration   Galaxy has many moving parts that can be configured   Configuration is done using simple text-based INI files
  • 53. Version control   Revision control (or version control) provides unlimited undo and detailed tracking of changes   Galaxy uses Mercurial   Popular now are svn, git and hg
  • 54. Community How to get in touch with the world-wide community to get the most out of Galaxy?
  • 55. Wiki
  • 56. Mailing lists   @lists.bx.psu.edu:   galaxy-user   galaxy-dev   galaxy-announce   galaxy-commits
  • 57. Tutorials   Galaxy 101:   Getting data from UCSC   Performing simple data manipulation   Understanding Galaxy's History system   Creating and editing workflows   Applying workflows to your data
  • 59. Events   Galaxy Community Conference   ISMB   ECCB   BOSC   PAG   Bio-IT World   GMOD Meetings
  • 60. “Tool shed”   Easy sharing of new tools   Based on Mercurial   Turns Galaxy into a modular ecosystem
  • 61. CiteULike group   Social citation manager   There is a Galaxy group:   citeulike.org/group/16008   Articles are tagged by:   PROJECT, ISGALAXY, SHARED, HOWTO, METHODS, REPRODUCIBILITY, WORKFLOW
  • 62. Organizations   Development:   Penn State University   Emory University   Support:   NSF   NHGRI   Power user:   NBIC
  • 63. Links   UseGalaxy.org – the Main server   GetGalaxy.org – for local installs   UseGalaxy.org/galaxy101 – intro tutorial   Galaxy.nbic.nl – Sombrero, the NBIC server   CiteULike.org/group/16008 – references   genome.cshlp.org/content/19/11/2144 – windshield paper   SlideShare.net/rvosa – these slides