Next-Generation Informatics
David Dooling <ddooling@wustl.edu>
                AGBT Bioinformatics
                        2009-02-05
Framing the problem




Framing the problem
[Chart: Moore's Law, CPU, Storage, Sequence, and Personnel plotted by year, 2000 to 2010]
Different perspectives




LIMS




LIMS - Illumina/Solexa




LIMS - Roche/454




Analysis




Analysis - cDNA
Solexa cDNA reads
    -> Maq/Tophat against [Transcriptome] OR [Genome + Splice Junctions (SJs)] OR [Genome]
    -> Maq
• Read depth -> Gene expression (to exquisite sensitivity)
• SNPs, indels -> Variant discovery / ASE
• Reads map to novel SJs or introns -> Splice isotypes
• Reads map to “non-genic” regions -> Velvet, GenScan -> Novel genes
Project Lead




Changing pipelines




Changing pipelines - LIMS
Prep:
• PCR
• Hybrid Selection
• cDNAs
• Bisulfite
• Jumping Libraries
• Sample Pooling
• WGS
Tech-Specific Prep/Detection:
• Solexa
• 454
• SOLiD
• Church Polony(?)
• Helicos(?)
• 3730
• …
Primary Analysis (Technology-specific):
• Flow-space
• Color-space
• …
• Phred
Submission:
• NCBI SRA
• NCBI Medical Archive
• Project Archives (e.g., DCC)
• NCBI Trace
Courtesy of Toby Bloom
Changing pipelines - Analysis
Aligners:
• BLAST
• BLAT
• PASH
• ssaha
• runMapping
• ELAND
• mapreads
• Arachne
• MAQ
• exonerate
• SHRiMP
• SPLIGN
• Mosaik
• SLIM Search
• SXOligoSearch
• SOAP2
• NovoCraft
• Bowtie
• Tophat
Assemblers:
• Phrap
• Arachne
• PCAP
• Phusion
• Euler
• ATLAS
• Newbler
• Velvet
• Forge
• SSAKE
• VCAKE
• Euler-USR
• SHARCGS
• CABOG
Framing the solution




Past is prologue




Convert this…




… into this




Convert this…




… into this




UR
• Object-relational mapping (ORM) layer
    – Interact with persistence layer (e.g., relational
      database) through objects and methods
    – Automatic, dynamic class definitions
    – Moose-like object definition syntax [1]
• Object context
    – In-memory transactions (even across databases)
    – Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system

[1] http://www.iinteractive.com/moose/
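The "object context" bullets on the UR slide (in-memory transactions, caching/deferred loading) can be illustrated with a small sketch. This is not UR's API: UR is a Perl framework, and the `Context` class, its methods, and the dict standing in for a database are all hypothetical names chosen for illustration.

```python
class Context:
    """Hypothetical in-memory object context; NOT UR's actual API."""

    def __init__(self, store):
        self.store = store      # dict standing in for the persistence layer
        self.cache = {}         # objects already loaded from the store
        self.pending = {}       # uncommitted changes, keyed by object id
        self.loads = 0          # number of trips to the store

    def get(self, obj_id):
        # Deferred loading: read the store only on first access.
        if obj_id not in self.cache:
            self.cache[obj_id] = dict(self.store[obj_id])
            self.loads += 1
        # Uncommitted changes are visible inside the transaction.
        return {**self.cache[obj_id], **self.pending.get(obj_id, {})}

    def set(self, obj_id, **changes):
        self.get(obj_id)  # make sure the object is loaded
        self.pending.setdefault(obj_id, {}).update(changes)

    def commit(self):
        # Write pending changes through to the cache and the store.
        for obj_id, changes in self.pending.items():
            self.cache[obj_id].update(changes)
            self.store[obj_id].update(changes)
        self.pending.clear()

    def rollback(self):
        # Discard pending changes; the store was never touched.
        self.pending.clear()


store = {1: {"name": "lane_1", "status": "new"}}
ctx = Context(store)
ctx.set(1, status="aligned")
ctx.rollback()                   # store still says "new"
ctx.set(1, status="aligned")
ctx.commit()                     # now the store says "aligned"
```

The point of the sketch is that callers work with objects and methods only; when and whether the backing store is read or written is the context's decision.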
Genome Workflow




Genome Model




Past is prologue…




… but with a wrinkle
                       • Lab personnel accept
                         the software you give
                         them
                       • Analysts are more
                         than happy to develop
                         their own
                       • We need to make it
                         easy for analysts to
                         build tools within the
                         system


Easy Perl API




Pairing


 Analyst




Programmer



Variant Detection Pipeline




cDNA Analysis




16S Pipeline




Assembly and Annotation Pipeline




Challenges
•   There is still much more work to do
•   Sequencing is demolishing Moore’s law
•   The cult of traces
•   The richness of data
•   Visualization




CIRCOS




Thanks
Web Site
   http://genome.wustl.edu/
Blog
   http://www.politigenomics.com/

LIMS Paper
   http://www.biomedcentral.com/1471-2105/8/362
UR Presentation
   http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/






Editor's notes

  1. There is too much data: from 4 genomes to more than an order of magnitude increase. Move from processing regions, to single genomes, to multi-genome comparisons. This is a story about how we are trying to deal with this problem.
  2. This creates tension.
  3. Sample in -> answer out. Don’t care how the sausage was made.
  4. Never the same pipe twice (T.J. Maxx).
  5. And expanding beyond the laboratory.
  6. Different aligners, genotypers.
  7. How do we even begin to tackle this problem? How do we resolve the tension between changing pipelines and production systems?
  8. Metadata. Store DNA types, equipment, reagents, even process steps as rows rather than tables. So maq is not maq, it is an aligner. Standards like SAM help.
  9. Solexa/Maq-specific commands.
  10. Generic medical resequencing pipeline.
  11. Never write SQL.
  12. XML and flow chart.
  13. Click on any box to see processing details, including file system location.
  14. Screenshot of script vs. module.
  15. Photograph.
  16. What I have talked about here is automation. There is still much work to do in data reduction.
  17. How do you compare more than three genomes? How do you track all the analysis? So that’s one problem.
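
Note 8's idea of storing types "as rows rather than tables" can be sketched with a tiny schema. The table name, column names, and helper functions below are hypothetical and chosen only for illustration; the tool names (maq, bowtie, velvet) come from the slides. The point is that a new tool arrives as an INSERT, not a schema change, and a pipeline asks for "an aligner" rather than for maq by name.

```python
import sqlite3


def open_store():
    # In-memory database standing in for the LIMS metadata store (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_type ("
        " name TEXT PRIMARY KEY,"
        " category TEXT NOT NULL)"
    )
    return conn


def add_type(conn, name, category):
    # A new tool or reagent type is just another row.
    conn.execute(
        "INSERT INTO entity_type (name, category) VALUES (?, ?)",
        (name, category),
    )


def types_in(conn, category):
    # Pipelines look tools up by role, not by hard-coded name.
    rows = conn.execute(
        "SELECT name FROM entity_type WHERE category = ? ORDER BY name",
        (category,),
    )
    return [row[0] for row in rows]


conn = open_store()
add_type(conn, "maq", "aligner")
add_type(conn, "bowtie", "aligner")
add_type(conn, "velvet", "assembler")
print(types_in(conn, "aligner"))  # ['bowtie', 'maq']
```

Swapping one aligner for another then changes data, not code, which is one way to ease the tension between changing pipelines and production systems raised in note 7.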