Next-Generation Informatics
David Dooling <ddooling@wustl.edu>
                AGBT Bioinformatics
                        2009-02-05
Framing the problem




Framing the problem
[Chart. Legend: Moore's Law, CPU, Storage, Sequence, Personnel. X-axis: years 2000 through 2010.]




Different perspectives




LIMS




LIMS - Illumina/Solexa




LIMS - Roche/454




Analysis




Analysis - cDNA
Solexa cDNA reads
  -> Maq/Tophat
  -> [Transcriptome] OR [Genome + SpliceJunctions (SJs)] OR [Genome]
  -> Maq
• Read depth -> Gene expression (to exquisite sensitivity)
• SNPs, indels -> Variant discovery/ASE
• Reads map to novel SJs or introns -> Splice isotypes
• Reads map to “non-genic” regions -> Velvet, GenScan -> Novel genes
Project Lead




Changing pipelines




Changing pipelines - LIMS
• Prep
    – PCR, Hybrid Selection, cDNAs, Bisulfite, Jumping Libraries, Sample Pooling, WGS
• Tech-Specific Prep/Detection
    – Solexa, 454, SOLiD, Church Polony(?), Helicos(?), 3730, …
• Primary Analysis (Technology-specific)
    – Flow-space, Color-space, …, Phred
• Submission
    – NCBI SRA, NCBI Medical Archive, Project Archives (e.g., DCC), NCBI Trace

Courtesy of Toby Bloom
Changing pipelines - Analysis
• Aligners
    – BLAST, BLAT, PASH, ssaha, runMapping, ELAND, mapreads, Arachne, MAQ, exonerate, SHRiMP, SPLIGN, Mosaik, SLIM Search, SXOligoSearch, SOAP2, NovoCraft, Bowtie, Tophat
• Assemblers
    – Phrap, Arachne, PCAP, Phusion, Euler, ATLAS, Newbler, Velvet, Forge, SSAKE, VCAKE, Euler-USR, SHARCGS, CABOG
Framing the solution




Past is prologue




Convert this…




… into this




Convert this…




… into this




UR
• Object-relational mapping (ORM) layer
    – Interact with persistence layer (e.g., relational
      database) through objects and methods
    – Automatic, dynamic class definitions
    – Moose¹-like object definition syntax
• Object context
    – In-memory transactions (even across databases)
    – Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system

                      1 - http://www.iinteractive.com/moose/
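
The object-context ideas on this slide (deferred loading, caching, and changes held in memory until commit) can be sketched in a few lines. This is a Python illustration only, not UR's Perl API: the `Context` and `Sample` classes, the `sample` table, and its values are all hypothetical, and the cross-database transactions mentioned above are not attempted.

```python
import sqlite3


class Sample:
    """Hypothetical mapped object; attribute writes go to the context, not the database."""

    def __init__(self, context, sample_id, name):
        self._context = context
        self.id = sample_id
        self._name = name  # last value known to be in the database

    @property
    def name(self):
        # Prefer an uncommitted in-memory change if one exists.
        return self._context.pending.get(self.id, self._name)

    @name.setter
    def name(self, value):
        self._context.pending[self.id] = value


class Context:
    """Hypothetical object context: caches loaded objects and buffers changes."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # id -> Sample, filled on first access (deferred loading)
        self.pending = {}  # id -> new name, held in memory until commit

    def get(self, sample_id):
        # Query the database only the first time an object is requested.
        if sample_id not in self.cache:
            row = self.conn.execute(
                "SELECT id, name FROM sample WHERE id = ?", (sample_id,)
            ).fetchone()
            if row is None:
                raise KeyError(sample_id)
            self.cache[sample_id] = Sample(self, row[0], row[1])
        return self.cache[sample_id]

    def commit(self):
        # Write all buffered changes in a single database transaction.
        with self.conn:
            self.conn.executemany(
                "UPDATE sample SET name = ? WHERE id = ?",
                [(name, sid) for sid, name in self.pending.items()],
            )
        for sid, name in self.pending.items():
            self.cache[sid]._name = name
        self.pending.clear()

    def rollback(self):
        # Discard in-memory changes; the database was never touched.
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'tumor')")
conn.commit()

ctx = Context(conn)
s = ctx.get(1)
s.name = "normal"   # changed in memory only
ctx.rollback()      # s.name is 'tumor' again
s.name = "relapse"
ctx.commit()        # now written to the database
```

The caller never writes SQL; it reads and assigns attributes, and the context decides when the persistence layer is touched.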
Genome Workflow




Genome Model




Past is prologue…




… but with a wrinkle
• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system


Easy Perl API




Pairing


 Analyst




Programmer



Variant Detection Pipeline




cDNA Analysis




16S Pipeline




Assembly and Annotation Pipeline




Challenges
•   There is still much more work to do
•   Sequencing is demolishing Moore’s law
•   The cult of traces
•   The richness of data
•   Visualization




CIRCOS




Thanks
Web Site
   http://genome.wustl.edu/
Blog
   http://www.politigenomics.com/

LIMS Paper
   http://www.biomedcentral.com/1471-2105/8/362
UR Presentation
   http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/






Editor's notes

  1. There is too much data. 4 genomes to more than an order of magnitude increase. Move from processing regions to single genomes to multi-genome comparisons. This is a story about how we are trying to deal with this problem.
  2. This creates tension.
  3. Sample in -> answer out. Don't care how the sausage was made.
  4. Never the same pipe twice (TJ Maxx).
  5. And expanding beyond the laboratory.
  6. Different aligners, genotypers.
  7. How do we even begin to tackle this problem? How do we resolve the tension between changing pipelines and production systems?
  8. Metadata. Store DNA types, equipment, reagents, even process steps as rows rather than tables. So maq is not maq, it is an aligner. Standards like SAM help.
  9. Solexa/Maq specific commands.
  10. Generic medical resequencing pipeline.
  11. Never write SQL.
  12. XML and flow chart.
  13. Click on any box to see processing details, including file system location.
  14. Screenshot of script vs. module.
  15. Photograph.
  16. What I have talked about here is automation. There is still much work to do in data reduction.
  17. How do you compare more than three genomes? How do you track all the analysis? So that's one problem.
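
Note 8 describes storing process steps "as rows rather than tables", so that a tool such as maq is recorded by its role (an aligner) rather than given its own schema. A minimal sketch of that idea follows, in Python with SQLite. The table names, column names, helper functions, and sample labels are all assumptions made for illustration and do not describe the actual LIMS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One generic table describes every kind of step; a new tool is a new row.
    CREATE TABLE process_step_type (
        id   INTEGER PRIMARY KEY,
        role TEXT NOT NULL,   -- what the tool does, e.g. 'aligner'
        name TEXT NOT NULL    -- which tool it is, e.g. 'maq'
    );
    -- Each executed step points at its type instead of living in a per-tool table.
    CREATE TABLE process_step (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES process_step_type(id),
        sample  TEXT NOT NULL
    );
    """
)


def register_tool(conn, role, name):
    """Add a tool by inserting a row; no schema change is needed."""
    cur = conn.execute(
        "INSERT INTO process_step_type (role, name) VALUES (?, ?)", (role, name)
    )
    return cur.lastrowid


def record_step(conn, type_id, sample):
    """Record that a step of the given type was run on a sample."""
    conn.execute(
        "INSERT INTO process_step (type_id, sample) VALUES (?, ?)", (type_id, sample)
    )


def steps_by_role(conn, role):
    """Return (sample, tool name) for every step with the given role."""
    return conn.execute(
        """
        SELECT s.sample, t.name
        FROM process_step AS s
        JOIN process_step_type AS t ON t.id = s.type_id
        WHERE t.role = ?
        ORDER BY s.id
        """,
        (role,),
    ).fetchall()


maq = register_tool(conn, "aligner", "maq")
bowtie = register_tool(conn, "aligner", "bowtie")
record_step(conn, maq, "sample-A")      # hypothetical sample labels
record_step(conn, bowtie, "sample-B")

aligned = steps_by_role(conn, "aligner")
```

Downstream code asks for "aligner" steps and does not need to know which aligner produced them, which is what lets a pipeline swap tools without changing the production schema.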