HIG Project Overview

           August 31, 2012




    Matthieu-P. Schapranow
    Hasso Plattner Institute
Chair of Prof. Hasso Plattner
Vision: Real-time Analysis of Genomic
    Data to Improve Medical Treatment




    HIG Project Overview, M. Schapranow, Aug 31, 2012
Build up the Whole Picture out of Layers

      ■  Data:
           □  Combine research findings from int’l scientific databases in
              single system at HPI
      ■  Platform:
           □  Expose information as a service to be consumed by special
              purpose applications
      ■  Applications:
           □  Support genome alignment pipeline processing by
              massively parallel execution of:
                □ Alignment algorithms, e.g. BWA, Bowtie2 (BT2)
                □ Variant calling
           □  Analyze individual patient results (real-time annotations with
              combined data)
           □  Analyze patient cohorts using individual filters
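
The two analysis bullets above can be sketched in a few lines. This is not the HIG implementation: the variant tuples, the `KNOWLEDGE_BASE` lookup, the patient IDs, and the function names are all hypothetical and only illustrate annotating one patient's variants with combined reference data and filtering a cohort with a user-defined predicate.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference base, alternate base).
Variant = Tuple[str, int, str, str]

# Hypothetical stand-in for the combined research data; the entries are made up.
KNOWLEDGE_BASE: Dict[Variant, str] = {
    ("chr1", 100, "A", "G"): "example annotation 1",
    ("chr2", 250, "C", "T"): "example annotation 2",
}


def annotate(variants: List[Variant]) -> List[Tuple[Variant, Optional[str]]]:
    """Attach the known annotation (or None) to each variant of one patient."""
    return [(v, KNOWLEDGE_BASE.get(v)) for v in variants]


def filter_cohort(cohort: Dict[str, List[Variant]],
                  predicate: Callable[[Variant], bool]) -> List[str]:
    """Return the sorted IDs of patients with at least one matching variant."""
    return sorted(pid for pid, variants in cohort.items()
                  if any(predicate(v) for v in variants))


# Made-up cohort for illustration only.
cohort = {
    "patient_a": [("chr1", 100, "A", "G"), ("chr3", 7, "T", "C")],
    "patient_b": [("chr2", 250, "C", "T")],
    "patient_c": [("chr3", 7, "T", "C")],
}

annotated = annotate(cohort["patient_a"])
on_chr3 = filter_cohort(cohort, lambda v: v[0] == "chr3")
print(annotated)
print(on_chr3)
```

An in-memory dictionary keeps the lookup constant-time per variant, which is the property the "real-time annotations" bullet relies on; a production system would presumably use database tables instead.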
How the Vision Becomes Real


      ■  Platform:
           □  Worker Framework: Enables parallel execution of tasks
              (alignment, variant calling) across node boundaries
           □  Updating Framework: Periodically retrieves updates from
              international databases and automatically integrates them
              into the local store
      ■  Applications:
           □  Alignment Coordinator: Submit alignment tasks and retrieve
              mutation lists, e.g. as CSV
           □  Genome Browser: Interactive browsing in reference and
              specific patient genomes
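
The idea behind the Worker Framework, splitting the reads into chunks and processing the chunks concurrently, can be sketched as follows. This is a single-machine illustration under stated assumptions, not the HIG framework: `align_read` is a toy exact-match search standing in for a real aligner, the reference string is made up, and a thread pool stands in for workers spread across nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

REFERENCE = "ACGTACGTTAGCCGATAGGCTA"  # toy reference sequence, not real data


def align_read(read: str) -> Tuple[str, int]:
    """Toy 'alignment': 0-based position of the first exact match, or -1."""
    return read, REFERENCE.find(read)


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split the read list into chunks of at most `size` reads."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def align_chunk(reads: List[str]) -> List[Tuple[str, int]]:
    """Task executed by one worker: align every read of a chunk."""
    return [align_read(r) for r in reads]


def run_parallel(reads: List[str], workers: int = 4,
                 chunk_size: int = 2) -> List[Tuple[str, int]]:
    """Distribute chunks over a worker pool and merge results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(align_chunk, chunk(reads, chunk_size))
        return [hit for part in parts for hit in part]


reads = ["ACGT", "TAGC", "GGCT", "TTTT", "CGAT"]
hits = run_parallel(reads)
print(hits)
```

Because `Executor.map` returns results in submission order, the merged list lines up with the input reads. For CPU-bound work in CPython a process pool or a cluster scheduler would replace the thread pool; the chunk-and-merge structure stays the same.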



Alignment Coordinator


      ■  Available Alignment Algorithms (and growing)
           □  Bowtie2
           □  Bowtie
           □  BWA
           □  TMAP
           □  SNAP
           □  MAQ
           □  SOAP




Numbers you should know
    Alignment Execution Time


      ■  One cell line ~600k reads / 110MB
      ■  Pipeline: Alignment and variant calling

           Property         Traditional       HPI
           Full Genome      No                Yes
           Cores            2 * 6 cores       25 * 40 cores
           Main Memory      48 GB             25 TB
           Runtime          ~720              ~40 s




Numbers you should know
    History of the Human Genome Project


      ■  1984: Idea of a global Human Genome
         (HG) project discussed at Alta Summit:
         “DNA available on the Internet”
      ■  1990: 15-year HG project started in
         the US (3 billion USD funding)
      ■  2000: Rough draft of the HG announced
      ■  2003: Complete genome sequenced
      ■  2006: Last and longest chr1 sequenced


      ■  … what’s next?




Numbers you should know
    Human Genome


      Entity                                Cardinality
      Different Bases                       4 (A, C, G, T)
      Base Pairs                            3.137 Bbp
      Chromosomes                           23
      Distinct Genes                        20k-25k
      Amino Acids (coded as triplets)       21
      Proteins                              50k-300k




      Taken from http://de.wikipedia.org/wiki/Code-Sonne
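
Two of the numbers above lend themselves to a quick calculation: how many distinct triplets four bases allow, and how much memory the base pairs would occupy. The sketch assumes a plain 2-bit-per-base encoding with no ambiguous bases and no compression; the slides do not say how the project stores sequences.

```python
from itertools import product

BASES = "ACGT"

# All ordered triplets (codons) over the four bases: 4**3 combinations.
codons = ["".join(t) for t in product(BASES, repeat=3)]

# Four symbols fit into 2 bits, so a plain 2-bit encoding needs:
base_pairs = 3_137_000_000      # 3.137 Bbp, as in the table
bits_per_base = 2               # assumption: no ambiguity codes such as N
raw_bytes = base_pairs * bits_per_base // 8

print(len(codons))              # number of distinct triplets
print(raw_bytes / 1e6, "MB")    # size in decimal megabytes
```

The result is 64 triplets, more than the 21 amino acids in the table, so several triplets map to the same amino acid. Under the stated encoding one genome needs roughly 784 MB, which fits comfortably into the main memory of a single machine.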

Numbers you should know
    Comparison of Costs for Main Memory and Genome Analysis

      [Figure: "Comparison of Costs". Series: Costs per Megabyte RAM and
       Costs per Megabase Sequencing. Vertical axis: Costs in USD,
       logarithmic, 0.01 to 10,000. Horizontal axis: January 2001 to
       January 2012.]

Hardware Characteristics


       ■  1,000 core cluster,
          25 TB main memory
       ■  Consists of 25 identical nodes:
            □  80 cores
            □  1 TB main memory
            □  Intel® Xeon® E7-4870
            □  2.40 GHz
            □  30 MB Cache
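
The cluster totals can be checked against the per-node figures. The slide gives 1,000 cores for the cluster but 80 cores per node, and 25 × 80 is 2,000. One plausible reading is that 1,000 counts physical cores and 80 counts hardware threads per node. The sketch below assumes four sockets per node, which the slide does not state; the Xeon E7-4870 itself has 10 physical cores with two hardware threads each.

```python
NODES = 25
SOCKETS_PER_NODE = 4        # assumption, not stated on the slide
CORES_PER_SOCKET = 10       # Intel Xeon E7-4870: 10 physical cores
THREADS_PER_CORE = 2        # Hyper-Threading
MEMORY_PER_NODE_TB = 1

physical_cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
logical_cores_per_node = physical_cores_per_node * THREADS_PER_CORE

cluster_physical_cores = NODES * physical_cores_per_node
cluster_logical_cores = NODES * logical_cores_per_node
cluster_memory_tb = NODES * MEMORY_PER_NODE_TB

print(physical_cores_per_node, logical_cores_per_node)
print(cluster_physical_cores, cluster_logical_cores, cluster_memory_tb)
```

Under this assumption the node has 40 physical and 80 logical cores, which matches both the "25 * 40 cores" entry in the execution-time table and the 25 TB of main memory.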




Customer Process as of Today


       ■  Tissue sequencing in context of cancer treatment
       ■  Complex, time-consuming, media breaks, manual steps




Project Objectives


       ■  Alignment of DNA reads (FASTQ) against reference genome
          (FASTA) → mapped reads
       ■  Real-time analysis of mapped reads
            □  Detection of mutations (SNPs, INDELs)
            □  Comparison of multiple tissues
            □  Detection of similar clusters to identify correlations
       ■  Analysis of mutations
            □  Identify mutations with scientific references (existing
               knowledge)
            □  Detection of similar clusters to identify correlations
            □  Identify genes and regulators for certain phenotypic
               characteristics, e.g. “fast running horses”
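
The first two objectives, mapping FASTQ reads to a reference and detecting point mutations in the mapped reads, can be sketched with toy data. This is a brute-force, ungapped illustration and not how BWA, Bowtie2, or the HIG pipeline work: real aligners use indexes and handle INDELs, and real variant callers weigh base qualities and coverage. The reference string, the reads, and all function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def best_position(read: str, reference: str) -> Tuple[int, int]:
    """Ungapped placement with the fewest mismatches: (0-based pos, mismatches)."""
    best = (-1, len(read) + 1)
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (pos, mismatches)
    return best


def snp_candidates(read: str, reference: str,
                   pos: int) -> List[Tuple[int, str, str]]:
    """List (1-based reference position, reference base, read base) per mismatch."""
    return [(pos + i + 1, reference[pos + i], base)
            for i, base in enumerate(read) if base != reference[pos + i]]


REFERENCE = "ACGTTAGCCGATAGGCTAAC"   # toy stand-in for a FASTA sequence
FASTQ = """@read1
TAGCCGAT
+
IIIIIIII
@read2
GATAGACT
+
IIIIIIII
"""

results: Dict[str, Tuple[int, List[Tuple[int, str, str]]]] = {}
for read_id, seq, _qual in parse_fastq(FASTQ):
    pos, _mismatches = best_position(seq, REFERENCE)
    results[read_id] = (pos, snp_candidates(seq, REFERENCE, pos))
print(results)
```

Here `read1` matches the reference exactly, while `read2` differs from it in a single base, which the sketch reports as one SNP candidate. Comparing such candidate lists between tissues of the same patient is the kind of step the "Comparison of multiple tissues" bullet refers to.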
Thank you for your interest!
     Keep in contact with us.




                                                                 Matthieu-P. Schapranow, M.Sc.
                                                               schapranow@hpi.uni-potsdam.de
                                                                        http://j.mp/schapranow




                                                                     Hasso Plattner Institute
                                                 Enterprise Platform & Integration Concepts
                                                                     Matthieu-P. Schapranow
                                                                       August-Bebel-Str. 88
                                                                   14482 Potsdam, Germany

High-Performance In-Memory Genome (HIG) Project