High-Performance In-Memory Genome (HIG) Project
1. HIG Project Overview
August 31, 2012
Matthieu-P. Schapranow
Hasso Plattner Institute
Chair of Prof. Hasso Plattner
2. Vision: Real-time Analysis of Genomic
Data to Improve Medical Treatment
HIG Project Overview, M. Schapranow, Aug 31, 2012
3. Build up the Whole Picture out of Layers
■ Data:
□ Combine research findings from international scientific databases in a
single system at HPI
■ Platform:
□ Expose information as a service to be consumed by special-purpose
applications
■ Applications:
□ Support genome alignment pipeline processing through massively
parallel execution of alignment algorithms (e.g. BWA, BT2) and
variant calling
□ Analyze individual patient results (real-time annotations with
combined data)
□ Analyze patient cohorts using individual filters
4. How the Vision Becomes Real
■ Platform:
□ Worker Framework: Enables parallel execution of tasks
(alignment, variant calling) across node boundaries
□ Updating Framework: Periodically retrieves updates of
international databases and automatically integrates them into
the local store
■ Applications:
□ Alignment Coordinator: Submits alignment tasks and retrieves
mutation lists, e.g. as CSV
□ Genome Browser: Interactive browsing in reference and
specific patient genomes
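The idea behind the Worker Framework, splitting alignment work into independent tasks and running them in parallel, can be sketched as follows. This is a minimal single-machine illustration, not the HIG implementation: the function names (`split_into_chunks`, `align_chunk`, `run_parallel`) are hypothetical, a thread pool stands in for distribution across nodes, and exact substring search stands in for a real aligner such as BWA.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, chunk_size):
    """Split the list of reads into consecutive chunks of at most chunk_size."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def align_chunk(chunk, reference):
    """Placeholder 'alignment' (assumption): exact-match position or -1.

    A real pipeline would call an aligner such as BWA here.
    """
    return [(read, reference.find(read)) for read in chunk]


def run_parallel(reads, reference, chunk_size=2, max_workers=4):
    """Run align_chunk on all chunks in parallel and merge the results."""
    chunks = split_into_chunks(reads, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so read order is kept.
        results = list(pool.map(align_chunk, chunks, [reference] * len(chunks)))
    return [hit for chunk_hits in results for hit in chunk_hits]


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = ["ACGT", "TTAG", "GGGG", "CGTT", "TAGC"]
    for read, position in run_parallel(reads, reference):
        print(read, position)
```

Because each chunk is processed independently, the same structure carries over to a setting where chunks are sent to different nodes instead of different threads.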
6. Numbers you should know
Alignment Execution Time
■ One cell line ~600k reads / 110MB
■ Pipeline: Alignment and variant calling
Property      Traditional    HPI
Full Genome   No             Yes
Cores         2 * 6 cores    25 * 40 cores
Main Memory   48 GB          25 TB
Runtime       ~720           ~40s
7. Numbers you should know
History of the Human Genome Project
■ 1984: Idea of a global Human Genome
(HG) project discussed at the Alta Summit:
“DNA available on the Internet”
■ 1990: HG project started in the US, planned
for 15 years (3 billion USD funding)
■ 2000: Rough draft of the HG announced
■ 2003: Complete genome sequenced
■ 2006: Sequence of chr1, the last and longest
chromosome, published
■ … what’s next?
8. Numbers you should know
Human Genome
Entity            Cardinality
Different Bases   4 (A, C, G, T)
Base Pairs        3.137 Bbp
Chromosomes       23
Distinct Genes    20k-25k
Amino Acids       21 (coded as triplets)
Proteins          50k-300k
Taken from http://de.wikipedia.org/wiki/Code-Sonne
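Two consequences of the numbers above can be checked with a few lines of arithmetic: four bases read as triplets give 4^3 = 64 possible codons, and four symbols need only 2 bits each, which bounds the size of a packed genome. The sketch below is illustrative only; the 2-bit packing is an assumption made for the estimate (it ignores ambiguity codes, quality scores, and metadata), and `packed_size_bytes` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGT"              # the four different bases
BASE_PAIRS = 3_137_000_000  # 3.137 Bbp, as listed in the table

# Every ordered triplet of bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def packed_size_bytes(n_bases, bits_per_base=2):
    """Bytes needed to store n_bases at bits_per_base bits each, rounded up.

    Assumption: plain 2-bit encoding, no ambiguity codes or metadata.
    """
    return (n_bases * bits_per_base + 7) // 8


if __name__ == "__main__":
    print("codons:", len(codons))
    print("packed genome size in MB:", packed_size_bytes(BASE_PAIRS) / 1e6)
```

Since 64 codons map to far fewer amino acids, several triplets necessarily code for the same amino acid.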
9. Numbers you should know
Comparison of Costs for Main Memory and Genome Analysis
[Figure: costs in USD (logarithmic axis, 0.01 to 10,000) from January 2001 to January 2012, with two series: costs per megabyte RAM and costs per megabase sequencing]
10. Hardware Characteristics
■ 1,000 core cluster,
25 TB main memory
■ Consists of 25 identical nodes:
□ 40 cores (80 hardware threads)
□ 1 TB main memory
□ Intel® Xeon® E7-4870
□ 2.40 GHz
□ 30 MB cache
11. Customer Process as of Today
■ Tissue sequencing in context of cancer treatment
■ Complex and time-consuming, with media discontinuities and manual steps
12. Project Objectives
■ Alignment of DNA reads (FASTQ) against reference genome
(FASTA) → mapped reads
■ Real-time analysis of mapped reads
□ Detection of mutations (SNPs, indels)
□ Comparison of multiple tissues
□ Detection of similar clusters to identify correlations
■ Analysis of mutations
□ Identify mutations with scientific references (existing
knowledge)
□ Detection of similar clusters to identify correlations
□ Identify genes and regulators for certain phenotypic
characteristics, e.g. “fast running horses”
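To make the step from mapped reads to detected mutations concrete, here is a toy sketch of SNP detection by pileup: count the bases observed at each reference position and report positions where a strict majority disagrees with the reference. It is an illustration under simplifying assumptions, not the project's variant caller: `call_snps` and its parameters are hypothetical, reads are given as already-mapped `(start, sequence)` pairs with 0-based positions, and indels and base qualities are ignored.

```python
from collections import Counter, defaultdict


def call_snps(reference, mapped_reads, min_depth=2):
    """Return (position, reference_base, observed_base) for candidate SNPs.

    mapped_reads: iterable of (start, sequence), 0-based, gap-free alignments.
    min_depth:    minimum number of reads covering a position (assumption).
    """
    # Pileup: for each reference position, count the bases seen there.
    pileup = defaultdict(Counter)
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    snps = []
    for position in sorted(pileup):
        counts = pileup[position]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        # Report only well-covered positions with a strict majority mismatch.
        if depth >= min_depth and base != reference[position] and 2 * support > depth:
            snps.append((position, reference[position], base))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGT"
    mapped_reads = [(0, "ACGA"), (2, "GAAC"), (3, "AACG")]
    print(call_snps(reference, mapped_reads))
```

In this example all three reads show A at position 3 where the reference has T, so that position is reported as a candidate SNP.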
13. Thank you for your interest!
Keep in contact with us.
Matthieu-P. Schapranow, M.Sc.
schapranow@hpi.uni-potsdam.de
http://j.mp/schapranow
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
Matthieu-P. Schapranow
August-Bebel-Str. 88
14482 Potsdam, Germany