In-Memory Database Technology Enables
Real-Time Genome Data Research
SAP Life Science Forum, Dublin
June 04, 2013
Dr. Matthieu Schapranow
Hasso Plattner Institute
Agenda
■  Numbers You Should Know
■  Personalized Medicine
■  High-Performance In-Memory Genome (HIG) Project
■  Outlook
In-Memory Technology Enables Genome Data Research, Dr. Schapranow, June 04, 2013
Numbers You Should Know
Conventional Cancer Therapies
[Figure: bar charts on a scale from 0% to 100% showing the shares of men and women who will develop cancer or will never develop cancer, and the shares of chemotherapies that work or fail. Source: American Cancer Society, Surveillance Research, 2012]
Numbers You Should Know
The Human Genome Project
■  1990: Human Genome (HG) project
started with 3B USD funding
■  2000: 1st draft of the HG announced
■  10 years until first HG version;
thousands of institutes involved
http://www.molecularecologist.com/next-gen-table-3a/
■  2013: The latest Next-Generation Sequencing (NGS) device,
“Illumina HiSeq 2500”, costs ≈700k USD and enables whole
genome sequencing in <2 days for <10k USD per run
■  But: analysis can take weeks
■  What’s next? Real-time analysis of genome data!
Numbers You Should Know
Comparison of Costs
[Figure: Comparison of Costs for Main Memory and Genome Sequencing. Costs in USD on a logarithmic scale from 0.001 to 10,000, plotted from January 2001 to January 2013, with two series: costs per megabyte of RAM and costs per megabase of sequencing]
Personalized Medicine
Our Motivation
■  Today analysis of genome data, e.g. for personalized treatment,
takes 4-6 weeks (incl. biopsy, biological preparation, sequencing,
alignment, variant calling, full analysis, and evaluation)
■  In-memory technology is suitable to accelerate genome analysis
□  Highly parallel alignment / variant calling (data preparation)
□  Real-time analysis of individual patient and cohort data
□  Combined search in structured / unstructured data
■  Challenge: Can we analyze all of a patient's data,
incl. Electronic Medical Record (EMR) and genome
data, during a doctor’s visit?
Personalized Medicine
Our Vision
Desirability
■  Integrated portfolio of specialized services
for clinicians, researchers, and patients
■  Include latest research results, e.g. most
effective therapies
Viability
■  Share data via the Internet to get
feedback from world-wide experts (cost-saving)
■  Combine research data (publications,
annotations, genome data) from
international databases in a single
knowledge base
■  Enable personalized medicine also in
far-off regions and developing countries
Feasibility
■  Allele frequency count of 12B
records in < 1s
■  Identification of relevant
annotations out of 80M in < 1s
■  Integrated alignment and
variant calling within hours
instead of days
Personalized Medicine
User Requirements
For researchers
■  Enable real-time analysis of genome data
■  Automatic scan of pathways to identify cellular
impact of mutations
■  Free-text search in publications, diagnoses, and EMR
data (structured and unstructured data)
For clinicians
■  Preventive diagnostics to identify risk patients
■  Indicate pharmacokinetic correlations
■  Scan for comparable patient cases
For patients
■  Identify relevant clinical trials / experts
■  Start the most appropriate therapy early, based on all
evidence and the latest knowledge
High-Performance In-Memory Genome Project
Integration of Genomic Data
■  Once DNA sequences are generated by NGS devices, HIG comes into play
■  Preprocessing of DNA (alignment, variant calling) can be modeled and is executed as an integrated process
■  Results are stored in an in-memory database to enable instant analysis
High-Performance In-Memory Genome Project
The In-Memory Technology Toolbox
[Figure: the in-memory technology toolbox, with the following building blocks]
■  Any attribute as index
■  Insert only for time travel
■  Combined column and row store
■  No aggregate tables
■  Minimal projections
■  Partitioning
■  Analytics on historical data
■  Single and multi-tenancy
■  SQL interface on columns & rows
■  Reduction of layers
■  Lightweight compression
■  Multi-core/parallelization
■  On-the-fly extensibility
■  Active/passive data store
■  Bulk load
■  Text retrieval and extraction
■  Object to relational mapping
■  Dynamic multi-threading within nodes
■  Map reduce
■  No disk
■  Group key
[Embedded illustration: Discovery Service, Read Event Repositories, and Verification Services on SAP HANA; up to 8,000 read event notifications per second and up to 2,000 requests per second]
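Lightweight compression, one of the toolbox items above, is often illustrated with dictionary encoding of a column. The following is a minimal sketch of that general idea in Python, not the implementation used by SAP HANA or the HIG project; the function names and the sample column are assumptions made for illustration.

```python
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    position = {value: index for index, value in enumerate(dictionary)}
    codes = [position[value] for value in column]
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the integer codes."""
    return [dictionary[code] for code in codes]


# Hypothetical column with many repeated values, as is typical for genome tables.
chromosomes = ["chr1", "chr1", "chr2", "chr1", "chrX", "chr2"]
dictionary, codes = dictionary_encode(chromosomes)
restored = dictionary_decode(dictionary, codes)
```

Each distinct string is stored once, and the column itself shrinks to small integers, which is why columns with few distinct values compress well.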
High-Performance In-Memory Genome Project
Challenges of Genome Data Analysis
Analysis of Genomic Data | Alignment and Variant Calling | Analysis of Annotations in World-wide DBs
Bound To | CPU Performance | Memory Capacity
Duration | Hours – Days | Weeks
HPI | Minutes | Real-time
In-Memory Technology | Multi-Core | Partitioning & Compression
High-Performance In-Memory Genome Project
Challenges of Genome Data Analysis
Analysis of Genomic Data | Alignment and Variant Calling | Analysis of Annotations in World-wide DBs
Bound To | CPU Performance | Memory Capacity
Duration | Hours – Days | Weeks
HPI & SAP | Minutes – Hours | Interactively
In-Memory Technology | Multi-Core | Partitioning & Compression
High-Performance In-Memory Genome Project
Selected Research Topics
Improving Analyses:
■  Clustering of patient cohorts, e.g. k-means clustering
■  Combined search, e.g. in clinical trials and side-effect databases
■  Ad-hoc analysis of genetic pathways, e.g. to identify cause/effect
Improving Data Preparations:
■  Graphical modeling of Genome Data Processing (GDP) pipelines
■  Scheduling and execution of multiple GDP pipelines in parallel
■  App store for medical knowledge (bring algorithms to data)
■  Exchange of sensitive data, e.g. history-based access control
■  Billing processes for intellectual property and services
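Running several GDP pipelines in parallel can be sketched as follows. This is a minimal illustration using Python's standard thread pool, not the HIG scheduler; the step functions `align` and `call_variants` are hypothetical placeholders that only tag their input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[str], str]


def align(sample: str) -> str:
    """Placeholder for an alignment step (assumption: a real step would call an aligner)."""
    return f"{sample}:aligned"


def call_variants(data: str) -> str:
    """Placeholder for a variant calling step."""
    return f"{data}:variants"


def run_pipeline(sample: str, steps: List[Step]) -> str:
    """Execute the steps of one pipeline in order, feeding each output to the next step."""
    data = sample
    for step in steps:
        data = step(data)
    return data


def run_pipelines_in_parallel(samples: List[str], steps: List[Step],
                              max_workers: int = 4) -> Dict[str, str]:
    """Run one pipeline instance per sample concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda sample: run_pipeline(sample, steps), samples)
        return dict(zip(samples, results))


results = run_pipelines_in_parallel(["sample1", "sample2"], [align, call_variants])
```

Modeling a pipeline as an ordered list of steps keeps the execution logic independent of the individual tools, which is one way a graphical pipeline model could be mapped to executable code.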
High-Performance In-Memory Genome Project
Genomics Analysis
Loaded part of 1,000 genomes pre-phase 1 dataset
■  Chromosome 1 of 629 individuals from the 1,000 genomes project
■  12 billion entries in largest database table
■  293 GB of data (compressed in HANA)
Results
■  Report SNPs failing quality control
UCSC 102.47 sec | SAP HANA 1.25 sec – 82x faster
■  Compute the alternative allele frequency for each variant/region
VCFtools 259 sec | SAP HANA 0.43 sec – 600x faster
■  Compute the total number of missing genotypes per individual
VCFtools 548 sec | SAP HANA 2 sec – 270x faster
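The two aggregate queries above (alternative allele frequency per variant, missing genotypes per individual) can be sketched in a few lines of Python. This is an illustration of the computations only, under the assumption of diploid genotypes encoded as the number of alternative alleles (0, 1, or 2) with `None` for a missing call; the variant names and values are made up, and this is neither the VCFtools nor the SAP HANA implementation.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternative alleles; None means missing


def alt_allele_frequency(genotypes: List[Genotype]) -> Optional[float]:
    """Share of alternative alleles among all called alleles (diploid assumption)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def missing_per_individual(variants: Dict[str, List[Genotype]],
                           individuals: int) -> List[int]:
    """Count missing genotype calls for each individual across all variants."""
    counts = [0] * individuals
    for genotypes in variants.values():
        for index, genotype in enumerate(genotypes):
            if genotype is None:
                counts[index] += 1
    return counts


# Hypothetical toy data: three variants, four individuals.
variants = {
    "variant_a": [0, 1, 2, None],
    "variant_b": [None, 0, 0, 1],
    "variant_c": [1, 1, None, None],
}
frequencies = {name: alt_allele_frequency(g) for name, g in variants.items()}
missing = missing_per_individual(variants, individuals=4)
```

Both computations are plain scans and aggregations over a genotype column, the kind of workload a columnar in-memory store is designed for.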
Supported by Dr. Carlos Bustamante lab
[Figure: chart with axes labeled Chromosome, Absolute frequency, and Number of Alleles]
High-Performance In-Memory Genome Project
Working With Big Data
Loaded entire 1,000 genomes pre-phase 1 dataset
■  Queries on all chromosomes for all 629 individuals
■  136 billion entries in largest database table
■  ≈1.2TB (compressed in HANA)
Query results using R connectivity: Report all variations in BRCA1 and BRCA2
Supported by Dr. Carlos Bustamante lab
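A query such as "report all variations in BRCA1 and BRCA2" amounts to a range join between a variant table and a gene region table. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in; the slides used R connectivity to SAP HANA, and the table layout, column names, and coordinates here are placeholders invented for illustration, not real annotations.

```python
import sqlite3
from typing import List, Sequence, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT);
    CREATE TABLE genes (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER);
""")

# Placeholder gene regions: only the chromosomes (17 and 13) are real facts,
# the start and end positions are toy values.
conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", [
    ("BRCA1", "17", 1000, 2000),
    ("BRCA2", "13", 5000, 6000),
])
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", [
    ("17", 1500, "A", "G"),   # inside the toy BRCA1 region
    ("17", 2500, "C", "T"),   # outside
    ("13", 5500, "G", "A"),   # inside the toy BRCA2 region
    ("1", 1500, "T", "C"),    # other chromosome
])


def variants_in_genes(connection: sqlite3.Connection,
                      gene_names: Sequence[str]) -> List[Tuple]:
    """Return all variants whose position falls inside one of the named gene regions."""
    placeholders = ",".join("?" for _ in gene_names)
    query = f"""
        SELECT g.name, v.chrom, v.pos, v.ref, v.alt
        FROM variants v
        JOIN genes g
          ON v.chrom = g.chrom AND v.pos BETWEEN g.start_pos AND g.end_pos
        WHERE g.name IN ({placeholders})
        ORDER BY g.name, v.pos
    """
    return connection.execute(query, list(gene_names)).fetchall()


rows = variants_in_genes(conn, ["BRCA1", "BRCA2"])
```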
High-Performance In-Memory Genome Project
Analysis of Patient Cohorts
■  Columnar storage optimizes
space requirements while
enhancing calculation
performance
■  Single k-means clustering:
R 470ms vs. HANA 30ms (15:1)
■  >60k clusters are calculated in
<2s on 1,000 core cluster
■  → Interactive exploration of
clusters becomes possible
Why does a therapy work in only 80% of patient cases?
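For readers unfamiliar with k-means clustering of patient cohorts, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the R or SAP HANA implementation that was benchmarked; the deterministic initialization (first k points) and the two-dimensional toy data are assumptions chosen to keep the example small.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(component) / count for component in zip(*points))


def k_means(points: Sequence[Point], k: int,
            max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(points[:k])  # assumption: use the first k points as initial centroids
    assignment: List[int] = []
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed its cluster, so the result is stable
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return centroids, assignment


# Hypothetical patient features (e.g. two normalized measurements per patient).
patients = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, assignment = k_means(patients, k=2)
```

Each iteration scans all points once per centroid, so the work parallelizes naturally across cores and partitions.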
High-Performance In-Memory Genome Project
Integration of Genetic Pathways
■  Storing and accessing graph data
within in-memory database (Active
Information Store)
■  263 KEGG pathways with
6,481 genetic components, 32,784
vertices, and 90,682 edges
■  Rank all pathways by evaluation of
node connections: IMDB <350ms
■  >5.5k rankings can be calculated in
<2s on 1,000 core cluster
What are the known effects of a somatic mutation?
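The slides do not say how node connections are evaluated when ranking pathways. One plausible reading, shown here purely as an assumption, is to score each pathway by the number of its edges that touch a mutated component and sort by that score. The pathway names, node names, and edges below are hypothetical and are not taken from KEGG.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def rank_pathways(pathways: Dict[str, List[Edge]],
                  affected: Set[str]) -> List[Tuple[str, int]]:
    """Rank pathways by the number of edges incident to an affected node.

    Ties are broken alphabetically by pathway name so the order is deterministic.
    """
    scores = []
    for name, edges in pathways.items():
        score = sum(1 for source, target in edges
                    if source in affected or target in affected)
        scores.append((name, score))
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph data: each pathway is a list of directed edges.
pathways = {
    "pathway_a": [("g1", "g2"), ("g2", "g3"), ("g1", "g4")],
    "pathway_b": [("g5", "g6"), ("g6", "g7")],
    "pathway_c": [("g1", "g8")],
}
ranking = rank_pathways(pathways, affected={"g1"})
```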
High-Performance In-Memory Genome Project
Search in Structured / Unstructured Data
■  In-memory technology enables entity extraction, e.g. age,
genes, and drugs
■  Integrated 30k free text documents from clinicaltrials.gov
■  Relational search on entities enables interactive comparison
■  Results are rated by relevant search criteria
Which clinical trials are relevant for an individual patient?
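Entity extraction from free text, as described above, can be approximated with dictionaries and regular expressions. The sketch below is a deliberately simple stand-in for the text analysis features of an in-memory database; the tiny gene and drug dictionaries, the age pattern, and the sample sentence are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Tiny hand-written dictionaries; a real system would load curated vocabularies.
GENES = {"BRCA1", "BRCA2", "EGFR"}
DRUGS = {"tamoxifen", "cisplatin"}

AGE_PATTERN = re.compile(r"\b(\d{1,3})\s*(?:years?|yrs?)\b", re.IGNORECASE)
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def extract_entities(text: str) -> Dict[str, List]:
    """Extract ages, gene symbols, and drug names from a free-text document."""
    tokens = TOKEN_PATTERN.findall(text)
    return {
        "ages": [int(match) for match in AGE_PATTERN.findall(text)],
        "genes": sorted({t.upper() for t in tokens if t.upper() in GENES}),
        "drugs": sorted({t.lower() for t in tokens if t.lower() in DRUGS}),
    }


# Hypothetical eligibility text in the style of a clinical trial description.
document = ("Patients aged 18 years or older with BRCA1 or BRCA2 mutations "
            "receiving cisplatin.")
entities = extract_entities(document)
```

Once entities are stored as structured columns next to the documents, trials can be filtered relationally, for example by gene and minimum age, instead of by keyword search alone.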
High-Performance In-Memory Genome Project
Architectural Overview
[Figure: architectural overview. Labels shown: Cohort Analysis, Pathway Finder, Paper Search, Clinical Trial Finder, Pipeline Editor, ...; In-Memory Database; Extensions: App Store, Access Control, Billing, ...; Pipeline Data, Genome Data, Pathways, Genome Metadata, Papers, Pipeline Models, Analytical Tools, ...]
The Vision
Combined Data and Expert’s Knowledge
The Future
Combined Information
Enable clinicians to:
■  Make evidence-based therapy
decisions at the patient’s bed
■  Exchange latest patient data
with international experts
Enable researchers to:
■  Investigate genomes of
patient cohorts to derive new
knowledge
■  Analyze results in
real-time
Enable patients to:
■  Identify risk factors long
before they turn into diseases
■  Identify experts and similar
patient cases to bring up
alternatives for individual
therapies
Thank you for your interest!
Keep in contact with us.
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
Dr. Matthieu-P. Schapranow
August-Bebel-Str. 88
14482 Potsdam, Germany
schapranow@hpi.uni-potsdam.de
http://j.mp/schapranow
27

Más contenido relacionado

La actualidad más candente

KConnect - making Medical Information Easier to Find: Semantic Annotation and...
KConnect - making Medical Information Easier to Find: Semantic Annotation and...KConnect - making Medical Information Easier to Find: Semantic Annotation and...
KConnect - making Medical Information Easier to Find: Semantic Annotation and...Peter Voisey
 
Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Katina Toufexis
 
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Matthieu Schapranow
 
Big data and health sciences: Machine learning in chronic illness by Huiyu Deng
Big data and health sciences: Machine learning in chronic illness  by Huiyu DengBig data and health sciences: Machine learning in chronic illness  by Huiyu Deng
Big data and health sciences: Machine learning in chronic illness by Huiyu DengData Con LA
 
Leveraging molecular and clinical data to transform drug discovery in the era...
Leveraging molecular and clinical data to transform drug discovery in the era...Leveraging molecular and clinical data to transform drug discovery in the era...
Leveraging molecular and clinical data to transform drug discovery in the era...Bin Chen
 
Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Matthieu Schapranow
 
Secondary Data Analysis
Secondary Data AnalysisSecondary Data Analysis
Secondary Data AnalysisREY DECASTRO
 
Forschungdaten-Repositorien - Stand und Perspektive
Forschungdaten-Repositorien - Stand und PerspektiveForschungdaten-Repositorien - Stand und Perspektive
Forschungdaten-Repositorien - Stand und PerspektiveHeinz Pampel
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...D3 Consutling
 
Adventures in Translational Bioinformatics
Adventures in Translational BioinformaticsAdventures in Translational Bioinformatics
Adventures in Translational BioinformaticsHarry Hochheiser
 

La actualidad más candente (11)

KConnect - making Medical Information Easier to Find: Semantic Annotation and...
KConnect - making Medical Information Easier to Find: Semantic Annotation and...KConnect - making Medical Information Easier to Find: Semantic Annotation and...
KConnect - making Medical Information Easier to Find: Semantic Annotation and...
 
Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)
 
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
 
Big data and health sciences: Machine learning in chronic illness by Huiyu Deng
Big data and health sciences: Machine learning in chronic illness  by Huiyu DengBig data and health sciences: Machine learning in chronic illness  by Huiyu Deng
Big data and health sciences: Machine learning in chronic illness by Huiyu Deng
 
Leveraging molecular and clinical data to transform drug discovery in the era...
Leveraging molecular and clinical data to transform drug discovery in the era...Leveraging molecular and clinical data to transform drug discovery in the era...
Leveraging molecular and clinical data to transform drug discovery in the era...
 
Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?
 
Secondary Data Analysis
Secondary Data AnalysisSecondary Data Analysis
Secondary Data Analysis
 
Forschungdaten-Repositorien - Stand und Perspektive
Forschungdaten-Repositorien - Stand und PerspektiveForschungdaten-Repositorien - Stand und Perspektive
Forschungdaten-Repositorien - Stand und Perspektive
 
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
Healthcare Conference 2013 : Toekomstvisie op ICT in de gezondheidszorg - pro...
 
Adventures in Translational Bioinformatics
Adventures in Translational BioinformaticsAdventures in Translational Bioinformatics
Adventures in Translational Bioinformatics
 
Sara Gerke: "AI in Drug Discovery and Clinical Trials"
Sara Gerke: "AI in Drug Discovery and Clinical Trials"Sara Gerke: "AI in Drug Discovery and Clinical Trials"
Sara Gerke: "AI in Drug Discovery and Clinical Trials"
 

Similar a Enabling Real-time Genome Data Research with In-memory Database Technology (SAP Life Science Forum 2013)

A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisMatthieu Schapranow
 
In-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineIn-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineMatthieu Schapranow
 
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesAnalyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesMatthieu Schapranow
 
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchAnalyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchMatthieu Schapranow
 
Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Matthieu Schapranow
 
In-memory Applications for Informed Patients
In-memory Applications for Informed PatientsIn-memory Applications for Informed Patients
In-memory Applications for Informed PatientsMatthieu Schapranow
 
Analyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineAnalyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineMatthieu Schapranow
 
BioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialBioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialMatthieu Schapranow
 
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Matthieu Schapranow
 
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...Matthieu Schapranow
 
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthAnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthMatthieu Schapranow
 
Big Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesBig Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesMatthieu Schapranow
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineMatthieu Schapranow
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineMatthieu Schapranow
 
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...Matthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesFestival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesMatthieu Schapranow
 
Next-Generation Sequencing and Data Analysis.pptx
Next-Generation Sequencing and Data Analysis.pptxNext-Generation Sequencing and Data Analysis.pptx
Next-Generation Sequencing and Data Analysis.pptxSwetaTripathi13
 
In-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineIn-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineMatthieu Schapranow
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015Fiona Nielsen
 

Similar a Enabling Real-time Genome Data Research with In-memory Database Technology (SAP Life Science Forum 2013) (20)

A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data Analysis
 
In-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineIn-Memory Apps for Precision Medicine
In-Memory Apps for Precision Medicine
 
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesAnalyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
 
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchAnalyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
 
Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI
 
In-memory Applications for Informed Patients
In-memory Applications for Informed PatientsIn-memory Applications for Informed Patients
In-memory Applications for Informed Patients
 
Analyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineAnalyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision Medicine
 
BioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialBioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or Potential
 
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
 
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
 
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthAnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
 
Big Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesBig Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and Challenges
 
"When time matters..."
"When time matters...""When time matters..."
"When time matters..."
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision Medicine
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision Medicine
 
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
 
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesFestival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
 
Next-Generation Sequencing and Data Analysis.pptx
Next-Generation Sequencing and Data Analysis.pptxNext-Generation Sequencing and Data Analysis.pptx
Next-Generation Sequencing and Data Analysis.pptx
 
In-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineIn-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems Medicine
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015
 

Más de Matthieu Schapranow

Patient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in PracticePatient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in PracticeMatthieu Schapranow
 
How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?Matthieu Schapranow
 
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...Matthieu Schapranow
 
ICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart FailureICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart FailureMatthieu Schapranow
 
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...Matthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Matthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...Matthieu Schapranow
 
Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?Matthieu Schapranow
 
Festival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaFestival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaMatthieu Schapranow
 
Analyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response AnalysisAnalyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response AnalysisMatthieu Schapranow
 
A Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life SciencesA Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life SciencesMatthieu Schapranow
 


Enabling Real-time Genome Data Research with In-memory Database Technology (SAP Life Science Forum 2013)

  • 1. In-Memory Database Technology Enables Real-Time Genome Data Research
    SAP Life Science Forum, Dublin, June 04, 2013
    Dr. Matthieu Schapranow, Hasso Plattner Institute
  • 2. Agenda
    ■ Numbers You Should Know
    ■ Personalized Medicine
    ■ High-Performance In-Memory Genome (HIG) Project
    ■ Outlook
    In-Memory Technology Enables Genome Data Research, Dr. Schapranow, June 04, 2013
  • 3. Agenda
    ■ Numbers You Should Know
    ■ Personalized Medicine
    ■ High-Performance In-Memory Genome (HIG) Project
    ■ Outlook
  • 4. Numbers You Should Know: Conventional Cancer Therapies
    [Figure: bar charts on a 0% to 100% scale. Men and women: will develop cancer vs. will never develop cancer. Chemotherapies: fail vs. work. Source: American Cancer Society, Surveillance Research, 2012]
  • 5. Numbers You Should Know: The Human Genome Project
    ■ 1990: Human Genome (HG) project started with 3B USD funding
    ■ 2000: 1st draft of the HG announced
    ■ 10 years until first HG version; thousands of institutes involved
    ■ 2013: Latest Next-Generation Sequencing (NGS) device “Illumina HiSeq 2500” costs ≈700k USD and enables whole genome sequencing in < 2 days for < 10k USD per run
    ■ But: analysis takes up to weeks
    ■ What’s next? Real-time analysis of genome data!
    Source: http://www.molecularecologist.com/next-gen-table-3a/
  • 6. Numbers You Should Know: Comparison of Costs
    [Figure: Comparison of Costs for Main Memory and Genome Sequencing. Y-axis: costs in USD, logarithmic scale from 0.001 to 10,000. X-axis: January 2001 to January 2013. Series: costs per megabyte RAM, costs per megabase sequencing]
  • 7. Agenda
    ■ Numbers You Should Know
    ■ Personalized Medicine
    ■ High-Performance In-Memory Genome (HIG) Project
    ■ Outlook
  • 8. Personalized Medicine: Our Motivation
    ■ Today analysis of genome data, e.g. for personalized treatment, takes 4-6 weeks (incl. biopsy, biological preparation, sequencing, alignment, variant calling, full analysis, and evaluation)
    ■ In-memory technology is suitable to accelerate genome analysis
      □ Highly parallel alignment / variant calling (data preparation)
      □ Real-time analysis of individual patient and cohort data
      □ Combined search in structured / unstructured data
    ■ Challenge: Can we analyze the entire data of a patient, incl. Electronic Medical Record (EMR) and genome data, during a doctor’s visit?
  • 9. Personalized Medicine: Our Vision
  • 10. Personalized Medicine: Our Vision
    Desirability
    ■ Integrated portfolio of specialized services for clinicians, researchers, and patients
    ■ Include latest research results, e.g. most effective therapies
    Viability
    ■ Share data via the Internet to get feedback from world-wide experts (cost-saving)
    ■ Combine research data (publications, annotations, genome data) from international databases in a single knowledge base
    ■ Enable personalized medicine also in far-off regions and developing countries
    Feasibility
    ■ Allele frequency count of 12B records in < 1s
    ■ Identification of relevant annotations out of 80M in < 1s
    ■ Integrated alignment and variant calling within hours instead of days
  • 11. Personalized Medicine: User Requirements
    For researchers
    ■ Enable real-time analysis of genome data
    ■ Automatic scan of pathways to identify cellular impact of mutations
    ■ Free-text search in publications, diagnosis, and EMR data (structured and unstructured data)
    For clinicians
    ■ Preventive diagnostics to identify risk patients
    ■ Indicate pharmacokinetic correlations
    ■ Scan for comparable patient cases
    For patients
    ■ Identify relevant clinical trials / experts
    ■ Start most appropriate therapy early based on all evidence and latest knowledge
  • 12. Agenda
    ■ Numbers You Should Know
    ■ Personalized Medicine
    ■ High-Performance In-Memory Genome (HIG) Project
    ■ Outlook
  • 13. High-Performance In-Memory Genome Project: Integration of Genomic Data
    ■ Once DNA sequences are generated by NGS devices, HIG comes into play
    ■ Preprocessing of DNA (alignment, variant calling) can be modeled and is executed as integrated process
    ■ Results are stored in in-memory database to enable instant analysis
  • 14. High-Performance In-Memory Genome Project: The In-Memory Technology Toolbox
    [Figure: icon grid of in-memory building blocks]
    ■ Any attribute as index
    ■ Insert only for time travel
    ■ Combined column and row store
    ■ No aggregate tables
    ■ Minimal projections
    ■ Partitioning
    ■ Analytics on historical data
    ■ Single and multi-tenancy
    ■ SQL interface on columns & rows
    ■ Reduction of layers
    ■ Lightweight compression
    ■ Multi-core / parallelization
    ■ On-the-fly extensibility
    ■ Active/passive data store
    ■ Bulk load
    ■ Text retrieval and extraction
    ■ Object to relational mapping
    ■ Dynamic multi-threading within nodes
    ■ Map reduce
    ■ No disk
    ■ Group key
  • 15. High-Performance In-Memory Genome Project: Challenges of Genome Data Analysis
    Analysis of Genomic Data | Alignment and Variant Calling | Analysis of Annotations in World-wide DBs
    Bound To | CPU Performance | Memory Capacity
    Duration | Hours – Days | Weeks
    HPI | Minutes | Real-time
    In-Memory Technology | Multi-Core | Partitioning & Compression
  • 16. High-Performance In-Memory Genome Project: Challenges of Genome Data Analysis
    Analysis of Genomic Data | Alignment and Variant Calling | Analysis of Annotations in World-wide DBs
    Bound To | CPU Performance | Memory Capacity
    Duration | Hours – Days | Weeks
    HPI & SAP | Minutes – Hours | Interactively
    In-Memory Technology | Multi-Core | Partitioning & Compression
  • 17. High-Performance In-Memory Genome Project: Selected Research Topics
    Improving Analyses:
    ■ Clustering of patient cohorts, e.g. k-means clustering
    ■ Combined search, e.g. in clinical trials and side-effect databases
    ■ Ad-hoc analysis of genetic pathways, e.g. to identify cause/effect
    Improving Data Preparations:
    ■ Graphical modeling of Genome Data Processing (GDP) pipelines
    ■ Scheduling and execution of multiple GDP pipelines in parallel
    ■ App store for medical knowledge (bring algorithms to data)
    ■ Exchange of sensitive data, e.g. history-based access control
    ■ Billing processes for intellectual property and services
  • 18. High-Performance In-Memory Genome Project: Genomics Analysis
    Loaded part of 1,000 genomes pre-phase 1 dataset
    ■ Chromosome 1 of 629 individuals from the 1,000 genomes project
    ■ 12 billion entries in largest database table
    ■ 293 GB of data (compressed in HANA)
    Results
    ■ Report SNPs failing quality control: UCSC 102.47 sec | SAP HANA 1.25 sec – 82x faster
    ■ Compute the alternative allele frequency for each variant/region: VCFtools 259 sec | SAP HANA 0.43 sec – 600x faster
    ■ Compute the total number of missing genotypes per individual: VCFtools 548 sec | SAP HANA 2 sec – 270x faster
    Supported by Dr. Carlos Bustamante lab
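The two VCFtools-style queries on this slide can be illustrated with a minimal Python sketch. It is not the HANA or VCFtools implementation: the genotype matrix, the variant IDs, and the function names are hypothetical, and genotypes are assumed to be diploid allele pairs with `None` marking a missing call. The last lines recompute the speedup factors from the timings quoted on the slide.

```python
# Hypothetical toy data: one entry per variant, one genotype per individual.
# Alleles: 0 = reference, 1 = alternative, None = missing call.
GENOTYPES = {
    "variant_a": [(0, 1), (1, 1), (None, None), (0, 0)],
    "variant_b": [(0, 0), (0, 1), (0, 1), (None, None)],
}


def alt_allele_frequency(calls):
    """Share of alternative alleles among all called alleles of one variant."""
    alleles = [a for genotype in calls for a in genotype if a is not None]
    if not alleles:
        return None  # no individual was called at this site
    return sum(1 for a in alleles if a == 1) / len(alleles)


def missing_per_individual(genotypes):
    """Number of variants with a missing genotype, per individual (by column)."""
    if not genotypes:
        return []
    n_individuals = len(next(iter(genotypes.values())))
    missing = [0] * n_individuals
    for calls in genotypes.values():
        for i, genotype in enumerate(calls):
            if any(a is None for a in genotype):
                missing[i] += 1
    return missing


# Timings in seconds as quoted on the slide: (baseline tool, SAP HANA).
TIMINGS = {
    "snp_quality_control": (102.47, 1.25),
    "alt_allele_frequency": (259.0, 0.43),
    "missing_genotypes": (548.0, 2.0),
}
SPEEDUPS = {name: base / hana for name, (base, hana) in TIMINGS.items()}

if __name__ == "__main__":
    for variant, calls in GENOTYPES.items():
        print(variant, alt_allele_frequency(calls))
    print(missing_per_individual(GENOTYPES))
    print({name: round(value) for name, value in SPEEDUPS.items()})
```

The recomputed ratios (about 82, 602, and 274) agree with the rounded factors reported on the slide.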
  • 19. High-Performance In-Memory Genome Project: Working With Big Data
    Loaded entire 1,000 genomes pre-phase 1 dataset
    ■ Queries on all chromosomes for all 629 individuals
    ■ 136 billion entries in largest database table
    ■ ≈1.2 TB (compressed in HANA)
    [Figure: query results using R connectivity: report all variations in BRCA1 and BRCA2. Axis labels: chromosome, absolute frequency, number of alleles]
    Supported by Dr. Carlos Bustamante lab
  • 20. High-Performance In-Memory Genome Project: Analysis of Patient Cohorts
    Why is a therapy only working in 80% of the patient cases?
    ■ Columnar storage optimizes space requirements while enhancing calculation performance
    ■ Single k-means clustering: R 470ms vs. HANA 30ms (15:1)
    ■ >60k clusters are calculated in <2s on 1,000 core cluster
    ■ Result: interactive exploration of clusters becomes possible
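For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the R or HANA implementation benchmarked on the slide. The patient feature vectors are invented toy values, and seeding the centroids with the first k points is an assumption made only to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm. Assumption: initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical cohort: two numeric features per patient (toy values).
PATIENTS = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]

if __name__ == "__main__":
    centroids, assignment = kmeans(PATIENTS, k=2)
    print(centroids)
    print(assignment)
```

Each run of this loop is independent of other runs, which is one plausible reason many clusterings can be computed side by side on a multi-core cluster.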
  • 21. High-Performance In-Memory Genome Project: Integration of Genetic Pathways
    What are known effects for a somatic mutation?
    ■ Storing and accessing graph data within in-memory database (Active Information Store)
    ■ 263 KEGG pathways with 6,481 genetic components, 32,784 vertices, and 90,682 edges
    ■ Rank all pathways by evaluation of node connections: IMDB <350ms
    ■ >5.5k rankings can be calculated in <2s on 1,000 core cluster
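The slide does not define how node connections are evaluated, so the sketch below makes an explicit assumption: a pathway's score is the number of its edges that touch at least one mutated gene, and pathways are ranked by descending score. The pathway names and edge lists are invented toy data, not KEGG content, and `rank_pathways` is a hypothetical helper.

```python
# Hypothetical toy graph data: pathway name -> list of (source, target) edges.
PATHWAYS = {
    "pathway_A": [("BRCA1", "TP53"), ("TP53", "MDM2"), ("BRCA1", "ATM")],
    "pathway_B": [("EGFR", "KRAS"), ("KRAS", "BRAF")],
    "pathway_C": [("TP53", "CDKN1A")],
}


def rank_pathways(pathways, mutated_genes):
    """Rank pathways by the number of edges touching a mutated gene.

    Assumed scoring rule for illustration only. Ties are broken by name.
    """
    scores = {
        name: sum(1 for u, v in edges if u in mutated_genes or v in mutated_genes)
        for name, edges in pathways.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_pathways(PATHWAYS, {"BRCA1", "KRAS"}):
        print(name, score)
```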
  • 22. High-Performance In-Memory Genome Project: Search in Structured / Unstructured Data
    What clinical trials are relevant for an individual patient?
    ■ In-memory technology enables entity extraction, e.g. age, genes, and drugs
    ■ Integrated 30k free text documents from clinicaltrials.gov
    ■ Relational search on entities enables interactive comparison
    ■ Results are rated by relevant search criteria
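The idea of extracting entities from free text and then searching on them relationally can be sketched as follows. This is a deliberately simple dictionary and regular-expression approach, not the text analysis engine used in the project. The vocabularies, the trial texts, and the function names are hypothetical.

```python
import re

# Hypothetical vocabularies; a real system would use curated dictionaries.
GENES = {"BRCA1", "BRCA2", "EGFR"}
DRUGS = {"olaparib", "erlotinib"}

# Matches age ranges such as "18 to 70 years" or "40-80 years".
AGE_RANGE = re.compile(r"(\d+)\s*(?:-|to)\s*(\d+)\s*years", re.IGNORECASE)


def extract_entities(text):
    """Turn free text into structured fields: genes, drugs, and an age range."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    match = AGE_RANGE.search(text)
    return {
        "genes": sorted(GENES & {t.upper() for t in tokens}),
        "drugs": sorted(DRUGS & {t.lower() for t in tokens}),
        "age": (int(match.group(1)), int(match.group(2))) if match else None,
    }


def matching_trials(trials, gene, age):
    """Relational-style filter on the extracted entities."""
    hits = []
    for trial_id, text in trials.items():
        entities = extract_entities(text)
        age_range = entities["age"]
        if gene in entities["genes"] and age_range and age_range[0] <= age <= age_range[1]:
            hits.append(trial_id)
    return sorted(hits)


# Hypothetical trial descriptions (not taken from clinicaltrials.gov).
TRIALS = {
    "trial_1": "Olaparib in BRCA1 mutation carriers aged 18 to 70 years",
    "trial_2": "Erlotinib for EGFR-positive patients, 40-80 years",
    "trial_3": "Observational study of BRCA1 and BRCA2 carriers",
}

if __name__ == "__main__":
    print(extract_entities(TRIALS["trial_1"]))
    print(matching_trials(TRIALS, gene="BRCA1", age=45))
```

Once the entities are stored as columns, a query such as "trials mentioning BRCA1 that accept a 45-year-old" becomes an ordinary filter rather than a full-text scan.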
  • 23. High-Performance In-Memory Genome Project: Architectural Overview
    [Figure: architecture diagram. Labels: Cohort Analysis, Pathway Finder, Paper Search, Clinical Trial Finder, Pipeline Editor, ...; Extensions, App Store, Access Control, Billing, ...; In-Memory Database, Pipeline Data, Genome Data, Pathways, Genome Metadata, Papers, Pipeline Models, Analytical Tools, ...]
  • 24. Agenda
    ■ Numbers You Should Know
    ■ Personalized Medicine
    ■ High-Performance In-Memory Genome (HIG) Project
    ■ Outlook
  • 25. The Vision: Combined Data and Expert’s Knowledge
  • 26. The Future: Combined Information
    Enable clinicians to:
    ■ Make evidence-based therapy decisions at the patient’s bedside
    ■ Exchange latest patient data with international experts
    Enable researchers to:
    ■ Investigate genomes of patient cohorts to derive new knowledge
    ■ Analyze results in real-time
    Enable patients to:
    ■ Identify risk factors long before they turn into diseases
    ■ Identify experts and similar patient cases to bring up alternatives for individual therapies
  • 27. Thank you for your interest! Keep in contact with us.
    Dr. Matthieu-P. Schapranow
    Hasso Plattner Institute, Enterprise Platform & Integration Concepts
    August-Bebel-Str. 88, 14482 Potsdam, Germany
    schapranow@hpi.uni-potsdam.de
    http://j.mp/schapranow