SAP HANA For Genome Data Processing: A Deep Dive
1. SAP HANA For Genome Data Processing: A Deep Dive
Dr. Matthieu-P. Schapranow, PI In-Memory Technology for Life Sciences, Hasso Plattner Institute
Emanuel Ziegler, HANA In-Memory Platform, Genomics and Proteomics, SAP AG
2. Comparison of Costs
Comparison of Costs for Main Memory and Genome Analysis
[Figure: costs in USD (logarithmic axis, 0.001 to 10,000) per megabyte of RAM and per megabase of sequencing, January 2001 to January 2012]
SAP HANA For Genome Data Processing: A Deep Dive, E. Ziegler and Dr. M.-P. Schapranow
3. HANA technology for alignment
Efficient streaming of large amounts of data, using experience with high throughput of big data
Cache-efficient index structures for seed lookups, using knowledge from text search
Rating of seed matches based on search engine practices
Hardware-accelerated gapped alignment using vectorization and bit parallelism
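To make the seed lookup and seed rating steps concrete, here is a minimal Python sketch. It is not the HANA implementation: the plain hash index, the vote-counting heuristic, and the names `build_seed_index` and `rate_candidates` are assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict


def build_seed_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def rate_candidates(read, index, k):
    """Rate candidate alignment starts by counting supporting seed hits.

    A seed taken at read offset o that hits reference position p
    supports the alignment start p - o. Candidates are returned
    sorted by descending vote count.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    return votes.most_common()


# Toy data, made up for this example
reference = "ACGTTGCAAGCTTAGGCTAACGT"
read = "AGCTTAGGCT"

index = build_seed_index(reference, k=4)
ranked = rate_candidates(read, index, k=4)
best_start, best_votes = ranked[0]
print(best_start, best_votes)
```

Only the top-rated candidates would then be passed on to the more expensive gapped alignment step.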
4. Alignment on SAP HANA
[Figure: two bar charts comparing BWA-SW and SAP HANA by percentage of misaligned and unaligned reads (axis 0 to 1.0)]
Panel: Simulated full genome, 100 bases per read, single ended. Misalignment w.r.t. Smith-Waterman score of reference alignment from simulation.
Panel: Illumina HiSeq sequenced exome, 100 bases per read, single ended. Misalignment w.r.t. Smith-Waterman score of other alignment algorithm result.
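The charts judge misalignment by Smith-Waterman score. As a reminder of what that score is, the following Python sketch computes the best local alignment score of two sequences. The scoring parameters (match 2, mismatch -1, linear gap -2) are assumptions for illustration; the slides do not state the values used in the evaluation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty
    and keeps only two rows of the dynamic programming matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences, made up for this example
print(smith_waterman_score("ACGT", "ACGT"))          # identical sequences
print(smith_waterman_score("GGACGTCC", "TTACGTAA"))  # shared core "ACGT"
```

Under this reading, an aligner's placement of a read would count as misaligned when its score falls below the score of the reference placement; that comparison rule is likewise an assumption about how the figure was produced.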
5. Genome Data Processing Integrated in SAP HANA
1,000 core cluster
■ 25 identical nodes
■ 80 cores
■ 1 TB main memory
■ 2.40 GHz, 30 MB Cache
6. Real-time Combination of Latest Research Results
Genome Browser
■ Comparison of multiple mapped genomes with reference
■ Exploration of individual genome locations combined with latest relevant annotations and literature, e.g. NCBI, dbSNP, UCSC, Sanger
Interpretation of Variants
■ Variants are sorted, e.g. according to known associated diseases
■ All variants are linked to genome browser
■ Multiple patients can be compared to identify individual dispositions
7. Hardware Advances Support Analysis of Genome Data
          | Alignment and Variant Calling | Combination with Latest Research Annotations
Bound To  | CPU Performance               | Memory Capacity
Duration  | Hours                         | Weeks
SAP & HPI | Minutes                       | Real-time
          | Multi-Core                    | Partitioning & Compression
In-Memory Technology
8. What to take home?
Sequencing machines become faster, smaller, cheaper, and generate immense data sets in heterogeneous formats
■ In-memory technology is the key to explore and analyze these big data sets
■ Efficient parallelization reduces processing time
■ In-memory technology enables real-time analysis and interactive exploration of genome data
“Let’s identify genomic roots and optimal treatments before the patient wakes up from anaesthesia!”
9. Thank you for your interest!
Keep in contact with us.

Dr. Matthieu-P. Schapranow
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
August-Bebel-Str. 88
14482 Potsdam, Germany
schapranow@hpi.uni-potsdam.de
http://j.mp/schapranow

Emanuel Ziegler
SAP AG
TREX
Dietmar-Hopp-Allee 16
69190 Walldorf, Germany
emanuel.ziegler@SAP.com