Using Neo4j technologies for the management of systems biology models
1. Using Neo4j technologies for the management of systems biology models
Ron Henkel (HITS gGmbH, Heidelberg)
Dagmar Waltemath (Rostock)
Neo4j Life & Health Sciences Day - Berlin, 21st June, 2017
2. Computational Systems Biology
Panels: Biological scales; DE systems; Further approaches
Images: https://doi.org/10.1002/wsbm.33, https://doi.org/10.1371/journal.pcbi.1002815, https://doi.org/10.1371/journal.pcbi.1004591
3. Data
Forest (decorticated); Path (accessible)
Matlab logo: By Jarekt (Own work) [Public domain], via Wikimedia Commons; Python logo: By www.python.org [GPL], via Wikimedia Commons; Java logo: By Cguevara94 (Own work) [CC BY-SA 4.0], via Wikimedia Commons, modified.
Images: https://pixabay.com/de/urwald-lianen-dschungel-b%C3%A4ume-406780/, https://pixabay.com/de/buchenwald-st%C3%A4mme-buchenst%C3%A4mme-318347/, https://pixabay.com/de/herbst-bl%C3%A4ttern-spur-laub-1432252/
Coppice
4. Challenges
Storage & retrieval
Storing simulation studies and networks
• Large data items
• Heterogeneous
• Highly-connected
• Context-dependent
• Distributed
Provenance
Following the evolution of models
• Error correction
• Computational power
• Evolution of biological knowledge
• Contradicting hypotheses
Integration
Integrating models, or models and data
• Size of models
• Incorporation of health data
• Security and access rights
6. SEMS
Selected projects
1. Integrated storage of models and simulation studies
2. Ranked retrieval
3. Identification of frequent patterns
Let’s move from relational databases to graph databases and see if we can improve model retrieval, simulation analysis and model integration.
2011-2017 BMBF e:Bio
2015-2017 BMBF de.NBI
7. Integrated storage of models and simulation studies
Figures: Rateitschak et al. (2012) https://doi.org/10.1371/journal.pcbi.1002815
8. A closer look at the data
Original figure: Martin Scharm, Martin Peters (SEMS)
15. MASYMOS
Example: Tyson 1991, BIOMD0000000005
[Figure: graph of the example across three domains (Models, Simulation, Annotation). Models: document Tyson1991 "Cell Cycle 6 var" with the entities Cell, C2, pM, CP and Reaction3, connected by hasProduct, hasReactant and isContainedIn. Annotation: Uniprot:P04551, GO:0005623, Interpro:IPR006670, Pubmed:1831270, Kegg Pathway sce04111 and EC-Code:3.1.3.16, attached via isVersionOf, hasPart, is and isDescribedBy; SBO ontology terms connected by isA. Simulation: SED-ML document Tyson_1991 with Modelreference, Output, Datagenerator, Simulation, Task and Variable nodes (time, C2, CP), with is_connected and is_mapped_to relationships.]
16. MASYMOS
• Mapping on graph structure
• Linking
  - Annotation terms to ontology terms
  - Simulation variables to model entities
  - Publication to model
  - Model entities across model files
• Advantage
  - Structure can be queried across domains
  - Aggregation and analysis are possible
Example: Tyson 1991, BIOMD0000000005
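The linking idea on this slide can be illustrated with a minimal in-memory sketch. It is plain Python, not the MASYMOS code and not the Neo4j API. Node names are taken from the Tyson 1991 example; the exact edge layout, the relationship names `HAS_SPECIES` and `REFERENCES_MODEL`, and the helpers `add_edge`, `targets` and `simulations_for_annotation` are assumptions made for illustration.

```python
from collections import defaultdict

# Adjacency list: source node -> list of (relationship, target node).
edges = defaultdict(list)


def add_edge(source, relationship, target):
    """Store one directed, typed edge (hypothetical helper)."""
    edges[source].append((relationship, target))


# Model domain: the model document and two of its species.
add_edge("model:Tyson1991", "HAS_SPECIES", "species:C2")
add_edge("model:Tyson1991", "HAS_SPECIES", "species:CP")
# Annotation domain: model entities linked to external resources
# (which entity carries which annotation is assumed here).
add_edge("species:C2", "isVersionOf", "Uniprot:P04551")
add_edge("model:Tyson1991", "isDescribedBy", "Pubmed:1831270")
# Simulation domain: a SED-ML document referencing the model, and a
# simulation variable mapped to a model entity.
add_edge("sedml:Tyson_1991", "REFERENCES_MODEL", "model:Tyson1991")
add_edge("sedml:Tyson_1991/var:C2", "is_mapped_to", "species:C2")


def targets(source, relationship):
    """Return all nodes reachable from source via the given relationship."""
    return [t for r, t in edges[source] if r == relationship]


def simulations_for_annotation(annotation):
    """Cross-domain query: annotation -> species -> model -> simulation."""
    nodes = list(edges)
    annotated = {n for n in nodes if annotation in targets(n, "isVersionOf")}
    models = {n for n in nodes if annotated & set(targets(n, "HAS_SPECIES"))}
    return sorted(
        n for n in nodes if models & set(targets(n, "REFERENCES_MODEL"))
    )


print(simulations_for_annotation("Uniprot:P04551"))
```

Because models, annotations and simulation descriptions live in one graph, the question "which simulation descriptions use a model containing an entity annotated with this term?" becomes a single traversal instead of a lookup across separate stores.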
17. MASYMOS
Node types: Model, Publication, Annotation, Person, Simulation
[Figure: the Tyson 1991 example graph (model document, SED-ML simulation description, annotations, SBO ontology terms), repeated from slide 15 with property lists attached to the node types.]
Property lists shown:
• Id, Name, Title, Journal, Abstract, Authors, …
• Id, Name, Component, Variable, Species, Reaction, Compartment
• First name, Last name, Organization, Email
• URI, Description
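The node types and property lists on this slide suggest one record per node type. The following is a minimal sketch in Python, not the MASYMOS schema: the slide text does not state which property list belongs to which node type, so the assignment below is an assumption, and all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    first_name: str
    last_name: str
    organization: str = ""
    email: str = ""


@dataclass
class Annotation:
    uri: str
    description: str = ""


@dataclass
class Publication:
    id: str
    title: str = ""
    journal: str = ""
    abstract: str = ""
    authors: List[Person] = field(default_factory=list)


@dataclass
class Model:
    id: str
    name: str
    species: List[str] = field(default_factory=list)
    reactions: List[str] = field(default_factory=list)
    compartments: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)
    publication: Optional[Publication] = None


# Example instance built from identifiers on the Tyson 1991 slide.
# "Pubmed:1831270" is the slide's shorthand, not a resolvable URI.
tyson = Model(
    id="BIOMD0000000005",
    name="Tyson1991 Cell Cycle 6 var",
    species=["C2", "CP", "pM"],
    annotations=[Annotation(uri="Pubmed:1831270")],
    publication=Publication(id="Pubmed:1831270"),
)
```

In a graph database each record would become a node with these properties, and the nested references (`annotations`, `publication`, `authors`) would become relationships between nodes.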
18. STON: SBGN to Neo4j
Implementation: Vasundra Touré, https://sourceforge.net/projects/ston. Image: Touré et al. (2016) https://doi.org/10.1186/s12859-016-1394-x
19. STON: Features
Identification of submodules; model linking
Implementation: Vasundra Touré, https://sourceforge.net/projects/ston. Image: Touré et al. (2016) https://doi.org/10.1186/s12859-016-1394-x
27. Reaction types found in BioModels
Implementation: Fabienne Lambusch. Figure: Lambusch et al. (in preparation). Preprint: https://peerj.com/preprints/1479
29. Summary
All code under public licenses:
MASYMOS
MORRE
STON
Pattern detection
MOST (change statistics)
M2CAT
COMBINE Archive Web
• Java-based tools
• Neo4j graph database
• Parser for each format
• Reuse of existing libraries / tools
  - jLibSBML
  - jSedML
  - Miriam Web Services (EBI)
  - Apache Commons
  - GSON
  - Owl-api
  - BiVeS-CellML
30. Future work
Partners?
- Incorporating health-related data to explore the behavior of models under varying health conditions
- More applications for MASYMOS
- Incorporating more ontologies and finding better similarity scores
- Reducing the conglomeration of tools
31. The team
More @ https://sems.uni-rostock.de
Left to right: Fabienne Lambusch, Martin Scharm, Dagmar Waltemath, Mariam Nassar, Tom Gebhardt, Martin Peters, Vasundra Touré, Ron Henkel
32. Impact
SEMS is part of a large systems biology community.
Join us. It’s fun.
http://www.denbi.de
http://co.mbine.org