This talk will illustrate the power and flexibility of graph databases, and Neo4j specifically, for analyzing biological data sets. Davy will show how to build a visual exploration environment that helps researchers identify clusters within various biological data sets, including gene expression and mutation prevalence data. Additionally, he will demo BRAIN (Bio Relations and Intelligence Network), a data exploration platform that combines various scientific data sources (including PubMed, Swiss-Prot and DrugBank). It uses Neo4j under the hood both to store the data and to enable querying capabilities that provide key insights and deductions.
3. about me
who am i ...
➡ big data architect @ datablend - continuum
• provide big data and nosql consultancy
• 5 years of hands-on expertise in the pharma/biotech sector
Davy Suvee
@DSUVEE
4. big data in pharma
massive data
scalable number crunching platform
complex data
visual insights-driven platform
full genome sequencing
biological networks
graphs!!
5. big data in pharma (2 specific use cases)
outlier detection platform
neo4j, mongodb/cassandra and gephi
euretos - brain
neo4j, mongodb, solr and prefuse
6. gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples,
which genes are co-expressed?
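A minimal sketch of how the co-expression question could be answered for a chosen subset of samples. The slides do not name the similarity measure, so Pearson correlation, the `threshold` value, the function names and the toy expression values below are all assumptions made for illustration. Each returned pair can be read as an edge between two gene nodes, ready to be loaded into a graph.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def co_expressed_pairs(expression, sample_ids, threshold=0.9):
    """Return (gene_a, gene_b, r) for gene pairs whose expression is
    correlated at or above `threshold` within the selected samples only.

    `expression` maps gene -> {sample_id -> expression value}.
    The threshold of 0.9 is an assumed cut-off, not taken from the talk.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        xs = [expression[gene_a][s] for s in sample_ids]
        ys = [expression[gene_b][s] for s in sample_ids]
        r = pearson(xs, ys)
        if r >= threshold:
            pairs.append((gene_a, gene_b, r))
    return pairs


# Toy data (hypothetical values): three genes measured in four samples.
expression = {
    "GENE_A": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 9.0},
    "GENE_B": {"s1": 2.0, "s2": 4.1, "s3": 6.0, "s4": 1.0},
    "GENE_C": {"s1": 5.0, "s2": 1.0, "s3": 4.0, "s4": 2.0},
}

# Restrict the analysis to a particular subset of samples.
edges = co_expressed_pairs(expression, ["s1", "s2", "s3"])
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b} (r = {r:.3f})")
```

Comparing every gene pair is quadratic in the number of genes, so at 27,000 genes a real pipeline would need a vectorized or distributed implementation; the sketch only shows the shape of the computation.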
11. euretos - brain
➡ pubmed: 23 million biomedical articles
• 1300 new ones added every day
• google-like search interface
➡ reading an article ...
• malaria is transferred by mosquitoes
14. euretos - brain
➡ nanopub (nanopub.org)
• the smallest unit of publishable information
➡ assertion
• subject: malaria
• predicate: transferred by
• object: mosquito
➡ provenance
• how this came to be (meta-data)
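The assertion/provenance split above can be sketched as a small data structure. This is an illustrative model only, not the nanopub.org schema or the BRAIN implementation: the class names, the `as_triple` helper and the provenance keys are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A single subject-predicate-object statement."""
    subject: str
    predicate: str
    object: str

    def as_triple(self):
        # A triple maps naturally onto (node)-[relationship]->(node) in a graph.
        return (self.subject, self.predicate, self.object)


@dataclass
class Nanopub:
    """An assertion plus the meta-data describing how it came to be."""
    assertion: Assertion
    provenance: dict = field(default_factory=dict)


# The example from the slides; the provenance values are hypothetical.
nanopub = Nanopub(
    assertion=Assertion("malaria", "transferred by", "mosquito"),
    provenance={"source": "pubmed", "extracted_by": "example text-mining step"},
)

print(nanopub.assertion.as_triple())
```

Keeping provenance separate from the assertion means the same triple can be backed by many sources without duplicating the statement itself.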
15. euretos - brain
➡ unfortunately, malaria is encoded in various ways ...
[diagram: three databases (db1, db2, db3) use the identifiers malaria, P22384 and AQ879, which all refer to the single concept malaria]
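One way to picture the encoding problem is a lookup that resolves each source-specific identifier to a single canonical concept. The sketch below is an assumption-heavy illustration: the pairing of each identifier with a particular database follows the order on the slide but is not stated there, and the `resolve` function and table name are hypothetical.

```python
# (source database, local identifier) -> canonical concept.
# The db-to-identifier pairing is assumed from the slide layout.
CONCEPT_BY_IDENTIFIER = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(source, identifier):
    """Return the canonical concept for a source-specific identifier,
    or None when the identifier is unknown."""
    return CONCEPT_BY_IDENTIFIER.get((source, identifier))


# All three encodings collapse onto the same concept.
concepts = {resolve(src, ident) for src, ident in CONCEPT_BY_IDENTIFIER}
print(concepts)
```

In a graph model, the same idea could be expressed as one concept node with an edge to each source-specific identifier node, so queries always start from the concept.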