GraphConnect
the power of graphs to analyze biological data
about me
who am i ...
➡ big data architect @ datablend - continuum
• provide big data and nosql consultancy
• 5 years of hands-on expertise in the pharma/biotech sector

Davy Suvee
@DSUVEE
big data in pharma
massive data

scalable number crunching platform

complex data

visual insights-driven platform

full genome sequencing

biological networks

graphs!!
big data in pharma (2 specific use cases)

outlier detection platform
neo4j, mongodb/cassandra and gephi

euretos - brain

neo4j, mongodb, solr and prefuse
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples, which genes are co-expressed?
storing gene expressions (mongodb)
{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3"} ,
  "sample_name" : "122551hp133a21.cel" ,
  "genomics_id" : 122551 ,
  "sample_id" : 343981 ,
  "donor_id" : 143981 ,
  "sample_type" : "Tissue" ,
  "sample_site" : "Ascending colon" ,
  "pathology_category" : "MALIGNANT" ,
  "pathology_morphology" : "Adenocarcinoma" ,
  "pathology_type" : "Primary malignant neoplasm of colon" ,
  "primary_site" : "Colon" ,
  "expressions" : [ { "gene" : "X1_at" , "expression" : 5.54217719084415} ,
                    { "gene" : "X10_at" , "expression" : 3.92335121981739} ,
                    { "gene" : "X100_at" , "expression" : 7.81638155662255} ,
                    { "gene" : "X1000_at" , "expression" : 5.44318512260619} ,
                     … ]
}
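Before samples can be correlated, the `expressions` array has to be turned into a numeric vector with a fixed gene order. The following is a minimal sketch of that step, assuming the document has already been loaded as a plain Python dict; the helper name `expression_vector` is hypothetical and not taken from the talk.

```python
# Trimmed copy of the sample document from the slide, as a plain dict.
sample = {
    "genomics_id": 122551,
    "primary_site": "Colon",
    "expressions": [
        {"gene": "X1_at", "expression": 5.54217719084415},
        {"gene": "X10_at", "expression": 3.92335121981739},
        {"gene": "X100_at", "expression": 7.81638155662255},
        {"gene": "X1000_at", "expression": 5.44318512260619},
    ],
}


def expression_vector(doc, genes):
    """Return the expression values of `doc` in the order given by `genes`.

    Hypothetical helper: a fixed gene order makes vectors of different
    samples comparable position by position.
    """
    by_gene = {e["gene"]: e["expression"] for e in doc["expressions"]}
    return [by_gene[g] for g in genes]


genes = ["X1_at", "X10_at", "X100_at", "X1000_at"]
vector = expression_vector(sample, genes)
print(vector)
```

Keeping the gene order explicit avoids relying on the array order inside each stored document.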
correlating samples (mongodb/map-reduce)
pearson correlation

x    y
43   99
21   65
25   79
42   75
57   87
59   81

pearson correlation = 0.52
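The numbers on this slide can be checked by hand. The sketch below assumes they are six (x, y) pairs read in the order they appear, and computes the Pearson correlation in plain Python. The talk runs this step as MongoDB map-reduce; this is only an illustration of the formula, not the author's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Assumed pairing of the values shown on the slide.
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]

r = pearson(x, y)
print(f"{r:.4f}")  # prints 0.5298
```

The result is consistent with the 0.52 shown on the slide if that value was truncated rather than rounded.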
co-expression graph (neo4j)
➡ create a node for each sample
➡ if correlation between two samples >= 0.8, create an edge between both nodes

[figure: sample nodes 122551, 122552 and 122553; a "correlated" edge carries the correlation as its value]
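The two rules on this slide (one node per sample, an edge when the correlation is >= 0.8) can be sketched without a database. The expression vectors below are made-up illustration values, not data from the talk, and the in-memory edge list stands in for what would be relationships in neo4j.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression vectors keyed by genomics_id (illustration only).
samples = {
    122551: [5.5, 3.9, 7.8, 5.4],
    122552: [5.6, 4.0, 7.9, 5.3],
    122553: [7.0, 6.5, 3.1, 6.8],
}
THRESHOLD = 0.8

# Rule 1: one node per sample.
nodes = sorted(samples)

# Rule 2: an edge for every pair whose correlation reaches the threshold.
edges = []
for a, b in combinations(nodes, 2):
    value = pearson(samples[a], samples[b])
    if value >= THRESHOLD:
        edges.append((a, b, value))

for a, b, value in edges:
    print(f"{a} -[correlated, value={value:.2f}]- {b}")
```

With these made-up vectors only 122551 and 122552 end up connected; 122553 stays an isolated node.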
co-expression visualisation (gephi)
euretos - brain
➡ pubmed: 23 million biomedical articles
• 1300 new ones added every day
• google-like search interface

➡ reading an article ...
• malaria is transferred by mosquitoes
euretos - brain

authors

references
euretos - brain

ooooooh crap ...
euretos - brain
➡ nanopub (nanopub.org)
• the smallest unit of publishable information

➡ assertion
• subject: malaria
• predicate: transferred by
• object: mosquito

➡ provenance
• how this came to be (meta-data)
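A nanopub as described here pairs one assertion triple with its provenance. The sketch below models that shape with Python dataclasses; the class names and the provenance key are assumptions made for illustration and do not follow the actual nanopub.org schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A single subject / predicate / object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Nanopub:
    """An assertion plus the meta-data describing how it came to be."""
    assertion: Assertion
    provenance: dict = field(default_factory=dict)


# The example from the slide; the provenance entry is a placeholder.
pub = Nanopub(
    assertion=Assertion("malaria", "transferred by", "mosquito"),
    provenance={"source": "placeholder-article-reference"},
)
print(pub.assertion.subject, pub.assertion.predicate, pub.assertion.obj)
```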
euretos - brain
➡ unfortunately, malaria is encoded in various ways ...
[figure: db1, db2 and db3 each label the concept malaria differently: malaria, P22384, AQ879]
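One way to cope with differing encodings is a lookup from (database, local identifier) to a shared concept. The sketch below assumes the identifiers on the slide belong to db1, db2 and db3 in the order shown; that pairing, the index name and the `resolve` helper are all hypothetical.

```python
# (database, local identifier) -> shared concept.
# The pairing of identifiers to databases is assumed from the slide layout.
CONCEPT_INDEX = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(db, identifier):
    """Return the shared concept for a database-specific identifier, or None."""
    return CONCEPT_INDEX.get((db, identifier))


print(resolve("db2", "P22384"))
print(resolve("db3", "not-in-index"))
```

Once every source identifier resolves to the same concept, triples from different databases can attach to a single node in the graph.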
euretos - brain

malaria

transferred by

mosquito
euretos - brain
➡ brain (http://www.euretos.com/brain)
• exploration and analysis platform
• millions of concepts/triples/nanopubs
• pubmed, uniprot, omim, pubchem, ...

➡ architectural stack
• meta-data is stored in mongodb
• graph in neo4j
• swing interface connecting to rest endpoints
brain
Questions?
datablend - continuum

Follow us: twitter.com/data_blend
www.datablend.be
E-mail: info@datablend.be
0499/05.00.89

The Power of Graphs to Analyze Biological Data - Davy Suvee @ GraphConnect London 2013
