Graph technology meetup slides


Sean Mulvehill

Data & Analytics

Queries: SQL v. Gremlin
SQL:
select p2.name, p1.post
from posts p1
inner join persons p2 on p1.to_id = p2.id
where p1.from_id in (select id from persons where name = 'keith');
Gremlin:
graph.traversal().V().has('name', 'keith').outE('posts')
  .as('msg').inV().as('who')
  .select('msg', 'who')
  .by('comment').by('name')
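
To make the comparison concrete, here is a minimal Python sketch that runs the SQL query against an in-memory SQLite database and then answers the same question by walking outgoing edges from a start vertex. The table and column names follow the SQL on the slide. The sample rows, the dictionary-based graph, and the helper posts_from are assumptions made up for illustration, and the sketch does not use Gremlin itself.

```python
import sqlite3

# Hypothetical sample data: table and column names follow the slide's SQL,
# the rows themselves are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table persons (id integer primary key, name text);
    create table posts (from_id integer, to_id integer, post text);
    insert into persons values (1, 'keith'), (2, 'jane'), (3, 'john');
    insert into posts values
        (1, 2, 'hello jane'), (1, 3, 'hi john'), (2, 3, 'lunch?');
""")

# Relational version: the slide's query, with an added "order by" so the
# result order is stable for comparison.
sql_rows = conn.execute("""
    select p2.name, p1.post
    from posts p1
    inner join persons p2 on p1.to_id = p2.id
    where p1.from_id in (select id from persons where name = 'keith')
    order by p2.name
""").fetchall()

# Graph version: persons are vertices, each post is an outgoing edge.
vertices = {1: {"name": "keith"}, 2: {"name": "jane"}, 3: {"name": "john"}}
out_edges = {
    1: [{"to": 2, "post": "hello jane"}, {"to": 3, "post": "hi john"}],
    2: [{"to": 3, "post": "lunch?"}],
}

def posts_from(name):
    """Start at vertices with the given name, follow outgoing post edges,
    and return sorted (recipient name, post text) pairs."""
    results = []
    for vertex_id, props in vertices.items():
        if props["name"] == name:
            for edge in out_edges.get(vertex_id, []):
                results.append((vertices[edge["to"]]["name"], edge["post"]))
    return sorted(results)

graph_rows = posts_from("keith")
```

Both sql_rows and graph_rows hold the same pairs. The relational version finds them through a join and a subquery, while the graph version starts at one vertex and follows its edges.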

Traversing a graph
Suggest to John that he might know Jane
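
One common way to produce this kind of suggestion is a two-hop traversal: look at the friends of John's friends and rank the ones he is not yet connected to by the number of mutual friends. The sketch below assumes that approach. Only John and Jane come from the slide; Alice, Bob, and the function suggest are hypothetical.

```python
from collections import Counter

# Hypothetical friendship graph; only John and Jane appear on the slide.
friends = {
    "John": {"Alice", "Bob"},
    "Alice": {"John", "Jane"},
    "Bob": {"John", "Jane"},
    "Jane": {"Alice", "Bob"},
}

def suggest(person):
    """Rank people two hops away from `person` by number of mutual friends."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    return mutual.most_common()

suggestions = suggest("John")
```

Here Jane is reachable from John through both Alice and Bob, so she is suggested with two mutual friends.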

Graph Databases
OLTP: On-Line Transaction Processing
OLAP: On-Line Analytical Processing

Graph Databases
Image Reference: Kelvin Lawrence & Co

Graph Databases Landscape
OLTP
• IBM Graph
• Titan
• Neo4j
• OrientDB
• TAO
• Jena
OLAP
• Pregel
• Spark GraphX
• Spark GraphFrames
• Giraph
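
Systems on the OLAP side typically compute over the whole graph rather than answer a query that starts at one vertex. As a rough illustration of the vertex-centric, superstep-based model that Pregel popularized, the following Python sketch labels connected components by passing minimum labels between neighbors. It runs in a single process, and the function name and sample edges are assumptions; it does not reflect the actual API of Pregel, Giraph, or GraphX.

```python
def pregel_components(edges):
    """Label each vertex with the smallest vertex id in its component."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    label = {v: v for v in neighbors}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    supersteps = 0
    while any(inbox.values()):
        supersteps += 1
        outbox = {v: [] for v in neighbors}
        for v, messages in inbox.items():
            # A vertex only reacts to messages delivered in this superstep.
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                for n in neighbors[v]:
                    outbox[n].append(label[v])
            # Otherwise the vertex stays silent (votes to halt).
        inbox = outbox
    return label, supersteps

# Hypothetical sample graph with two components: {1, 2, 3} and {4, 5}.
labels, steps = pregel_components([(1, 2), (2, 3), (4, 5)])
```

The computation stops when no vertex has anything new to send, which is the point where every vertex carries the smallest id in its component.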

Questions?
• Prachi Khadke (pskhadke@us.ibm.com) or the people wearing the Graph T-shirts!
• Sean Mulvehill (mulvehill@us.ibm.com)

The Social+Data Graph of Life Science
Barry Wark, PhD
ovation.io

Modern life science R&D is challenging
● $50B “Cottage industry” now globalized and highly collaborative
○ Distributed teams
○ Universities, clinicians, non-profit labs, CROs, Biotech, Pharma
● Petabytes of data, millions of experiments per year
● Science is complicated
R&D organizations are expected to produce efficient pipelines from academic research to clinical development

70%
of data collected annually in life science goes “dark”: inaccessible, undiscoverable, or unusable

$35B
of data collected annually in life science goes “dark”: inaccessible, undiscoverable, or unusable

Wark/Maslow hierarchy of scientific data needs
Data Storage
Metadata
Versioning
Collaboration

A scientific data layer stops data from going dark
● Secure cloud storage (HIPAA, 21 CFR Part 11)
● Metadata tied to files
● File/data provenance across collaborators and analyses
● Integrated annotation, chat
● Low threshold: continue to use preferred capture and analysis tools

Social+Data graph
● Real-time
● Structural information: projects, experiments, people
● High-information events
○ Researcher annotation
○ Communication
○ File selection

Global Social+Data graph
● 350,000 researchers
● O(100B) files
● Average academic researcher writes 1 paper per year with 3 other colleagues in >1 countries
● k=8
● 40,000 users to a fully connected graph

Assisting R&D organizations to mobilize idle assets
1. Find relevant internal experts
2. Recommend existing, relevant data (and the resources to utilize it)
3. Identify the best external resources and opportunities
4. Organizational analytics
a. Who are the effective collaborators?
b. Which are the most valuable data sets?

Calculate (weighted) pairwise distances for all nodes using A*
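
A minimal Python sketch of this step, assuming the graph is held as a weighted adjacency dictionary. The slides do not say which heuristic was used, so the sketch defaults to a zero heuristic, under which A* behaves exactly like Dijkstra's algorithm. The sample graph, node names, and function names are hypothetical.

```python
import heapq

def a_star(graph, start, goal, heuristic=lambda node, goal: 0.0):
    """Weighted shortest-path distance from start to goal, or None if
    the goal is unreachable. The heuristic must never overestimate."""
    best = {start: 0.0}
    frontier = [(heuristic(start, goal), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            return best[node]
        for neighbor, weight in graph.get(node, {}).items():
            cost = best[node] + weight
            if cost < best.get(neighbor, float("inf")):
                best[neighbor] = cost
                heapq.heappush(
                    frontier, (cost + heuristic(neighbor, goal), neighbor)
                )
    return None

def pairwise_distances(graph):
    """Distance for every ordered pair of distinct nodes.
    Assumes every node appears as a key of `graph`."""
    nodes = sorted(graph)
    return {(a, b): a_star(graph, a, b) for a in nodes for b in nodes if a != b}

# Hypothetical weighted graph mixing people and a file.
graph = {
    "alice": {"bob": 1.0, "file1": 4.0},
    "bob": {"alice": 1.0, "file1": 2.0},
    "file1": {"alice": 4.0, "bob": 2.0},
}
distances = pairwise_distances(graph)
```

In this example the distance from alice to file1 is 3.0, because the path through bob is shorter than the direct edge of weight 4.0. Running one search per pair is the simplest formulation; at the scale described in these slides a distributed approach would be needed.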

Bi-cluster to identify relevant groups
● Shuffle rows & columns of the matrix to minimize loss (spectral, information, etc.)
● Well-studied in bioinformatics (not that different) and text classification
● NP-complete
● Clusters allow us to look up in both directions
○ User → Data
○ Data → Users
○ (Users → Users)
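
The two-directional lookup can be illustrated with a deliberately simplified Python sketch. It is not the spectral or information-based co-clustering mentioned above: it only handles the easy case where the user-by-file matrix becomes exactly block-diagonal after reordering, which it finds by grouping users and files linked through nonzero entries. The matrix, the names, and the helper functions are hypothetical.

```python
def bicluster_blocks(matrix, users, files):
    """Group users and files that are linked through nonzero entries.

    Returns a list of (user_set, file_set) pairs, one per block.
    """
    user_cluster, file_cluster, clusters = {}, {}, []
    for start in range(len(users)):
        if start in user_cluster:
            continue
        cluster_id = len(clusters)
        block_users, block_files = set(), set()
        stack = [("u", start)]
        while stack:
            kind, idx = stack.pop()
            seen = user_cluster if kind == "u" else file_cluster
            if idx in seen:
                continue
            seen[idx] = cluster_id
            if kind == "u":
                block_users.add(users[idx])
                # Follow this user's row to every file they touched.
                stack.extend(("f", j) for j, v in enumerate(matrix[idx]) if v)
            else:
                block_files.add(files[idx])
                # Follow this file's column to every user who touched it.
                stack.extend(("u", i) for i, row in enumerate(matrix) if row[idx])
        clusters.append((block_users, block_files))
    return clusters

# Hypothetical user-by-file interaction matrix (1 = user touched the file).
users = ["ann", "raj", "li", "sam"]
files = ["scan1", "assay2", "scan3", "assay4"]
matrix = [
    [1, 0, 1, 0],  # ann
    [0, 1, 0, 1],  # raj
    [1, 0, 0, 0],  # li
    [0, 0, 0, 1],  # sam
]
clusters = bicluster_blocks(matrix, users, files)

def data_for_user(user):
    """User -> Data: all files in the user's block."""
    return next(f for u, f in clusters if user in u)

def users_for_data(name):
    """Data -> Users: all users in the file's block."""
    return next(u for u, f in clusters if name in f)
```

In this example li has only touched scan1, but the block also surfaces scan3 as related data, and assay2 maps back to both raj and sam. Real interaction matrices are noisy and rarely separate this cleanly, which is why the loss-minimizing methods listed above are used instead.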

Publication graph
● Incomplete
● Late
● Post-hoc

Information-based co-clustering
https://cs.gmu.edu/~carlotta/publications/IBCC_TMW_final.pdf
https://pdfs.semanticscholar.org/4a3e/b95f17a88e14227b05a590639e8cd3346a99.pdf


Graph technology meetup slides

5. Graph

6. Property Graph

7. Another Graph Example Nodes Edges

8. Why graphs?


21. Why does data go dark?


24. The Social+Data Graph


30. Data architecture + GraphX

31. Relevant, related people and data

32. Questions

33. Appendix
