Open data with Neo4j
From ideation to production
Our (fictional) customer
Investigative journalist
Specializes in health-related scandals
Nominated for the Pulitzer Prize in 2017
A few scandals over the year
The Customer and US
Scoping - MVP EMERGENCE
As a journalist, I need to quickly find people to interview in connection with a particular health product.
For example:
Who are the managers of pharmaceutical labs producing a faulty drug?
Which health professionals are most influenced by these labs?
Who are the patients’ relatives, friends, colleagues...?
Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a given year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
Data sources?
Public data on gifts from pharmaceutical labs to health professionals
ETALAB - Data source schema
Pharmaceutical Lab sub-graph
Technical Stakeholder interview
“Why would anyone use a graph database? We are using Oracle 12c!”
DETOUR : Relational VS graph database
“Neo4j Inc. is like NoSQL, it has no future, right?”
Technical Stakeholder interview
Neo4j: leading graph database for more than 10 years!
2000: Performance issues with document management systems; first graph library prototypes
2002: Development of the first version of Neo4j
2007: Neo Technology is created
2010: Neo4j 1.0 is out, then headquarters moved to Silicon Valley
2013: Neo4j 2.0; labels added to the graph model; Neo4j Browser reworked
2016: Neo4j 3.0; Bolt protocol; Cypher extensions
2017: Neo4j 3.3; Neo Technology -> Neo4j Inc.; Neo4j Desktop with Enterprise Edition
“But then, why Neo4j and not another graph database?”
Technical Stakeholder interview
DETOUR : NATIVE GRAPH DATABASE
[Figure: example property graph. (:Person:Speaker {first_name: Marouane, age: 30, shoe_size: 42}) and (:Person:Org {first_name: Hanae}) both ATTENDS (:Conference {name: Devoxx Morocco}); one ATTENDS relationship carries since: 2015, and an EMAILED relationship links the two people.]
[Figure sequence: the same graph split into store records. Labels (:Person, :Org, :Conference, :Speaker), properties and relationships are kept separately; each relationship record points to its start node (SN) and end node (EN) and carries SN PrevRel / SN NextRel and EN PrevRel / EN NextRel pointers, with ∅ marking a missing neighbour.]
Index-free adjacency
Every co-located piece of data in the
graph is co-located on the disk
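One rough way to picture the relationship chains from the figures: each relationship record stores its two end points plus, for each end point, a pointer to the next relationship of that node, so a node's relationships are reached by following pointers rather than by an index lookup. The sketch below is an in-memory toy under stated assumptions: `NodeRecord`, `RelRecord`, `TinyStore`, `sampleStore` and the direction of EMAILED are invented for illustration and are not Neo4j's actual store classes, and the previous-relationship pointers are left out for brevity.

```kotlin
// Hypothetical, simplified records. Ids are positions in the lists below;
// null plays the role of the "∅" (no relationship) marker.
data class NodeRecord(val name: String, val firstRel: Int?)

data class RelRecord(
    val type: String,
    val startNode: Int,
    val endNode: Int,
    val snNext: Int?, // next relationship in the start node's chain
    val enNext: Int?  // next relationship in the end node's chain
)

class TinyStore(val nodes: List<NodeRecord>, val rels: List<RelRecord>) {
    // Walks the relationship chain of one node by following pointers only.
    // Self-relationships (start == end) are not handled in this sketch.
    fun relationshipsOf(nodeId: Int): List<RelRecord> {
        val found = mutableListOf<RelRecord>()
        var relId: Int? = nodes[nodeId].firstRel
        while (relId != null) {
            val rel = rels[relId]
            found += rel
            // Pick the pointer that belongs to this node's side of the record.
            relId = if (rel.startNode == nodeId) rel.snNext else rel.enNext
        }
        return found
    }
}

fun sampleStore(): TinyStore {
    // node 0 = Marouane, node 1 = Hanae, node 2 = Devoxx Morocco
    val nodes = listOf(
        NodeRecord("Marouane", firstRel = 0),
        NodeRecord("Hanae", firstRel = 1),
        NodeRecord("Devoxx Morocco", firstRel = 0)
    )
    val rels = listOf(
        // rel 0: Marouane ATTENDS Devoxx Morocco
        RelRecord("ATTENDS", startNode = 0, endNode = 2, snNext = 2, enNext = 1),
        // rel 1: Hanae ATTENDS Devoxx Morocco
        RelRecord("ATTENDS", startNode = 1, endNode = 2, snNext = 2, enNext = null),
        // rel 2: Marouane EMAILED Hanae (direction assumed)
        RelRecord("EMAILED", startNode = 0, endNode = 1, snNext = null, enNext = null)
    )
    return TinyStore(nodes, rels)
}
```

With this layout, `sampleStore().relationshipsOf(0)` visits the ATTENDS record and then the EMAILED record, and the cost of the walk depends on the number of relationships of that node, not on the size of the whole store.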
Technical Stakeholder interview
Pharmaceutical Lab sub-graph
Import - options
LOAD CSV in Cypher (Cypher ~= SQL for Neo4j)
UNWIND in Cypher
ETL
APOC
Cypher shell
...
Cypher Crash course
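[Figure: the pattern (Person)-[ATTENDS]->(Conf), annotated with node labels (Person, Conf), the relationship TYPE (ATTENDS), and properties as key/value pairs (k1: v1, k2: v2)]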
Cypher Crash course - PATTERN MATCHING
Cypher Crash course - READ queries
[MATCH WHERE]
[OPTIONAL MATCH WHERE]
[WITH [ORDER BY] [SKIP] [LIMIT]]
RETURN [ORDER BY] [SKIP] [LIMIT]
MATCH (c:Conf) RETURN c
Cypher Crash course - READ queries
MATCH (c:Conf {name: 'Devoxx Morocco'})
RETURN c
Cypher Crash course - READ queries
MATCH (c:Conf)
WHERE c.name ENDS WITH 'Morocco'
RETURN c
Cypher Crash course - READ queries
MATCH (s:Speaker)
OPTIONAL MATCH (s)-[:TALKED_AT]->(c:Conf)
WHERE c.name STARTS WITH 'Devoxx'
RETURN s
Cypher Crash course - READ queries
MATCH (p1:Player)-[:PLAYED]->(g:Game),
      (p1)-[:IN_TEAM]->(t:Team)<-[:IN_TEAM]-(p2:Player)
WITH p1, COUNT(g) AS games, COLLECT(p2) AS teammates
WHERE games > 100 AND
      ANY(t IN teammates WHERE t.name = 'Hadji')
RETURN p1
Cypher Crash course - READ queries
(CREATE | MERGE)
[SET|DELETE|REMOVE|FOREACH]
[RETURN [ORDER BY] [SKIP] [LIMIT]]
Cypher Crash course - write queries
CREATE (c:Conf {name: 'Devoxx Morocco'})
Cypher Crash course - write queries
MATCH (c:Conference {name: 'GraphConnect'}),
(s:Speaker {name: 'Michael'})
MERGE (s)-[l:LOVES]->(c)
ON CREATE SET l.how_much = 'very much'
Cypher Crash course - write queries
MATCH (s:Speaker {name: 'Michael'}) REMOVE s.surname
Cypher Crash course - write queries
MATCH (s:Speaker {name: 'Michael'}) DETACH DELETE s
Cypher Crash course - write queries
MATCH (n) DETACH DELETE n
Cypher Crash course - write queries
LAB IMPORT - TDD style
<dependency>
<groupId>org.neo4j.driver</groupId>
<artifactId>neo4j-java-driver</artifactId>
</dependency>
<dependency>
<groupId>org.neo4j.test</groupId>
<artifactId>neo4j-harness</artifactId>
<scope>test</scope>
</dependency>
class MyClassTest {
@get:Rule val graphDb = Neo4jRule()
@Test
fun `some interesting test`() {
val subject = MyClass(graphDb.boltURI().toString())
subject.importDataset("/dataset.csv")
graphDb.graphDatabaseService.execute("MATCH (s:Something) RETURN s").use {
assertThat(it) // ...
}
}
}
LAB IMPORT - TDD style - Test skeleton
identifiant,pays_code,pays,secteur_activite_code,secteur,denomination_sociale,adresse_1,adresse_2,adresse_3,adresse_4,code_postal,ville
QBSTAWWV,[FR],FRANCE,[PA],Prestataires associés,IP Santé domicile,16 Rue de Montbrillant,Buroparc Rive Gauche,"","",69003,LYON
MQKQLNIC,[FR],FRANCE,[DM],Dispositifs médicaux,SIGVARIS,ZI SUD D'ANDREZIEUX,RUE B. THIMONNIER,"","",42173,SAINT-JUST SAINT-RAMBERT CEDEX
OETEUQSP,[FR],FRANCE,[AUT],Autres,HEALTHCARE COMPLIANCE CONSULTING FRANCE SAS,47 BOULEVARD CHARLES V,"","","",14600,HONFLEUR
FRQXZIGY,[FR],FRANCE,[MED],Médicaments,SANOFI PASTEUR MSD SNC,162 avenue Jean Jaurès,"","","",69007,Lyon
GXIVOHBB,[FR],FRANCE,[PA],Prestataires associés,ISIS DIABETE,10-16 RUE DU COLONEL ROL TANGUY,ZAC DU BOIS MOUSSAY,"","",93240,STAINS
ZQKPAZKB,[FR],FRANCE,[PA],Prestataires associés,CREAFIRST,8 Rue de l'Est,"","","",92100,BOULOGNE BILLANCOURT
GEJLGPVD,[US],ÉTATS-UNIS,[DM],Dispositifs médicaux,Nobel Biocare USA LLC,800 Corporate Drive,"","","",07430,MAHWAH
XSQKIAGK,[FR],FRANCE,[DM],Dispositifs médicaux,Cook France SARL,2 Rue due Nouveau Bercy,"","","",94227,Charenton Le Pont Cedex
ARHHJTWT,[FR],FRANCE,[DM],Dispositifs médicaux,EYETECHCARE,2871 Avenue de l'Europe,"","","",69140,RILLIEUX-LA-PAPE
LAB IMPORT - TDD style - companies.csv
@Test
fun `imports countries of companies`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (country:Country) " +
"RETURN country {.code, .name} " +
"ORDER BY country.code ASC").use {
assertThat(it).containsExactly(
row("country", mapOf(Pair("code", "[FR]"), Pair("name", "FRANCE"))),
row("country", mapOf(Pair("code", "[US]"), Pair("name", "ÉTATS-UNIS")))
)
}
assertThat(commitCounter.getCount()).isEqualTo(1)
}
LAB IMPORT - TDD style - COUNTRIES
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - COUNTRIES
@Test
fun `imports cities`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (city:City) " +
"RETURN city {.name} " +
"ORDER BY city.name ASC").use {
assertThat(it).containsExactly(
row("city", mapOf(Pair("name", "BOULOGNE BILLANCOURT"))),
row("city", mapOf(Pair("name", "CHARENTON LE PONT CEDEX"))),
row("city", mapOf(Pair("name", "HONFLEUR"))),
row("city", mapOf(Pair("name", "LYON"))),
row("city", mapOf(Pair("name", "MAHWAH"))),
row("city", mapOf(Pair("name", "RILLIEUX-LA-PAPE"))),
row("city", mapOf(Pair("name", "SAINT-JUST SAINT-RAMBERT CEDEX"))),
row("city", mapOf(Pair("name", "STAINS")))
)
}
assertThat(commitCounter.getCount()).isEqualTo(1)
}
LAB IMPORT - TDD style - CITIES
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - CITIES
@Test
fun `imports city|country links`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (city:City)-[:LOCATED_IN_COUNTRY]->(country:Country) " +
"RETURN country {.code}, city {.name} " +
"ORDER BY city.name ASC").use {
assertThat(it).containsExactly(
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "B[...]")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "C[...]")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "H[...]")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "LYON")))),
mapOf(Pair("country", mapOf(Pair("code", "[US]"))), Pair("city", mapOf(Pair("name", "MAHWAH")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "R[...]")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "S[...]")))),
mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "STAINS"))))
)
}
assertThat(commitCounter.getCount()).isEqualTo(1)
}
LAB IMPORT - TDD style - COUNTRIES-[]-Cities
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - COUNTRIES-[]-Cities
@Test
fun `imports addresses`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (address:Address) " +
"RETURN address {.address} ").use {
assertThat(it).containsOnlyOnce(
row("address", mapOf(Pair("address", "16 RUE DE MONTBRILLANT\nBUROPARC RIVE GAUCHE"))),
row("address", mapOf(Pair("address", "ZI SUD D'ANDREZIEUX\nRUE B. THIMONNIER"))),
row("address", mapOf(Pair("address", "47 BOULEVARD CHARLES V"))),
row("address", mapOf(Pair("address", "162 AVENUE JEAN JAURÈS"))),
row("address", mapOf(Pair("address", "10-16 RUE DU COLONEL ROL TANGUY\nZAC DU BOIS MOUSSAY"))),
row("address", mapOf(Pair("address", "8 RUE DE L'EST"))),
row("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))),
row("address", mapOf(Pair("address", "2 RUE DUE NOUVEAU BERCY"))),
row("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE")))
)
}
assertThat(commitCounter.getCount()).isEqualTo(1)
}
LAB IMPORT - TDD style - ADDRESSES
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
MERGE (address:Address {address: row.address})
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - ADDRESSES
@Test
fun `imports address|city links`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (address:Address)-[location:LOCATED_IN_CITY]->(city:City) " +
"RETURN location {.zipcode}, city {.name}, address {.address} " +
"ORDER BY location.zipcode ASC").use {
assertThat(it).containsOnlyOnce(
mapOf(
Pair("location", mapOf(Pair("zipcode", "07430"))),
Pair("city", mapOf(Pair("name", "MAHWAH"))),
Pair("address", mapOf(Pair("address", "800 CORPORATE DRIVE")))
) //, [...]
)
}
assertThat(commitCounter.getCount())
.overridingErrorMessage("Expected 1 commit")
.isEqualTo(1)
}
LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
MERGE (address:Address {address: row.address})
MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city)
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
@Test
fun `imports business segments`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (segment:BusinessSegment) " +
"RETURN segment {.code, .label} " +
"ORDER BY segment.code ASC").use {
assertThat(it).containsOnlyOnce(
row("segment", mapOf(Pair("code", "[AUT]"), Pair("label", "AUTRES"))),
row("segment", mapOf(Pair("code", "[DM]"), Pair("label", "DISPOSITIFS MÉDICAUX"))),
row("segment", mapOf(Pair("code", "[MED]"), Pair("label", "MÉDICAMENTS"))),
row("segment", mapOf(Pair("code", "[PA]"), Pair("label", "PRESTATAIRES ASSOCIÉS")))
)
}
assertThat(commitCounter.getCount())
.overridingErrorMessage("Expected 1 commit")
.isEqualTo(1)
}
LAB IMPORT - TDD style - business segment
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
MERGE (address:Address {address: row.address})
MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city)
MERGE (segment:BusinessSegment { code: row.segment_code,
label: row.segment_label})
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - business segment
@Test
fun `imports companies`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (company:Company) " +
"RETURN company {.identifier, .name} " +
"ORDER BY company.identifier ASC").use {
assertThat(it).containsOnlyOnce(
row("company", mapOf(Pair("identifier", "ARHHJTWT"), Pair("name", "EYETECHCARE"))),
row("company", mapOf(Pair("identifier", "FRQXZIGY"), Pair("name", "SANOFI PASTEUR MSD SNC"))),
row("company", mapOf(Pair("identifier", "GEJLGPVD"), Pair("name", "NOBEL BIOCARE USA LLC"))),
row("company", mapOf(Pair("identifier", "GXIVOHBB"), Pair("name", "ISIS DIABETE"))),
// [...]
row("company", mapOf(Pair("identifier", "ZQKPAZKB"), Pair("name", "CREAFIRST")))
)
}
assertThat(commitCounter.getCount())
.overridingErrorMessage("Expected 1 commit")
.isEqualTo(1)
}
LAB IMPORT - TDD style - companies
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
MERGE (address:Address {address: row.address})
MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city)
MERGE (segment:BusinessSegment { code: row.segment_code,
label: row.segment_label})
MERGE (company:Company {identifier: row.company_id, name: row.company_name})
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - companies
@Test
fun `imports address|company|business segment`() {
newReader("/companies.csv").use {
subject.import(it)
}
graphDb.graphDatabaseService.execute(
"MATCH (segment:BusinessSegment)<-[:IN_BUSINESS_SEGMENT]-(company:Company)-[:LOCATED_AT_ADDRESS]->(address:Address) " +
"RETURN company {.identifier}, segment {.code}, address {.address} " +
"ORDER BY company.identifier ASC").use {
assertThat(it).containsOnlyOnce(
mapOf(
Pair("company", mapOf(Pair("identifier", "ARHHJTWT"))),
Pair("segment", mapOf(Pair("code", "[DM]"))),
Pair("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE")))
) // [...]
)
}
assertThat(commitCounter.getCount())
.overridingErrorMessage("Expected 1 commit")
.isEqualTo(1)
}
LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
session.run("""
UNWIND {rows} AS row
MERGE (country:Country {code: row.country_code})
ON CREATE SET country.name = row.country_name
MERGE (city:City {name: row.city_name})
MERGE (city)-[:LOCATED_IN_COUNTRY]->(country)
MERGE (address:Address {address: row.address})
MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city)
MERGE (segment:BusinessSegment { code: row.segment_code,
label: row.segment_label})
MERGE (company:Company {identifier: row.company_id, name: row.company_name})
MERGE (company)-[:IN_BUSINESS_SEGMENT]->(segment)
MERGE (company)-[:LOCATED_AT_ADDRESS]->(address)
""".trimMargin(), mapOf(Pair("rows", rows)))
LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
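The Cypher statement reads keys such as `row.country_code`, `row.city_name` and `row.address`, and the tests expect upper-cased names with the address lines joined by a newline. A minimal sketch of how one parsed CSV line might be turned into such a row map follows. The function name `toRow` is hypothetical, the column order is taken from the companies.csv header, and the input is assumed to be already split into 12 columns with empty quoted fields read as empty strings; `uppercase()` needs Kotlin 1.5 or later.

```kotlin
// Column order follows the header of companies.csv:
// identifiant, pays_code, pays, secteur_activite_code, secteur,
// denomination_sociale, adresse_1..adresse_4, code_postal, ville.
// The key names match the row.* references used by the Cypher statement.
fun toRow(columns: List<String>): Map<String, Any> {
    require(columns.size == 12) { "expected 12 columns, got ${columns.size}" }
    // adresse_1..adresse_4: keep the non-blank lines and join them with a newline
    val address = columns.subList(6, 10)
        .filter { it.isNotBlank() }
        .joinToString("\n") { it.trim().uppercase() }
    return mapOf(
        "company_id" to columns[0],
        "country_code" to columns[1],
        "country_name" to columns[2].uppercase(),
        "segment_code" to columns[3],
        "segment_label" to columns[4].uppercase(),
        "company_name" to columns[5].uppercase(),
        "address" to address,
        "zipcode" to columns[10],
        "city_name" to columns[11].uppercase()
    )
}
```

Normalizing in Kotlin before the `UNWIND` keeps the `MERGE` keys stable, so "LYON" and "Lyon" end up on the same `:City` node.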
@Test
fun `batches commits`() {
newReader("/companies.csv").use {
subject.import(it, commitPeriod = 2)
}
assertThat(commitCounter.getCount())
.overridingErrorMessage("Expected 5 batched commits.")
.isEqualTo(5)
}
LAB IMPORT - TDD style - batch import
class CommitCounter : TransactionEventHandler<Any?> {
    private val count = AtomicInteger(0)
    override fun afterRollback(p0: TransactionData?, p1: Any?) {}
    // No per-transaction state needs to be handed over to afterCommit
    override fun beforeCommit(p0: TransactionData?): Any? = null
    // Block body so the function returns Unit rather than the incremented value
    override fun afterCommit(p0: TransactionData?, p1: Any?) {
        count.incrementAndGet()
    }
    fun getCount(): Int = count.get()
    fun reset() = count.set(0)
}
LAB IMPORT - TDD style - batch import
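The batching test expects 5 commits for the 9 rows of companies.csv with `commitPeriod = 2`. One possible way to get there, sketched below, is to chunk the rows and run one transaction per chunk. `importInBatches` and its `commitBatch` callback are hypothetical names: the callback stands in for "run the UNWIND statement in its own transaction" and is not the talk's actual implementation.

```kotlin
// Splits the rows into groups of at most `commitPeriod` elements and hands each
// group to `commitBatch`. Returns the number of batches, i.e. the number of commits.
fun <T> importInBatches(rows: List<T>, commitPeriod: Int, commitBatch: (List<T>) -> Unit): Int {
    require(commitPeriod > 0) { "commitPeriod must be positive" }
    val batches = rows.chunked(commitPeriod)
    batches.forEach(commitBatch)
    return batches.size
}
```

With 9 rows and a commit period of 2, `chunked` yields four full batches and one batch with the remaining row, which matches the 5 commits asserted by the test.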
Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a given year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
Data sources
data sources - PROBLEM ?
Lab name mismatch >_<
data sources - String matching option
™
data sources - Stack Overflow-DRIVEN DEVELOPMENT !
Sørensen–Dice coefficient
“bois vert” -> “bo”, “oi”, “is”, “ve”, “er”, “rt” (6 pairs)
“bois ça” -> “bo”, “oi”, “is”, “ça” (4 pairs)
3 shared pairs: 2 * 3 / (6 + 4) = 0.6, i.e. 60% similarity
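The arithmetic above can be reproduced with a few lines of plain Kotlin. This is a minimal sketch, assuming pairs are built per word from adjacent letters after lower-casing; the names `letterPairs` and `diceSimilarity` are illustrative, and returning 1.0 when neither input yields any pair is an arbitrary choice for the sketch.

```kotlin
// Splits each word into pairs of adjacent letters: "bois" -> [bo, oi, is].
fun letterPairs(text: String): List<String> =
    text.trim().lowercase()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .flatMap { word -> word.windowed(size = 2) }

// Sørensen–Dice coefficient: 2 * |shared pairs| / (|pairs of a| + |pairs of b|).
fun diceSimilarity(a: String, b: String): Double {
    val pairsA = letterPairs(a)
    val remainingB = letterPairs(b).toMutableList()
    if (pairsA.isEmpty() && remainingB.isEmpty()) return 1.0
    val total = pairsA.size + remainingB.size
    // remove() returns true only if the pair was still available in b,
    // so each pair of b can be matched at most once.
    val shared = pairsA.count { remainingB.remove(it) }
    return 2.0 * shared / total
}
```

For the two strings above, `diceSimilarity("bois vert", "bois ça")` finds 3 shared pairs out of 6 + 4 and returns 0.6.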
CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
Publishing an extension 101
● Write the extension in any JVM language (Java, Scala, Kotlin…)
● Package a JAR
● Deploy the JAR to your Neo4j server: $NEO4J_HOME/plugins
CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
class MyFunction {
@UserFunction(name = "my.function")
fun doSomethingAwesome(@Name("input1") input1: String, @Name("input2") input2: String): Double {
// do something awesome...
}
}
CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
In Java (Maven)
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>procedure-compiler</artifactId>
<version>${neo4j.version}</version>
</dependency>
In Kotlin (Maven)
<plugin>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-maven-plugin</artifactId>
<version>${kotlin.version}</version>
<configuration>
<annotationProcessorPaths>
<annotationProcessorPath>
<groupId>org.neo4j</groupId>
<artifactId>procedure-compiler</artifactId>
<version>${neo4j.version}</version>
</annotationProcessorPath>
</annotationProcessorPaths>
</configuration>
<executions>
<execution><id>compile-annotations</id>
<goals><goal>kapt</goal></goals>
</execution>
</executions>
</plugin>
https://bit.ly/safer-neo4j-extensions
CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
@UserFunction(name = "strings.similarity")
fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double {
    if (input1 == input2) return totalMatch
    val whitespace = Regex("\\s+")
    val words1 = normalizedWords(input1, whitespace)
    val words2 = normalizedWords(input2, whitespace)
    if (words1 == words2) return totalMatch
    val matchCount = AtomicInteger(0)
    val initialPairs1 = allPairs(words1)
    val initialPairs2 = allPairs(words2)
    val pairs2 = initialPairs2.toMutableList()
    initialPairs1.forEach {
        val pair1 = it
        // Consume each matched pair so it cannot be counted twice
        val matchIndex = pairs2.indexOfFirst { it == pair1 }
        if (matchIndex > -1) {
            matchCount.incrementAndGet()
            pairs2.removeAt(matchIndex)
            return@forEach
        }
    }
    return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size)
}
83% of matches!
detour - neo4j Rule and user-defined functions
@get:Rule
val graphDb = Neo4jRule()
.withFunction(
StringSimilarityFunction::class.java
)
Drug import
session.run("""
UNWIND {rows} as row
MERGE (drug:Drug {cisCode: row.cisCode})
ON CREATE SET drug.name = row.drugName
WITH drug, row
UNWIND row.labNames AS labName
""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
session.run("""
UNWIND {rows} as row
MERGE (drug:Drug {cisCode: row.cisCode})
ON CREATE SET drug.name = row.drugName
WITH drug, row
UNWIND row.labNames AS labName
MATCH (lab:Company)
WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
Drug import
session.run("""
UNWIND {rows} as row
MERGE (drug:Drug {cisCode: row.cisCode})
ON CREATE SET drug.name = row.drugName
WITH drug, row
UNWIND row.labNames AS labName
MATCH (lab:Company)
WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab,
labName
ORDER BY similarity DESC
WITH drug, labName, HEAD(COLLECT(lab)) AS lab
""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
Drug import
session.run("""
UNWIND {rows} as row
MERGE (drug:Drug {cisCode: row.cisCode})
ON CREATE SET drug.name = row.drugName
WITH drug, row
UNWIND row.labNames AS labName
MATCH (lab:Company)
WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName
ORDER BY similarity DESC
WITH drug, labName, HEAD(COLLECT(lab)) AS lab
FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END |
MERGE (lab)<-[:DRUG_HELD_BY]-(drug))
FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END |
MERGE (fallback:Company:Ansm {name: labName})
MERGE (fallback)<-[:DRUG_HELD_BY]-(drug)
)""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
Drug import
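The matching step of the query (score every lab, keep those above the threshold, take the best one, otherwise fall back to a new company node) can be restated in plain Kotlin to make the intent easier to follow. This is only a sketch of the same selection logic outside the database: `bestMatch` is a hypothetical helper, candidates are reduced to their names, and the similarity function is passed in rather than assumed.

```kotlin
// Returns the candidate most similar to `labName`, provided its score is
// strictly above `threshold`. Returns null when nothing qualifies, which is
// the case where the import creates a fallback company node instead.
fun bestMatch(
    labName: String,
    candidates: List<String>,
    threshold: Double,
    similarity: (String, String) -> Double
): String? =
    candidates
        .map { candidate -> candidate to similarity(labName, candidate) }
        .filter { (_, score) -> score > threshold }
        .maxByOrNull { (_, score) -> score }
        ?.first
```

In the Cypher version, the `CASE ... ELSE NULL` plus `HEAD(COLLECT(lab))` plays the role of `filter` plus `maxByOrNull`: `COLLECT` skips nulls, so the head of the collection is either the best-scoring lab or null.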
CYPHER TRICKS - FOREACH as poor man’s IF
FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END |
MERGE (lab)<-[:DRUG_HELD_BY]-(drug))
FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END |
MERGE (fallback:Company:Ansm {name: labName})
MERGE (fallback)<-[:DRUG_HELD_BY]-(drug)
)
FOREACH (item in collection | ...do something...)
@RestController
class LabsApi(private val repository: LabsRepository) {
@GetMapping("/packages/{package}/labs")
fun findLabsByMarketedDrug(@PathVariable("package") drugPackage: String): List<Lab> {
return repository.findAllByMarketedDrugPackage(drugPackage)
}
}
Drug import - API
@Repository
class LabsRepository(private val driver: Driver) {
    fun findAllByMarketedDrugPackage(drugPackage: String): List<Lab> {
        driver.session(AccessMode.READ).use {
            val result = it.run("""
                MATCH (lab:Company)<-[:DRUG_HELD_BY]-(:Drug)-[:DRUG_PACKAGED_AS]->(:Package {name: {name}})
                OPTIONAL MATCH (lab)-[:IN_BUSINESS_SEGMENT]->(segment:BusinessSegment),
                               (lab)-[:LOCATED_AT_ADDRESS]->(address:Address),
                               (address)-[cityLoc:LOCATED_IN_CITY]->(city:City),
                               (city)-[:LOCATED_IN_COUNTRY]->(country:Country)
                RETURN lab {.identifier, .name},
                       segment {.code, .label},
                       address {.address},
                       cityLoc {.zipcode},
                       city {.name},
                       country {.code, .name}
                ORDER BY lab.identifier ASC""".trimIndent(), mapOf(Pair("name", drugPackage)))
            return result.list().map(this::toLab)
        }
    }
}
Drug import - REPOSITORY
Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a given year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
BENEFIT IMPORT (from previous User Story)
session.run("""
UNWIND {rows} AS row
MERGE (hp:HealthProfessional {first_name: row.first_name, last_name: row.last_name})
MERGE (ms:MedicalSpecialty {code: row.specialty_code}) ON CREATE SET ms.name = row.specialty_name
MERGE (ms)<-[:SPECIALIZES_IN]-(hp)
MERGE (y:Year {year: row.year})
MERGE (y)<-[:MONTH_IN_YEAR]-(m:Month {month: row.month})
MERGE (m)<-[:DAY_IN_MONTH]-(d:Day {day: row.day})
MERGE (bt:BenefitType {type: row.benefit_type})
CREATE (b:Benefit {amount: row.benefit_amount})
CREATE (b)-[:GIVEN_AT_DATE]->(d)
CREATE (b)-[:HAS_BENEFIT_TYPE]->(bt)
MERGE (lab:Company {identifier:row.lab_identifier})
CREATE (lab)-[:HAS_GIVEN_BENEFIT]->(b)
CREATE (hp)<-[:HAS_RECEIVED_BENEFIT]-(b)
""".trimIndent(), mapOf(Pair("rows", rows)))
TOP 3 Health Professionals - API
@RestController
class HealthProfessionalApi(private val repository: HealthProfessionalsRepository) {
@GetMapping("/benefits/{year}/health-professionals")
fun findTop3ProfessionalsWithBenefits(@PathVariable("year") year: String)
: List<Pair<HealthProfessional, AggregatedBenefits>> {
return repository.findTop3ByMostBenefitsWithinYear(year)
}
}
TOP 3 Health Professionals - REPOSITORY
@Repository
class HealthProfessionalsRepository(private val driver: Driver) {
    fun findTop3ByMostBenefitsWithinYear(year: String): List<Pair<HealthProfessional, AggregatedBenefits>> {
        val records = driver.session(AccessMode.READ).use {
            val parameters = mapOf(Pair("year", year))
            // Materialize the records while the session is still open
            it.run("""
                MATCH (:Year {year: {year}})<-[:MONTH_IN_YEAR]-(:Month)<-[:DAY_IN_MONTH]-(d:Day),
                      (bt:BenefitType)<-[:HAS_BENEFIT_TYPE]-(b:Benefit)-[:GIVEN_AT_DATE]->(d),
                      (lab:Company)-[:HAS_GIVEN_BENEFIT]->(b)-[:HAS_RECEIVED_BENEFIT]->(hp:HealthProfessional),
                      (hp)-[:SPECIALIZES_IN]->(ms:MedicalSpecialty)
                WITH ms, hp, SUM(toFloat(b.amount)) AS total_amount, COLLECT(DISTINCT lab.name) AS labs,
                     COLLECT(bt.type) AS benefit_types
                ORDER BY total_amount DESC
                RETURN ms {.code, .name}, hp {.first_name, .last_name}, total_amount, labs, benefit_types
                LIMIT 3
            """.trimIndent(), parameters).list()
        }
        return records.map(this::toAggregatedHealthProfessionalBenefits)
    }
}
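To see what the aggregation computes without a database at hand, the same "sum per professional, sort descending, keep three" logic can be sketched over an in-memory list. The `Benefit` class and `topByTotalAmount` function are hypothetical stand-ins for the graph data and are not part of the talk's code base; `sumOf` needs Kotlin 1.4 or later.

```kotlin
// Hypothetical flat view of one benefit given by a lab to a professional.
data class Benefit(val professional: String, val lab: String, val amount: Double)

// Groups benefits by professional, sums the amounts, and keeps the `n` largest totals.
fun topByTotalAmount(benefits: List<Benefit>, n: Int = 3): List<Pair<String, Double>> =
    benefits
        .groupBy { it.professional }
        .map { (professional, received) -> professional to received.sumOf { it.amount } }
        .sortedByDescending { (_, total) -> total }
        .take(n)
```

The Cypher query does the grouping implicitly: every non-aggregated expression in the `WITH` clause (`ms`, `hp`) acts as a grouping key for `SUM` and `COLLECT`.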
DEPLOYMENT OPTIONS
● DIY - https://neo4j.com/docs/operations-manual/current/installation/
● Azure - https://neo4j.com/blog/neo4j-microsoft-azure-marketplace-part-1/
● Neo4j ON KUBERNETES - https://github.com/mneedham/neo4j-kubernetes
● Graphene DB
○ https://www.graphenedb.com/
○ ON HEROKU - https://elements.heroku.com/addons/graphenedb
● NEO4J Cloud FOUNDRY - WIP !
“Nothing is ever finished” - TODO list
Optimize the import
Use Spring Data Neo4j
Use “graphier” algorithms (shortest paths, page rank…)
Expose GraphQL API - http://grandstack.io/
Thank you !
Florent Biville (@fbiville) Marouane Gazanayi (@mgazanayi)
https://github.com/graph-labs/open-data-with-neo4j
Little ad for a friend (jérôme ;-))
Q&A
?
One more thing
graph-labs.fr

 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 

Último

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 

Último (20)

Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 

Open data with Neo4j and Kotlin

  • 1. Open data with Neo4j From ideation to production
  • 2. Our (fictional) customer Investigative journalist Specializes in health-related scandals Nominated for the Pulitzer Prize in 2017
  • 3. A few scandals over the year
  • 7. Scoping - MVP EMERGENCE As a journalist, I need to quickly find people to interview who are related to a particular health product. For example: Who are the managers of pharmaceutical labs producing a faulty drug? Who are the health professionals most influenced by these labs? Who are the patient’s relatives, friends, colleagues...?
  • 8. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  • 10. Data sources ? Public data of gifts by pharmaceutical labs to health professionals
  • 11. ETALAB - Data source schema
  • 16. Technical Stakeholder interview “Why would anyone use a graph database? We are using Oracle 12c!”
  • 17. DETOUR : Relational VS graph database
  • 18. Technical Stakeholder interview “NEO4J INC. IS LIKE NOSQL, IT HAS NO FUTURE, RIGHT?”
  • 19. Performance issues with document management systems First graph library prototypes 2000 2002 2007 2010 2013 Neo4j 2.0 Label addition to the graph model Neo4j browser reworked 2016 Neo4j 3.0 Bolt protocol Cypher extensions 2017 Neo4j 3.3 Neo Technology -> Neo4j Inc. Neo4j Desktop with Enterprise Edition Development of the first version of Neo4j Neo4j 1.0 is out Headquarters moved to the Silicon Valley Neo4j : Leading graph database for more than 10 years ! Neo Technology is created
  • 20. Technical Stakeholder interview “But then, why Neo4j and NOT another graph database?”
  • 21. DETOUR : NATIVE GRAPH DATABASE :Person:Speaker first_name Marouane age 30 shoe_size 42 :Conference name Devoxx Morocco ATTENDS first_name Hanae ATTENDS since 2015 :Person:Org EMAILED
  • 22. name Devoxx Morocco ATTENDS first_name Hanae ATTENDS first_name Marouane age 30 :Person :Org :Conference :Speaker EMAILED shoe_size 42 since 2015 DETOUR : NATIVE GRAPH DATABASE
  • 23. name Devoxx Morocco ATTENDS first_name Hanae ATTENDS first_name Marouane age 30 :Person :Org :Conference :Speaker since 2015 EMAILED shoe_size 42 DETOUR : NATIVE GRAPH DATABASE
  • 24. START NODE (SN) END NODE (EN) name Devoxx Morocco ATTENDS first_name Hanae ATTENDS first_name Marouane age 30 :Person :Org :Speaker since 2015 EMAILED shoe_size 42 SN PrevRel ∅ SN NextRel :Conference DETOUR : NATIVE GRAPH DATABASE
  • 25. START NODE (SN) name Devoxx Morocco ATTENDS first_name Hanae ATTENDS first_name Marouane age 30 :Person :Org :Speaker since 2015 EMAILED shoe_size 42 SN PrevRel ∅ SN NextRel :Conference END NODE (EN) EN PrevRel EN NextRel DETOUR : NATIVE GRAPH DATABASE
  • 26. START NODE (SN) name Devoxx Morocco ATTENDS first_name Hanae ATTENDS first_name Marouane age 30 :Person :Org :Speaker since 2015 EMAILED shoe_size 42 SN PrevRel ∅ SN NextRel :Conference END NODE (EN) EN PrevRel EN NextRel Index-free adjacency Every co-located piece of data in the graph is co-located on the disk DETOUR : NATIVE GRAPH DATABASE
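The relationship chain drawn on the slides above (start node, end node, next-relationship pointers) can be illustrated with a small in-memory model. This is only a sketch of the idea behind index-free adjacency, not Neo4j's actual store format: `TinyGraph`, `NodeRecord`, `RelRecord` and the direction of the EMAILED relationship in the usage note are hypothetical, and the "previous relationship" pointers are left out for brevity.

```kotlin
// Simplified in-memory model; record ids are list positions, NONE means "no record".
const val NONE = -1

data class NodeRecord(val name: String, var firstRel: Int = NONE)

data class RelRecord(
    val type: String,
    val startNode: Int,
    val endNode: Int,
    var startNext: Int = NONE, // next relationship in the start node's chain
    var endNext: Int = NONE    // next relationship in the end node's chain
)

class TinyGraph {
    val nodes = mutableListOf<NodeRecord>()
    val rels = mutableListOf<RelRecord>()

    fun addNode(name: String): Int {
        nodes.add(NodeRecord(name))
        return nodes.lastIndex
    }

    // Prepends the new relationship to the chains of both of its nodes.
    fun connect(start: Int, type: String, end: Int): Int {
        val rel = RelRecord(
            type, start, end,
            startNext = nodes[start].firstRel,
            endNext = nodes[end].firstRel
        )
        rels.add(rel)
        val id = rels.lastIndex
        nodes[start].firstRel = id
        nodes[end].firstRel = id
        return id
    }

    // Walks a node's chain by following pointers only: no index lookup is involved.
    fun relationshipsOf(node: Int): List<RelRecord> {
        val result = mutableListOf<RelRecord>()
        var current = nodes[node].firstRel
        while (current != NONE) {
            val rel = rels[current]
            result.add(rel)
            current = if (rel.startNode == node) rel.startNext else rel.endNext
        }
        return result
    }
}
```

With the slide's data (Marouane and Hanae both ATTENDS Devoxx Morocco, plus one EMAILED relationship between them), `relationshipsOf` visits only the relationships attached to the given node, so the cost of a hop depends on that node's degree rather than on the size of the graph.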
  • 29. Import - options ● LOAD CSV in Cypher (~= SQL for Neo4j) ● UNWIND in Cypher ● ETL ● APOC ● Cypher shell ● ...
  • 30. Cypher Crash course Label Person ConfATTENDS TYPE Key Value k1 v1 k2 v2
  • 31. Cypher Crash course - PATTERN MATCHING
  • 36. Cypher Crash course - READ queries [MATCH WHERE] [OPTIONAL MATCH WHERE] [WITH [ORDER BY] [SKIP] [LIMIT]] RETURN [ORDER BY] [SKIP] [LIMIT]
  • 37. MATCH (c:Conf) RETURN c Cypher Crash course - READ queries
  • 38. MATCH (c:Conf {name: 'Devoxx Morocco'}) RETURN c Cypher Crash course - READ queries
  • 39. MATCH (c:Conf) WHERE c.name ENDS WITH 'Morocco' RETURN c Cypher Crash course - READ queries
  • 40. MATCH (s:Speaker) OPTIONAL MATCH (s)-[:TALKED_AT]->(c:Conf) WHERE c.name STARTS WITH 'Devoxx' RETURN s Cypher Crash course - READ queries
  • 41. MATCH (p1:Player)-[:PLAYED]->(g:Game), (p1)-[:IN_TEAM]->(t:Team)<-[:IN_TEAM]-(p2:Player) WITH p1, COUNT(g) AS games, COLLECT(p2) AS teammates WHERE games > 100 AND ANY(t IN teammates WHERE t.name = 'Hadji') RETURN p1 Cypher Crash course - READ queries
  • 42. (CREATE | MERGE) [SET|DELETE|REMOVE|FOREACH] [RETURN [ORDER BY] [SKIP] [LIMIT]] Cypher Crash course - write queries
  • 43. CREATE (c:Conf {name: 'Devoxx Morocco'}) Cypher Crash course - write queries
  • 44. MATCH (c:Conference {name: 'GraphConnect'}), (s:Speaker {name: 'Michael'}) MERGE (s)-[l:LOVES]->(c) ON CREATE SET l.how_much = 'very much' Cypher Crash course - write queries
  • 45. MATCH (s:Speaker {name: 'Michael'}) REMOVE s.surname Cypher Crash course - write queries
  • 46. MATCH (s:Speaker {name: 'Michael'}) DETACH DELETE s Cypher Crash course - write queries
  • 47. MATCH (n) DETACH DELETE n Cypher Crash course - write queries
  • 48. LAB IMPORT - TDD style <dependency> <groupId>org.neo4j.driver</groupId> <artifactId>neo4j-java-driver</artifactId> </dependency> <dependency> <groupId>org.neo4j.test</groupId> <artifactId>neo4j-harness</artifactId> <scope>test</scope> </dependency>
  • 49. class MyClassTest { @get:Rule val graphDb = Neo4jRule() @Test fun `some interesting test`() { val subject = MyClass(graphDb.boltURI().toString()) subject.importDataset("/dataset.csv") graphDb.graphDatabaseService.execute("MATCH (s:Something) RETURN s").use { assertThat(it) // ... } } } LAB IMPORT - TDD style - Test skeleton
  • 50. identifiant,pays_code,pays,secteur_activite_code,secteur,denomination_sociale,adresse_1,adresse_2,adresse_3,adresse_4,code_postal,ville QBSTAWWV,[FR],FRANCE,[PA],Prestataires associés,IP Santé domicile,16 Rue de Montbrillant,Buroparc Rive Gauche,"","",69003,LYON MQKQLNIC,[FR],FRANCE,[DM],Dispositifs médicaux,SIGVARIS,ZI SUD D'ANDREZIEUX,RUE B. THIMONNIER,"","",42173,SAINT-JUST SAINT-RAMBERT CEDEX OETEUQSP,[FR],FRANCE,[AUT],Autres,HEALTHCARE COMPLIANCE CONSULTING FRANCE SAS,47 BOULEVARD CHARLES V,"","","",14600,HONFLEUR FRQXZIGY,[FR],FRANCE,[MED],Médicaments,SANOFI PASTEUR MSD SNC,162 avenue Jean Jaurès,"","","",69007,Lyon GXIVOHBB,[FR],FRANCE,[PA],Prestataires associés,ISIS DIABETE,10-16 RUE DU COLONEL ROL TANGUY,ZAC DU BOIS MOUSSAY,"","",93240,STAINS ZQKPAZKB,[FR],FRANCE,[PA],Prestataires associés,CREAFIRST,8 Rue de l'Est,"","","",92100,BOULOGNE BILLANCOURT GEJLGPVD,[US],ÉTATS-UNIS,[DM],Dispositifs médicaux,Nobel Biocare USA LLC,800 Corporate Drive,"","","",07430,MAHWAH XSQKIAGK,[FR],FRANCE,[DM],Dispositifs médicaux,Cook France SARL,2 Rue due Nouveau Bercy,"","","",94227,Charenton Le Pont Cedex ARHHJTWT,[FR],FRANCE,[DM],Dispositifs médicaux,EYETECHCARE,2871 Avenue de l'Europe,"","","",69140,RILLIEUX-LA-PAPE LAB IMPORT - TDD style - companies.csv
  • 51. @Test fun `imports countries of companies`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (country:Country) " + "RETURN country {.code, .name} " + "ORDER BY country.code ASC").use { assertThat(it).containsExactly( row("country", mapOf(Pair("code", "[FR]"), Pair("name", "FRANCE"))), row("country", mapOf(Pair("code", "[US]"), Pair("name", "ÉTATS-UNIS"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - COUNTRIES
  • 52. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - COUNTRIES
  • 53. @Test fun `imports cities`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (city:City) " + "RETURN city {.name} " + "ORDER BY city.name ASC").use { assertThat(it).containsExactly( row("city", mapOf(Pair("name", "BOULOGNE BILLANCOURT"))), row("city", mapOf(Pair("name", "CHARENTON LE PONT CEDEX"))), row("city", mapOf(Pair("name", "HONFLEUR"))), row("city", mapOf(Pair("name", "LYON"))), row("city", mapOf(Pair("name", "MAHWAH"))), row("city", mapOf(Pair("name", "RILLIEUX-LA-PAPE"))), row("city", mapOf(Pair("name", "SAINT-JUST SAINT-RAMBERT CEDEX"))), row("city", mapOf(Pair("name", "STAINS"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - CITIES
  • 54. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - CITIES
  • 55. @Test fun `imports city|country links`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (city:City)-[:LOCATED_IN_COUNTRY]->(country:Country) " + "RETURN country {.code}, city {.name} " + "ORDER BY city.name ASC").use { assertThat(it).containsExactly( mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "B[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "C[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "H[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "LYON")))), mapOf(Pair("country", mapOf(Pair("code", "[US]"))), Pair("city", mapOf(Pair("name", "MAHWAH")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "R[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "S[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "STAINS")))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - COUNTRIES-[]-Cities
  • 56. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - COUNTRIES-[]-Cities
  • 57. @Test fun `imports addresses`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (address:Address) " + "RETURN address {.address} ").use { assertThat(it).containsOnlyOnce( row("address", mapOf(Pair("address", "16 RUE DE MONTBRILLANT\nBUROPARC RIVE GAUCHE"))), row("address", mapOf(Pair("address", "ZI SUD D'ANDREZIEUX\nRUE B. THIMONNIER"))), row("address", mapOf(Pair("address", "47 BOULEVARD CHARLES V"))), row("address", mapOf(Pair("address", "162 AVENUE JEAN JAURÈS"))), row("address", mapOf(Pair("address", "10-16 RUE DU COLONEL ROL TANGUY\nZAC DU BOIS MOUSSAY"))), row("address", mapOf(Pair("address", "8 RUE DE L'EST"))), row("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))), row("address", mapOf(Pair("address", "2 RUE DUE NOUVEAU BERCY"))), row("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - ADDRESSES
  • 58. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - ADDRESSES
  • 59. @Test fun `imports address|city links`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (address:Address)-[location:LOCATED_IN_CITY]->(city:City) " + "RETURN location {.zipcode}, city {.name}, address {.address} " + "ORDER BY location.zipcode ASC").use { assertThat(it).containsOnlyOnce( mapOf( Pair("location", mapOf(Pair("zipcode", "07430"))), Pair("city", mapOf(Pair("name", "MAHWAH"))), Pair("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))) ) //, [...] ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
  • 60. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
  • 61. @Test fun `imports business segments`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (segment:BusinessSegment) " + "RETURN segment {.code, .label} " + "ORDER BY segment.code ASC").use { assertThat(it).containsOnlyOnce( row("segment", mapOf(Pair("code", "[AUT]"), Pair("label", "AUTRES"))), row("segment", mapOf(Pair("code", "[DM]"), Pair("label", "DISPOSITIFS MÉDICAUX"))), row("segment", mapOf(Pair("code", "[MED]"), Pair("label", "MÉDICAMENTS"))), row("segment", mapOf(Pair("code", "[PA]"), Pair("label", "PRESTATAIRES ASSOCIÉS"))) ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - business segment
  • 62. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - business segment
  • 63. @Test fun `imports companies`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (company:Company) " + "RETURN company {.identifier, .name} " + "ORDER BY company.identifier ASC").use { assertThat(it).containsOnlyOnce( row("company", mapOf(Pair("identifier", "ARHHJTWT"), Pair("name", "EYETECHCARE"))), row("company", mapOf(Pair("identifier", "FRQXZIGY"), Pair("name", "SANOFI PASTEUR MSD SNC"))), row("company", mapOf(Pair("identifier", "GEJLGPVD"), Pair("name", "NOBEL BIOCARE USA LLC"))), row("company", mapOf(Pair("identifier", "GXIVOHBB"), Pair("name", "ISIS DIABETE"))), // [...] row("company", mapOf(Pair("identifier", "ZQKPAZKB"), Pair("name", "CREAFIRST"))) ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - companies
  • 64. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) MERGE (company:Company {identifier: row.company_id, name: row.company_name}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - companies
  • 65. @Test fun `imports address|company|business segment`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (segment:BusinessSegment)<-[:IN_BUSINESS_SEGMENT]-(company:Company)-[:LOCATED_AT_ADDRESS]->(address:Address) " + "RETURN company {.identifier}, segment {.code}, address {.address} " + "ORDER BY company.identifier ASC").use { assertThat(it).containsOnlyOnce( mapOf( Pair("company", mapOf(Pair("identifier", "ARHHJTWT"))), Pair("segment", mapOf(Pair("code", "[DM]"))), Pair("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE"))) ) // [...] ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
  • 66. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) MERGE (company:Company {identifier: row.company_id, name: row.company_name}) MERGE (company)-[:IN_BUSINESS_SEGMENT]->(segment) MERGE (company)-[:LOCATED_AT_ADDRESS]->(address) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
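The Cypher statements above read keys such as `row.country_code`, `row.city_name` and `row.address`, whereas companies.csv uses French column names. One way to bridge the two might look like the sketch below. It assumes a naive comma split (no commas inside quoted fields), upper-casing inferred from the expected values in the tests, and address lines joined with a line break; `toRow` and the exact key-to-column mapping are hypothetical.

```kotlin
// Hypothetical mapping from one companies.csv line to the map consumed by UNWIND.
// Column order: identifiant, pays_code, pays, secteur_activite_code, secteur,
// denomination_sociale, adresse_1..adresse_4, code_postal, ville.
fun toRow(line: String): Map<String, String> {
    // Naive split: assumes no field contains a comma.
    val cols = line.split(",").map { it.trim().removeSurrounding("\"") }
    require(cols.size == 12) { "Expected 12 columns but got ${cols.size}" }

    // adresse_1..adresse_4 collapse into a single multi-line value (assumption).
    val address = cols.subList(6, 10)
        .filter { it.isNotBlank() }
        .joinToString("\n")
        .uppercase()

    return mapOf(
        "company_id" to cols[0],
        "country_code" to cols[1],
        "country_name" to cols[2],
        "segment_code" to cols[3],
        "segment_label" to cols[4].uppercase(),
        "company_name" to cols[5].uppercase(),
        "address" to address,
        "zipcode" to cols[10],
        "city_name" to cols[11].uppercase()
    )
}
```

Normalizing case before the MERGE is what would let "Lyon" and "LYON" end up as a single `:City` node, which matches the eight distinct cities expected by the test for nine companies.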
  • 67. @Test fun `batches commits`() { newReader("/companies.csv").use { subject.import(it, commitPeriod = 2) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 5 batched commits.") .isEqualTo(5) } LAB IMPORT - TDD style - batch import
  • 68. class CommitCounter : TransactionEventHandler<Any?> { private val count = AtomicInteger(0) override fun afterRollback(p0: TransactionData?, p1: Any?) {} override fun beforeCommit(p0: TransactionData?): Any? = null override fun afterCommit(p0: TransactionData?, p1: Any?) { count.incrementAndGet() } fun getCount(): Int = count.get() fun reset() = count.set(0) } LAB IMPORT - TDD style - batch import
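The batching behaviour that the `batches commits` test checks can be sketched as follows. This is a minimal illustration, not the importer from the talk: `importInBatches` and its `commitBatch` callback are hypothetical names, and the callback stands in for "run the UNWIND query for this batch in its own transaction".

```kotlin
// Hypothetical helper (not from the original slides): splits the rows into
// batches of at most `commitPeriod` elements and hands each batch to
// `commitBatch`, which stands in for one transaction commit.
// Returns the number of batches, i.e. the number of commits performed.
fun <T> importInBatches(
    rows: List<T>,
    commitPeriod: Int,
    commitBatch: (List<T>) -> Unit
): Int {
    require(commitPeriod > 0) { "commitPeriod must be positive" }
    var commits = 0
    // chunked() yields full batches plus one smaller trailing batch if needed
    for (batch in rows.chunked(commitPeriod)) {
        commitBatch(batch)
        commits++
    }
    return commits
}
```

Under this sketch, 9 rows with `commitPeriod = 2` would produce 5 commits (four full batches and one trailing row). The actual row count of `companies.csv` is not shown in the slides, so that figure is only an assumption consistent with the test's expectation.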
  • 69. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  • 71. data sources - PROBLEM ? Lab name mismatch >_<
  • 72. data sources - String matching option ™
  • 73. data sources - Stack Overflow-DRIVEN DEVELOPMENT !
  • 75. Sørensen–Dice coefficient “bois vert” → “bo”, “oi”, “is”, “ve”, “er”, “rt” “bois ça” → “bo”, “oi”, “is”, “ça” 2 * 3 shared pairs / (6 + 4 pairs) = 60% similarity
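The arithmetic on this slide can be reproduced with a small stand-alone sketch. The function names `characterPairs` and `diceSimilarity` are illustrative, and lowercasing the input and returning 1.0 for two inputs without any pairs are assumptions; the normalization used in the talk is not shown.

```kotlin
// Adjacent character pairs of each whitespace-separated word, e.g.
// "bois vert" -> [bo, oi, is, ve, er, rt]. Single-letter words yield no pair.
fun characterPairs(text: String): List<String> =
    text.trim().lowercase()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .flatMap { word -> word.windowed(size = 2) }

// Sørensen–Dice coefficient: 2 * |shared pairs| / (|pairs1| + |pairs2|).
fun diceSimilarity(first: String, second: String): Double {
    val pairs1 = characterPairs(first)
    val pairs2 = characterPairs(second)
    // Assumption: two inputs without any pair are treated as identical.
    if (pairs1.isEmpty() && pairs2.isEmpty()) return 1.0
    val remaining = pairs2.toMutableList()
    var shared = 0
    for (pair in pairs1) {
        // Each pair of the second input can be matched only once.
        if (remaining.remove(pair)) shared++
    }
    return 2.0 * shared / (pairs1.size + pairs2.size)
}
```

Calling `diceSimilarity("bois vert", "bois ça")` gives 3 shared pairs out of 6 + 4, i.e. 0.6, matching the slide.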
  • 76. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) Publishing an extension 101 ● Write the extension in any JVM language (Java, Scala, Kotlin…) ● Package a JAR ● Deploy the JAR to your Neo4j server: $NEO4J_HOME/plugins
  • 78. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) class MyFunction { @UserFunction(name = "my.function") fun doSomethingAwesome(@Name("input1") input1: String, @Name("input2") input2: String): Double { // do something awesome... } }
  • 79. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) In Java (Maven) <dependency> <groupId>org.neo4j</groupId> <artifactId>procedure-compiler</artifactId> <version>${neo4j.version}</version> </dependency> In Kotlin (Maven) <plugin> <groupId>org.jetbrains.kotlin</groupId> <artifactId>kotlin-maven-plugin</artifactId> <version>${kotlin.version}</version> <configuration> <annotationProcessorPaths> <annotationProcessorPath> <groupId>org.neo4j</groupId> <artifactId>procedure-compiler</artifactId> <version>${neo4j.version}</version> </annotationProcessorPath> </annotationProcessorPaths> </configuration> <executions> <execution><id>compile-annotations</id> <goals><goal>kapt</goal></goals> </execution> </executions> </plugin> https://bit.ly/safer-neo4j-extensions
  • 80. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) @UserFunction(name = "strings.similarity") fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double { if (input1 == input2) return totalMatch val whitespace = Regex("\\s+") val words1 = normalizedWords(input1, whitespace) val words2 = normalizedWords(input2, whitespace) if (words1 == words2) return totalMatch val matchCount = AtomicInteger(0) val initialPairs1 = allPairs(words1) val initialPairs2 = allPairs(words2) val pairs2 = initialPairs2.toMutableList() initialPairs1.forEach { val pair1 = it val matchIndex = pairs2.indexOfFirst { it == pair1 } if (matchIndex > -1) { matchCount.incrementAndGet() pairs2.removeAt(matchIndex) return@forEach } } return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size) }
  • 81. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) @UserFunction(name = "strings.similarity") fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double { if (input1 == input2) return totalMatch val whitespace = Regex("\\s+") val words1 = normalizedWords(input1, whitespace) val words2 = normalizedWords(input2, whitespace) if (words1 == words2) return totalMatch val matchCount = AtomicInteger(0) val initialPairs1 = allPairs(words1) val initialPairs2 = allPairs(words2) val pairs2 = initialPairs2.toMutableList() initialPairs1.forEach { val pair1 = it val matchIndex = pairs2.indexOfFirst { it == pair1 } if (matchIndex > -1) { matchCount.incrementAndGet() pairs2.removeAt(matchIndex) return@forEach } } return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size) } 83% of matches!
  • 82. detour - neo4j Rule and user-defined functions @get:Rule val graphDb = Neo4jRule() .withFunction( StringSimilarityFunction::class.java )
  • 83. Drug import session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
  • 84. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
  • 85. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName ORDER BY similarity DESC WITH drug, labName, HEAD(COLLECT(lab)) AS lab """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
  • 86. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName ORDER BY similarity DESC WITH drug, labName, HEAD(COLLECT(lab)) AS lab FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END | MERGE (lab)<-[:DRUG_HELD_BY]-(drug)) FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END | MERGE (fallback:Company:Ansm {name: labName}) MERGE (fallback)<-[:DRUG_HELD_BY]-(drug) )""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
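The selection step of the query (`CASE WHEN similarity > {threshold}`, `ORDER BY similarity DESC`, `HEAD(COLLECT(lab))`) keeps the most similar lab above the threshold, or nothing at all. A rough in-memory equivalent, with the hypothetical name `bestMatchingLab` and the similarity function passed in as a parameter, might look like this:

```kotlin
// Hypothetical in-memory equivalent of the Cypher selection step: keep only
// the candidates whose similarity exceeds the threshold, then take the one
// with the highest score. Returns null when no candidate qualifies, which is
// the case the fallback (:Company:Ansm) node covers in the query.
fun bestMatchingLab(
    labName: String,
    knownLabs: List<String>,
    threshold: Double,
    similarity: (String, String) -> Double
): String? =
    knownLabs
        .map { lab -> lab to similarity(labName, lab) }
        .filter { (_, score) -> score > threshold }
        .maxByOrNull { (_, score) -> score }
        ?.first
```

A `null` result corresponds to the second `FOREACH` branch of the import, where a fallback company is created from the raw lab name.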
  • 87. CYPHER TRICKS - FOREACH as poor man’s IF FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END | MERGE (lab)<-[:DRUG_HELD_BY]-(drug)) FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END | MERGE (fallback:Company:Ansm {name: labName}) MERGE (fallback)<-[:DRUG_HELD_BY]-(drug) ) FOREACH (item in collection | ...do something...)
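The trick works because `FOREACH` runs its body once per list element: a one-element list runs it exactly once, an empty list never. The same idea, transposed to plain Kotlin purely as an illustration (`conditionalList` and `linkDrug` are made-up names, and the returned strings only describe what the Cypher branches would do):

```kotlin
// Mirrors `CASE WHEN condition THEN [1] ELSE [] END`:
// one dummy element when the condition holds, none otherwise.
fun conditionalList(condition: Boolean): List<Int> =
    if (condition) listOf(1) else emptyList()

// Illustration only: records which branch would run for a given lab match.
fun linkDrug(lab: String?, labName: String): List<String> {
    val actions = mutableListOf<String>()
    // Runs once when a matching lab was found, never otherwise.
    conditionalList(lab != null).forEach { _ ->
        actions += "link drug to existing lab $lab"
    }
    // Runs once when no lab matched, never otherwise.
    conditionalList(lab == null).forEach { _ ->
        actions += "create fallback lab $labName and link drug"
    }
    return actions
}
```

Exactly one of the two loops executes its body for any input, which is what makes the pair of `FOREACH` clauses behave like an if/else.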
  • 88. @RestController class LabsApi(private val repository: LabsRepository) { @GetMapping("/packages/{package}/labs") fun findLabsByMarketedDrug(@PathVariable("package") drugPackage: String): List<Lab> { return repository.findAllByMarketedDrugPackage(drugPackage) } } Drug import - API
  • 89. @Repository class LabsRepository(private val driver: Driver) { fun findAllByMarketedDrugPackage(drugPackage: String): List<Lab> { driver.session(AccessMode.READ).use { val result = it.run(""" MATCH (lab:Company)<-[:DRUG_HELD_BY]-(:Drug)-[:DRUG_PACKAGED_AS]->(:Package {name: {name}}) OPTIONAL MATCH (lab)-[:IN_BUSINESS_SEGMENT]->(segment:BusinessSegment), (lab)-[:LOCATED_AT_ADDRESS]->(address:Address), (address)-[cityLoc:LOCATED_IN_CITY]->(city:City), (city)-[:LOCATED_IN_COUNTRY]->(country:Country) RETURN lab {.identifier, .name}, segment {.code, .label}, address {.address}, cityLoc {.zipcode}, city {.name}, country {.code, .name} ORDER BY lab.identifier ASC""".trimIndent(), mapOf(Pair("name", drugPackage))) return result.list().map(this::toLab) } } Drug import - REPOSITORY
  • 90. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  • 91. BENEFIT IMPORT (from previous User Story) session.run(""" UNWIND {rows} AS row MERGE (hp:HealthProfessional {first_name: row.first_name, last_name: row.last_name}) MERGE (ms:MedicalSpecialty {code: row.specialty_code}) ON CREATE SET ms.name = row.specialty_name MERGE (ms)<-[:SPECIALIZES_IN]-(hp) MERGE (y:Year {year: row.year}) MERGE (y)<-[:MONTH_IN_YEAR]-(m:Month {month: row.month}) MERGE (m)<-[:DAY_IN_MONTH]-(d:Day {day: row.day}) MERGE (bt:BenefitType {type: row.benefit_type}) CREATE (b:Benefit {amount: row.benefit_amount}) CREATE (b)-[:GIVEN_AT_DATE]->(d) CREATE (b)-[:HAS_BENEFIT_TYPE]->(bt) MERGE (lab:Company {identifier:row.lab_identifier}) CREATE (lab)-[:HAS_GIVEN_BENEFIT]->(b) CREATE (hp)<-[:HAS_RECEIVED_BENEFIT]-(b) """.trimIndent(), mapOf(Pair("rows", rows)))
  • 92. TOP 3 Health Professionals - API @RestController class HealthProfessionalApi(private val repository: HealthProfessionalsRepository) { @GetMapping("/benefits/{year}/health-professionals") fun findTop3ProfessionalsWithBenefits(@PathVariable("year") year: String) : List<Pair<HealthProfessional, AggregatedBenefits>> { return repository.findTop3ByMostBenefitsWithinYear(year) } }
  • 93. TOP 3 Health Professionals - API @Repository class HealthProfessionalsRepository(private val driver: Driver) { fun findTop3ByMostBenefitsWithinYear(year: String): List<Pair<HealthProfessional, AggregatedBenefits>> { val result = driver.session(AccessMode.READ).use { val parameters = mapOf(Pair("year", year)) it.run(""" MATCH (:Year {year: {year}})<-[:MONTH_IN_YEAR]-(:Month)<-[:DAY_IN_MONTH]-(d:Day), (bt:BenefitType)<-[:HAS_BENEFIT_TYPE]-(b:Benefit)-[:GIVEN_AT_DATE]->(d), (lab:Company)-[:HAS_GIVEN_BENEFIT]->(b)-[:HAS_RECEIVED_BENEFIT]->(hp:HealthProfessional), (hp)-[:SPECIALIZES_IN]->(ms:MedicalSpecialty) WITH ms, hp, SUM(toFloat(b.amount)) AS total_amount, COLLECT(DISTINCT lab.name) AS labs, COLLECT(bt.type) AS benefit_types ORDER BY total_amount DESC RETURN ms {.code, .name}, hp {.first_name, .last_name}, total_amount, labs, benefit_types LIMIT 3 """.trimIndent(), parameters) } return result.list().map(this::toAggregatedHealthProfessionalBenefits) } }
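What the aggregation in this query computes can be sketched in memory. This is a simplified illustration under stated assumptions: `BenefitRow` and `ProfessionalTotal` are hypothetical types, each row stands for one lab-benefit-professional path, amounts are kept as strings (the query converts them with `toFloat`), and grouping is done per professional only, whereas the Cypher also groups by medical specialty.

```kotlin
// Hypothetical flat view of one matched path:
// (lab)-[:HAS_GIVEN_BENEFIT]->(benefit)-[:HAS_RECEIVED_BENEFIT]->(professional)
data class BenefitRow(
    val professional: String,
    val lab: String,
    val type: String,
    val amount: String // stored as text, converted when summing
)

data class ProfessionalTotal(
    val professional: String,
    val totalAmount: Double,
    val labs: Set<String>,          // like COLLECT(DISTINCT lab.name)
    val benefitTypes: List<String>  // like COLLECT(bt.type)
)

// Groups rows per professional, sums the amounts, then keeps the three
// largest totals (ORDER BY total_amount DESC ... LIMIT 3).
fun top3ByTotalAmount(rows: List<BenefitRow>): List<ProfessionalTotal> =
    rows.groupBy { it.professional }
        .map { (professional, benefits) ->
            ProfessionalTotal(
                professional = professional,
                totalAmount = benefits.sumOf { it.amount.toDouble() },
                labs = benefits.map { it.lab }.toSet(),
                benefitTypes = benefits.map { it.type }
            )
        }
        .sortedByDescending { it.totalAmount }
        .take(3)
```

In the real application this work stays in the database; the sketch only makes the grouping and ordering explicit.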
  • 94. DEPLOYMENT OPTIONS ● DIY - https://neo4j.com/docs/operations-manual/current/installation/ ● Azure - https://neo4j.com/blog/neo4j-microsoft-azure-marketplace-part-1/ ● Neo4j ON KUBERNETES - https://github.com/mneedham/neo4j-kubernetes ● Graphene DB ○ https://www.graphenedb.com/ ○ ON HEROKU - https://elements.heroku.com/addons/graphenedb ● NEO4J Cloud FOUNDRY - WIP !
  • 97. “Nothing is ever finished” - TODO list ● Optimize the import ● Use Spring Data Neo4j ● Use “graphier” algorithms (shortest paths, PageRank…) ● Expose a GraphQL API - http://grandstack.io/
  • 98. Thank you ! Florent Biville (@fbiville) Marouane Gazanayi (@mgazanayi) https://github.com/graph-labs/open-data-with-neo4j
  • 99. Little ad for a friend (jérôme ;-))
  • 100. Q&A ?