How Graphs Help
Investigative Journalists to
Connect the Dots
Michael.Hunger@neo4j.com
YOW! Conference Australia
December 2019
(Michael Hunger)-[:WORKS_FOR]->(Neo4j)
michael@neo4j.com | @mesirii
Java Champion - Head of Developer Relations @Neo4j
What enables IJ like PanamaPapers?
1. Whistleblower + Data Leak
2. Journalistic Collaboration
3. Technology to handle the data
The Data
Whistleblowers risking a lot to expose the truth
Our world today
• Billions of exchanges
• Messages, data, transactions
• Sometimes hidden in plain sight
• Between people, companies,
organizations, governments
• Which includes criminals
panamapapers.sueddeutsche.de/en/
Inside the 2.6 TB of Data
The data had everything
Mossack Fonseca – Mosfon – Panama – „Black Hole“
Jürgen Mossack – Ramon Fonseca
Est. 1977 – Data from 1977 to 2015
The offshore model
You Australians are pretty good at it
The Collaboration
The ICIJ - A global trust network
• Individual Reporters
• Working the Paper trail
• Much like a detective story
• Long turnaround times
• Local impact
• Large amounts of data were not easily shareable
Investigative Journalism
Investigative Journalism Today
• Benefit from whistleblowers & leaks
• Sharing a large amount of data
• Use technology and data engineers
• Collaborate globally (Trust!)
• Corroborate suspicions with
other sources locally
• Affects the world at large
• „Golden Age of IJ, never been as important“
Organization of ca. 200 journalists
Based in 65 countries
“Our aim is to bring journalists from different countries
together in teams - eliminating rivalry and promoting
collaboration. Together, we aim to be the
world’s best cross-border investigative team.”
icij.org/about
Collaboration
Supported by OSS
Tools & Encryption
+370 journalists in 80 countries
Panama Papers Timeline
• early 2015
John Doe's first contact with SZ
• Spring 2015
Involving ICIJ
• Summer 2015
Start of investigations
• April 4. 2016
Public Launch
#panamapapers
Exposed the offshore holdings
of 12 current and former
world leaders.
Dealings of 128 more
politicians and public officials
around the world.
Exposure of hidden secrets
Main goals, achieved:
1) Uncover the truth
2) Ensure whistleblower
safety
The Tech
Behind the ICIJ investigation
Remember the 2.6 TB of data?
+370 journalists
+100 media organizations
80 countries
1 Year
Data Team:
3 Data Journalists +
3 Developers !
Who was working on it?
[Diagram: Data Processing. Labels: Raw Files, Raw Text, Meta-Data, Database, Search, Discovery]
Nuix Investigator
• OCR
• Entity Extraction
• Analytical tools
• Philanthropic donation
to ICIJ
Nuix is an Australian company (founded in Sydney in 2000) focused on data
extraction from unstructured sources.
3 million files × 10 seconds per file = 1 yr on one server
1 yr / 35 servers = 1.5 weeks
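The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the three input figures come from the slide, while the assumption that the work splits evenly across servers with no overhead is mine.

```python
# Back-of-the-envelope check of the processing-time estimate.
FILES = 3_000_000        # documents to process (from the slide)
SECONDS_PER_FILE = 10    # processing time per document (from the slide)
SERVERS = 35             # parallel servers (from the slide)

SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = FILES * SECONDS_PER_FILE
single_server_days = total_seconds / SECONDS_PER_DAY
# Assumption: perfect parallelism, no coordination overhead.
cluster_days = single_server_days / SERVERS

print(f"one server: {single_server_days:.0f} days")
print(f"{SERVERS} servers: {cluster_days:.1f} days (~{cluster_days / 7:.1f} weeks)")
```

One server would need roughly 347 days, which the slide rounds to a year; 35 servers bring that down to about 10 days, in line with the slide's 1.5 weeks.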
Nuix
Investigator
Lucene syntax
queries with proximity
matching!
400
users
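In Lucene query syntax a proximity search is written as a quoted phrase followed by a tilde and a distance, for example `"mossack fonseca"~5`. The sketch below illustrates the idea in plain Python. It is a simplified stand-in, not Lucene's or Nuix's implementation, and the helper name `within_proximity` is hypothetical.

```python
import re

def within_proximity(text, first, second, max_gap):
    """Return True if `first` and `second` occur with at most
    `max_gap` other words between them (in either order)."""
    tokens = re.findall(r"\w+", text.lower())
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_pos
        for j in second_pos
        if i != j
    )

# Illustrative sentence built from names used elsewhere in this deck.
sentence = "John Miller acted as negotiator for Some Media Ltd"
print(within_proximity(sentence, "miller", "media", 5))
print(within_proximity(sentence, "miller", "media", 4))
```

With five words between "Miller" and "Media", the first call matches and the second does not.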
Disconnected Documents
Context is King
[Diagram: entities and their properties]
• PERSON: name "John", last "Miller", role "Negotiator"
• PERSON: name "Maria", last "Osara"
• name "Some Media Ltd", value "$70M"
• PERSON: name "Jose", last "Pereia", position "Governor"
• PERSON: name "Alice", last "Smith", role "Advisor"
Context is King
[Diagram: the same entities as on the previous slide, with added labels MENTIONS and since: Jan 10, 2011]
Journalists say: „It‘s like Magic“
Need to store and query
our connections!
Real, inferred and integrated
Neo4j
A native graph database
• Manage and store your
connected data as a graph
• Query relationships
easily and quickly
• Evolve model and applications
to support new requirements and
insights
• Built to solve relational pains
Whiteboard to Graph
Property Graph Model
Nodes
• The entities in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Diagram: two NODEs connected by a RELATIONSHIP; each node and the relationship carry key: "value" properties]
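The property graph model described above can be sketched as two small data structures. This only illustrates the concepts (labels, typed and directed relationships, name-value properties); it is not how Neo4j stores data. The class names and the `since` value are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value pairs

@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    type: str                                        # e.g. "KNOWS"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships have properties too

# Illustrative data only.
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
knows = Relationship(dan, "KNOWS", ann, {"since": 2011})

print(f'({knows.start.properties["name"]})-[:{knows.type}]->({knows.end.properties["name"]})')
```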
Neo4j: All about Patterns
(:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"})
[Diagram: NODE Dan -KNOWS-> NODE Ann; callouts mark the LABEL and PROPERTY of each node]
neo4j.com/developer/cypher
Neo4j: Create Patterns
CREATE (:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"})
[Diagram: NODE Dan -KNOWS-> NODE Ann; callouts mark the LABEL and PROPERTY of each node]
neo4j.com/developer/cypher
Cypher: Clauses
CREATE
(:Intermediary {name:"Deutsche Bank"})
-[:REPRESENTS]->(e:Entity {name:"..."})
-[:LOCATED]->(:Address {address:"..."})
-[:IN]->(:Country {name:"PAN"})
Cypher: Find Patterns
MATCH (:Person { name:"Dan"} ) -[:KNOWS]-> (who:Person) RETURN who
[Diagram: NODE Dan -KNOWS-> NODE ???; callouts mark LABEL, PROPERTY and ALIAS]
neo4j.com/developer/cypher
Cypher: Clauses
MATCH
(o:Officer)-[owns]->(e:Entity)<--(a:Address)
WHERE a.address CONTAINS "Sydney"
RETURN o.name, owns.shares, e.name
Getting Data into Neo4j
• Bulk Load from CSV Files
• Update Graph from
  ○ Web APIs (JSON, XML)
  ○ Other Databases
  ○ CSV Files
  ○ User Activity (Logs, Callbacks)
Import Demo – CSV dump
==> /Users/mh/Downloads/panama/import/Addresses.csv <==
address,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
27 ROSEWOOD DRIVE #16-19 SINGAPORE 737920,6991059DFFB057DF310B9BF31CC4A0E6,The Panama Papers data is current through 2015,SGP,Singapore,14000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Entities.csv <==
name,original_name,former_name,jurisdiction,jurisdiction_description,company_type,address,internal_id,incorporation_date,inactivation_date,struck_off_date,dorm_date,status,service_provider,ibcRUC,country_codes,countries,note,valid_until,node_id:ID,sourceID
"TIANSHENG INDUSTRY AND TRADING CO., LTD.","TIANSHENG INDUSTRY AND TRADING CO., LTD.",,SAM,Samoa,,ORION HOUSE SERVICES (HK) LIMITED ROOM 1401; 14/F.; WORLD COMMERCE CENTRE; HARBOUR CITY; 7-11 CANTON ROAD; TSIM SHA TSUI; KOWLOON; HONG KONG,1001256,23-MAR-2006,18-FEB-2013,15-FEB-2013,,Defaulted,Mossack Fonseca,25221,HKG,Hong Kong,,The Panama Papers data is current through 2015,10000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Intermediaries.csv <==
name,internal_id,address,valid_until,country_codes,countries,status,node_id:ID,sourceID
"MICHAEL PAPAGEORGE, MR.",10001,MICHAEL PAPAGEORGE; MR. 106 NICHOLSON STREET BROOKLYN PRETORIA 0002; GAUTENG (PWV) SOUTH AFRICA,The Panama Papers data is current through 2015,ZAF,South Africa,ACTIVE,11000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Officers.csv <==
name,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
KIM SOO IN,E72326DEA50F1A9C2876E112AAEB42BC,The Panama Papers data is current through 2015,KOR,"Korea, Republic of",12000001,Panama Papers
==> /Users/mh/Downloads/panama/import/all_edges.csv <==
node_id:START_ID,rel_type:TYPE,node_id:END_ID
11000001,intermediary of,10208879
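To make the file layout concrete, here is a minimal sketch that reads one of the node files shown above with Python's standard `csv` module. It separates the `node_id:ID` column from the remaining properties. The sample row is copied from the Officers.csv excerpt; the helper `read_nodes` is hypothetical and is not part of the Neo4j import tool.

```python
import csv
import io

# Header and first row of Officers.csv as shown in the dump above.
SAMPLE = '''name,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
KIM SOO IN,E72326DEA50F1A9C2876E112AAEB42BC,The Panama Papers data is current through 2015,KOR,"Korea, Republic of",12000001,Panama Papers
'''

def read_nodes(text, label):
    """Parse node rows; the `node_id:ID` column becomes the node id,
    every other column becomes a property."""
    nodes = []
    for record in csv.DictReader(io.StringIO(text)):
        node_id = record.pop("node_id:ID")
        nodes.append({"id": node_id, "label": label, "properties": dict(record)})
    return nodes

officers = read_nodes(SAMPLE, "Officer")
print(officers[0]["id"], officers[0]["properties"]["countries"])
```

The quoted field `"Korea, Republic of"` contains a comma, which is why a real CSV parser is used here rather than splitting each line on commas.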
Import Demo - Run
$NEO4J_HOME/bin/neo4j-import --into $DATA/panama.db \
  --nodes:Address $DATA/Addresses_fixed.csv \
  --nodes:Entity $DATA/Entities.csv \
  --nodes:Intermediary $DATA/Intermediaries.csv \
  --nodes:Officer $DATA/Officers.csv \
  --relationships $DATA/all_edges_header.csv,$DATA/all_edges_cleaned.csv
IMPORT DONE in 20s 747ms. Imported:
839434 nodes
1253582 relationships
8211010 properties
+------------------+----------+
| labels(n)        | count(*) |
+------------------+----------+
| ["Officer"]      | 344455   |
| ["Entity"]       | 319150   |
| ["Address"]      | 151054   |
| ["Intermediary"] | 23636    |
+------------------+----------+
The Basic ICIJ Data Model
The Real ICIJ Data Model
Visualized with Linkurious UI
Data is available
• 785,000 Offshore Leaks Companies from several investigations
• For online browsing and visualization
• offshoreleaks.icij.org
• As CSV dump download
• As Neo4j Database download
• offshoreleaks.icij.org/pages/database
• As Neo4j sandboxes sandbox.neo4j.com
Data exposed as interactive Visualization
• Public figures and leaders
• Different shell companies & involvements
Try it yourself? sandbox.neo4j.com
Demo Time
sandbox.neo4j.com
More steps for the ICIJ and all of us
• Data integration with other sources
• Entity extraction
• Email pattern analysis
• Content & Data mining
• Machine learning
• Alerts with real time news / social media
• Investigative recommendations
• Active search for new sources ...
Current Investigation of Mossack Fonseca
Jürgen Mossack – Ramon Fonseca
Arrested, free on bail, ongoing investigations
• Flow to US tax havens
• Transparency laws UK, EU
• $1.3bn taxes recouped
publicly
• Investigations into banks, public figures, companies
• Criminal cases solved & ongoing
World Wide Results of the Offshore leaks investigations
icij.org/investigations/panama-papers/panama-papers-helps-recover-more-than-1-2-billion-around-the-world/
Australia
Most recent SEB Bank Investigation in Sweden
• Operations of Nordic banks in Baltic states
• Danske Bank, Swedbank, SEB
• Resignations, investigations
• Straw men mentioned in reporting can be found in the Offshore Leaks database
• SEB bank in Nov 2019!
Murder of Daphne Caruana Galizia
• Maltese Investigative
Reporter
• Reported on Panama Papers
appearances of influential
Maltese Politicians
• Murdered Oct 16 2017
with car bomb
• „The Daphne Report“
• PM finally resigning
en.wikipedia.org/wiki/Daphne_Caruana_Galizia
Murder of Ján Kuciak
• Slovak Investigative
Reporter
• Reported on criminal
behavior of businessmen
• Ján & his fiancée shot Feb 2018
• Massive reactions +
political crisis
• PM and several ministers
resigned
en.wikipedia.org/wiki/Murder_of_J%C3%A1n_Kuciak
The ICIJ didn’t stop there
#BahamasLeak
Read & Watch More
How can you investigate
large complex data
using Graphs ?
Apply full set of available tools.
Source: John Swain - Twitter Analytics Right Relevance Talk
Russia Twitter Trolls
democrats-intelligence.house.gov/uploadedfiles/exhibit_b.pdf
● 2752 Twitter accounts tied to Russia’s
Internet Research Agency
● Accounts suspended by Twitter
○ Data deleted
● What were they tweeting about?
Internet Research Agency
345k Tweets, 41k Users (454 Russian Trolls)
Your typical American Citizen?
Your typical Local News Publication?
Your typical Local Political Party?
@LeroyLovesUSA
@TEN_GOP
@ClevelandOnline
Your typical Russian Troll (×3)
@LeroyLovesUSA
@TEN_GOP
@ClevelandOnline
Natural Language Processing
With Cypher and Neo4j
NLP w/ Graph Databases
[Diagram: NLP process producing annotations]
http://www.lyonwj.com/2017/11/15/entity-extraction-russian-troll-tweets-neo4j/
Graph Algorithms
Gain new insights from context & topology
+35 Graph & ML Algorithms in Neo4j
neo4j.com/graph-algorithms-book/
• Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Community Detection: Detects group clustering or partition options
• Similarity: Evaluates how alike nodes are
• Link Prediction: Estimates the likelihood of nodes forming a future relationship
Neo4j Native Graph Database
• Analytics Integrations
• Cypher Query Language
• Wide Range of APOC Procedures
• Optimized Graph Algorithms
Inferred Relationships
AMPLIFIED
MATCH (r1:Troll)-[:POSTED]->(:Tweet)
<-[:RETWEETED]-(:Tweet)
<-[:POSTED]-(r2:Troll)
WHERE r1 <> r2
WITH r1,r2, count(*) as freq
MERGE (r2)-[a:AMPLIFIED]->(r1)
SET a.weight = freq
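The inference in this query can be sketched outside the database: for every retweet of one troll's tweet by a different troll, count the pair and use the count as the weight of an AMPLIFIED edge. The sample pairs below are invented for illustration, and `amplified_weights` is a hypothetical helper, not part of any Neo4j library.

```python
from collections import Counter

# Hypothetical sample data: (retweeting_troll, original_author_troll)
retweets = [
    ("troll_b", "troll_a"),
    ("troll_b", "troll_a"),
    ("troll_c", "troll_a"),
    ("troll_a", "troll_a"),  # self-retweet, excluded like WHERE r1 <> r2
]

def amplified_weights(pairs):
    """Count retweets per (amplifier, author) pair, ignoring self-retweets.
    Mirrors: WITH r1, r2, count(*) AS freq ... SET a.weight = freq"""
    return Counter(
        (amplifier, author)
        for amplifier, author in pairs
        if amplifier != author
    )

weights = amplified_weights(retweets)
for (amplifier, author), freq in sorted(weights.items()):
    print(f"({amplifier})-[:AMPLIFIED {{weight: {freq}}}]->({author})")
```

As in the Cypher above, the edge points from the retweeting account (r2) to the account whose tweet was retweeted (r1).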
PageRank on AMPLIFIED Graph
CALL algo.pageRank('Troll', 'AMPLIFIED');

MATCH (t:Troll)
WITH t ORDER BY t.pagerank DESC LIMIT 1
MATCH path = (t)-[:AMPLIFIED*2]-()
RETURN path
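For intuition about what the PageRank call computes, here is a minimal power-iteration sketch over a tiny AMPLIFIED graph. It is a textbook-style illustration, not the Neo4j graph algorithms library's implementation. The damping factor of 0.85, the iteration count, the sample edges and the use of edge weights are all assumptions of this example; the call on the slide passes no weight configuration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges: dict mapping (source, target) -> weight. Returns node -> score."""
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (source, _), weight in edges.items():
        out_weight[source] += weight

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes its rank along outgoing edges, split by weight.
        for (source, target), weight in edges.items():
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank

# Hypothetical AMPLIFIED edges: (amplifier, amplified) -> weight
edges = {
    ("troll_b", "troll_a"): 2,
    ("troll_c", "troll_a"): 1,
    ("troll_a", "troll_b"): 1,
}
scores = pagerank(edges)
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```

The account amplified by the most other accounts ends up with the highest score, which is the node the second query on the slide starts from.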
Graph Visualization
Graph Visualizations
Centrality & community detection
AMPLIFIED relationships
Node size → PageRank
Color → community detection
Rel Thickness → weight
Graph Visualization
github.com/neo4j-contrib/neovis.js
DIY?
Neo4j Drivers & Integrations
• Drivers for most
programming languages
• Bolt: binary wire protocol
• Out-of-the-box integrations for
Spring Data, GraphQL, Kafka
• Pluggable into rich data
visualization frameworks
Drivers (via Bolt): JavaScript, Java, .NET, Python, Go, ...
neo4j.com/developer/language-guides/
Minimal Example: JavaScript -> Visualization
// Assumes `driver` was created beforehand with neo4j.driver(...)
const session = driver.session();
session
  .run(`MATCH (n:Troll)-[:AMPLIFIED]->(m)
        RETURN id(n) as source, id(m) as target`)
  .then(function (result) {
    // Convert each record into a {source, target} link
    const links = result.records.map(r => {
      return {source: r.get('source').toNumber(),
              target: r.get('target').toNumber()};
    });
    session.close();
    // Collect the distinct node ids referenced by the links
    const ids = new Set();
    links.forEach(l => { ids.add(l.source); ids.add(l.target); });
    const nodes = Array.from(ids).map(id => { return {id: id}; });
    const graphData = {nodes: nodes, links: links};
    const elem = document.getElementById('3d-graph');
    ForceGraph3D()(elem).graphData(graphData);
  });
medium.com/neo4j/tagged/data-visualization
Twitter Import
gist.github.com/jexp/dc59ea550186d49e5e17ff3a08d5ec5b
Tools for Investigative
Journalists
ICIJ Datashare
• datashare.icij.org
• Local installation
• Collaboration
• Text extraction
• Entity Recognition
github.com/ICIJ/datashare
OCCRP Investigative Dashboard
Browse / Search / Visualize
OCCRP Aleph
173M entries from 271 datasets
aleph.occrp.org
github.com/alephdata
GraphCommons
• Graph based modeling for
researchers and journalists
• Intuitive, collaborative
Graph creation
• Embedding in Websites
Encourage Sharing
What will YOU connect?
• User and Social Networks ?
• Money, Accounts, Contracts ?
• Products, Prices, Reviews, Tags ?
• Software, Dependencies, Services ?
• Machines, Devices, Sensors ?
• Genes, Proteins, Reactions ?
• Laws, Regulations ?
Want to learn more? - Free ebooks!
neo4j.com/books
Thank you! Questions?
Learn more:
neo4j.com/developer
Me: @mesirii | @neo4j

Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Machine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptxMachine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptx
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxMALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 

How Graphs Help Investigative Journalists to Connect the Dots

  • 1. How Graphs Help Investigative Journalists to Connect the Dots Michael.Hunger@neo4j.com YOW! Conference Australia December 2019
  • 2. (Michael Hunger)-[:WORKS_FOR]->(Neo4j) michael@neo4j.com | @mesirii Java Champion - Head of Developer Relations @Neo4j
  • 3. What enables IJ like PanamaPapers? 1. Whistleblower + Data Leak 2. Journalistic Collaboration 3. Technology to handle the data
  • 5. The Data Whistleblowers risking a lot to expose the truth
  • 6. Our world today • Billions of exchanges • Messages, data, transactions • Sometimes hidden in plain sight • Between people, companies, organizations, governments • Which includes criminals
  • 11. Inside the 2.6 TB of Data
  • 12. The data had everything
  • 13. Mossack Fonseca – Mosfon – Panama – „Black Hole“ Jürgen Mossack – Ramon Fonseca Est. 1977 – Data from 1977 to 2015
  • 16. You Australians are pretty good at it
  • 17. The Collaboration The ICIJ - A global trust network
  • 18. Investigative Journalism • Individual reporters • Working the paper trail • Much like a detective story • Long turnaround times • Local impact • Large amounts of data were not easily shareable
  • 19. Investigative Journalism Today • Benefit from whistleblowers & leaks • Sharing a large amount of data • Use technology and data engineers • Collaborate globally (Trust!) • Corroborate suspicions with other sources locally • Affects the world at large • “Golden Age of IJ, never been as important”
  • 20. Organization of ca. 200 journalists Based in 65 countries “Our aim is to bring journalists from different countries together in teams - eliminating rivalry and promoting collaboration. Together, we aim to be the world’s best cross-border investigative team.” icij.org/about
  • 23. +370 journalists in 80 countries
  • 24. Panama Papers Timeline • Early 2015: first contact between John Doe and SZ • Spring 2015: involving ICIJ • Summer 2015: start of investigations • April 4, 2016: public launch
  • 26. Exposure of hidden secrets • Exposed the offshore holdings of 12 current and former world leaders • Dealings of 128 more politicians and public officials around the world
  • 27. Main goals, achieved: 1) Uncover the truth 2) Ensure whistleblower safety
  • 28. The Tech Behind the ICIJ investigation
  • 29. Remember the 2.6 TB of data?
  • 30. Who was working on it? • +370 journalists • +100 media organizations • 80 countries • 1 year • Data team: 3 data journalists + 3 developers!
  • 32. Nuix Investigator • OCR • Entity extraction • Analytical tools • Philanthropic donation to ICIJ. Nuix is an Australian company (founded 2000 in Sydney) focused on data extraction from unstructured sources.
  • 33. Nuix Investigator: 3 million files × 10 seconds per file ≈ 1 year on a single server; spread across 35 servers ≈ 1.5 weeks
  • 34. Lucene syntax queries with proximity matching! 400 users
  • 36. Context is King (diagram of nodes with properties: PERSON name: "John", last: "Miller", role: "Negotiator"; PERSON name: "Maria", last: "Osara"; PERSON name: "Jose", last: "Pereia", position: "Governor"; PERSON name: "Alice", last: "Smith", role: "Advisor"; plus name: "Some Media Ltd" and value: "$70M")
  • 37. Context is King (diagram of nodes with properties: PERSON name: "John", last: "Miller", role: "Negotiator"; PERSON name: "Maria", last: "Osara"; PERSON name: "Jose", last: "Pereia", position: "Governor"; PERSON name: "Alice", last: "Smith", role: "Advisor"; plus name: "Some Media Ltd" and value: "$70M"; now also showing a MENTIONS relationship and the property since: Jan 10, 2011)
  • 39. Journalists say: “It’s like magic”
  • 41. Need to store and query our connections! Real, inferred and integrated
  • 42. Neo4j A native graph database • Manage and store your connected data as a graph • Query relationships easily and quickly • Evolve model and applications to support new requirements and insights • Built to solve relational pains
  • 44. Property Graph Model. Nodes: • The entities in the graph • Can have name-value properties • Can be labeled. Relationships: • Relate nodes by type and direction • Can have name-value properties
  • 45. Neo4j: All about Patterns (:Person {name:"Dan"})-[:KNOWS]->(:Person {name:"Ann"}) (diagram: node Dan, relationship KNOWS, node Ann; each node has a label and a property) neo4j.com/developer/cypher
  • 46. Neo4j: Create Patterns CREATE (:Person {name:"Dan"})-[:KNOWS]->(:Person {name:"Ann"}) (diagram: node Dan, relationship KNOWS, node Ann; each node has a label and a property) neo4j.com/developer/cypher
  • 47. Cypher: Clauses CREATE (:Intermediary {name:"Deutsche Bank"}) -[:REPRESENTS]->(e:Entity {name:"..."}) -[:LOCATED]->(:Address {address:"..."}) -[:IN]->(:Country {name:"PAN"})
  • 48. Cypher: Find Patterns MATCH (:Person {name:"Dan"})-[:KNOWS]->(who:Person) RETURN who (diagram: node Dan, relationship KNOWS, unknown node "???" bound to the alias who) neo4j.com/developer/cypher
  • 49. Cypher: Clauses MATCH (o:Officer)-[owns]->(e:Entity)<--(a:Address) WHERE a.address CONTAINS "Sydney" RETURN o.name, owns.shares, e.name
  • 50. Getting Data into Neo4j • Bulk load from CSV files • Update graph from: Web APIs (JSON, XML), other databases, CSV files, user activity (logs, callbacks)
  • 51. Import Demo – CSV dump
      ==> /Users/mh/Downloads/panama/import/Addresses.csv <==
      address,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
      27 ROSEWOOD DRIVE #16-19 SINGAPORE 737920,6991059DFFB057DF310B9BF31CC4A0E6,The Panama Papers data is current through 2015,SGP,Singapore,14000001,Panama Papers
      ==> /Users/mh/Downloads/panama/import/Entities.csv <==
      name,original_name,former_name,jurisdiction,jurisdiction_description,company_type,address,internal_id,incorporation_date,inactivation_date,struck_off_date,dorm_date,status,service_provider,ibcRUC,country_codes,countries,note,valid_until,node_id:ID,sourceID
      "TIANSHENG INDUSTRY AND TRADING CO., LTD.","TIANSHENG INDUSTRY AND TRADING CO., LTD.",,SAM,Samoa,,ORION HOUSE SERVICES (HK) LIMITED ROOM 1401; 14/F.; WORLD COMMERCE CENTRE; HARBOUR CITY; 7-11 CANTON ROAD; TSIM SHA TSUI; KOWLOON; HONG KONG,1001256,23-MAR-2006,18-FEB-2013,15-FEB-2013,,Defaulted,Mossack Fonseca,25221,HKG,Hong Kong,,The Panama Papers data is current through 2015,10000001,Panama Papers
      ==> /Users/mh/Downloads/panama/import/Intermediaries.csv <==
      name,internal_id,address,valid_until,country_codes,countries,status,node_id:ID,sourceID
      "MICHAEL PAPAGEORGE, MR.",10001,MICHAEL PAPAGEORGE; MR. 106 NICHOLSON STREET BROOKLYN PRETORIA 0002; GAUTENG (PWV) SOUTH AFRICA,The Panama Papers data is current through 2015,ZAF,South Africa,ACTIVE,11000001,Panama Papers
      ==> /Users/mh/Downloads/panama/import/Officers.csv <==
      name,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
      KIM SOO IN,E72326DEA50F1A9C2876E112AAEB42BC,The Panama Papers data is current through 2015,KOR,"Korea, Republic of",12000001,Panama Papers
      ==> /Users/mh/Downloads/panama/import/all_edges.csv <==
      node_id:START_ID,rel_type:TYPE,node_id:END_ID
      11000001,intermediary of,10208879
  • 52. Import Demo - Run
      $NEO4J_HOME/bin/neo4j-import --into $DATA/panama.db \
        --nodes:Address $DATA/Addresses_fixed.csv \
        --nodes:Entity $DATA/Entities.csv \
        --nodes:Intermediary $DATA/Intermediaries.csv \
        --nodes:Officer $DATA/Officers.csv \
        --relationships $DATA/all_edges_header.csv,$DATA/all_edges_cleaned.csv
      IMPORT DONE in 20s 747ms. Imported: 839434 nodes, 1253582 relationships, 8211010 properties
      +-----------------------------+
      | labels(n)        | count(*) |
      +-----------------------------+
      | ["Officer"]      | 344455   |
      | ["Entity"]       | 319150   |
      | ["Address"]      | 151054   |
      | ["Intermediary"] | 23636    |
      +-----------------------------+
  • 53. The Basic ICIJ Data Model
  • 54. The Real ICIJ Data Model
  • 56. Data is available
  • 57. Data is available • 785,000 Offshore Leaks Companies from several investigations • For online browsing and visualization • offshoreleaks.icij.org • As CSV dump download • As Neo4j Database download • offshoreleaks.icij.org/pages/database • As Neo4j sandboxes sandbox.neo4j.com
  • 59. Data exposed as interactive Visualization • Public figures and leaders • Different shell companies & involvements
  • 60. Try it yourself? sandbox.neo4j.com
  • 63. More steps for the ICIJ and all of us • Data integration with other sources • Entity extraction • Email pattern analysis • Content & Data mining • Machine learning • Alerts with real time news / social media • Investigative recommendations • Active search for new sources ...
  • 64. Current Investigation of Mossack Fonseca Jürgen Mossack – Ramon Fonseca Arrested, free on bail, ongoing investigations
  • 65. Worldwide Results of the Offshore Leaks Investigations • Flow to US tax havens • Transparency laws in UK, EU • $1.3bn taxes recouped publicly • Investigations into banks, public figures, companies • Criminal cases solved & ongoing icij.org/investigations/panama-papers/panama-papers-helps-recover-more-than-1-2-billion-around-the-world/
  • 67. Most recent: SEB Bank Investigation in Sweden • Operations of Nordic banks in Baltic states • Danske Bank, Swedbank, SEB • Resignations, investigations • Straw men mentioned in reporting can be found in the Offshore Leaks DB • SEB bank in Nov 2019!
  • 68. Murder of Daphne Caruana Galizia • Maltese investigative reporter • Reported on Panama Papers appearances of influential Maltese politicians • Murdered Oct 16, 2017 with a car bomb • “The Daphne Report” • PM finally resigning en.wikipedia.org/wiki/Daphne_Caruana_Galizia
  • 69. Murder of Jan Kuciak • Slovak investigative reporter • Reported on criminal behavior of businessmen • Jan & his fiancée shot Feb 2018 • Massive reactions + political crisis • PM and several ministers resigned en.wikipedia.org/wiki/Murder_of_J%C3%A1n_Kuciak
  • 70. The ICIJ didn’t stop there #BahamasLeak
  • 71. Read & Watch More
  • 72. How can you investigate large, complex data using graphs? Apply the full set of available tools.
  • 73. Source: John Swain - Twitter Analytics Right Relevance Talk
  • 75. Russia Twitter Trolls democrats-intelligence.house.gov/uploadedfiles/exhibit_b.pdf ● 2752 Twitter accounts tied to Russia’s Internet Research Agency ● Accounts suspended by Twitter ○ Data deleted ● What were they tweeting about?
  • 77. 345k Tweets, 41k Users (454 Russian Trolls)
  • 78. Your typical American Citizen? Your typical Local News Publication? Your typical Local Political Party? @LeroyLovesUSA @TEN_GOP @ClevelandOnline
  • 79. Your typical Russian Troll Your typical Russian Troll Your typical Russian Troll @LeroyLovesUSA @TEN_GOP @ClevelandOnline
  • 84. NLP w/ Graph Databases • NLP Process • Annotations
  • 86. Graph Algorithms Gain new insights from context & topology
  • 87. Graph & ML Algorithms in Neo4j (35+) neo4j.com/graph-algorithms-book/ • Pathfinding & Search: finds optimal paths or evaluates route availability and quality • Centrality / Importance: determines the importance of distinct nodes in the network • Community Detection: detects group clustering or partition options • Similarity: evaluates how alike nodes are • Link Prediction: estimates the likelihood of nodes forming a future relationship
  • 88. Neo4j • Native Graph Database • Analytics Integrations • Cypher Query Language • Wide Range of APOC Procedures • Optimized Graph Algorithms
  • 90. PageRank on AMPLIFIED Graph: MATCH (r1:Troll)-[:POSTED]->(:Tweet)<-[:RETWEETED]-(:Tweet)<-[:POSTED]-(r2:Troll) WHERE r1 <> r2 WITH r1, r2, count(*) AS freq MERGE (r2)-[a:AMPLIFIED]->(r1) SET a.weight = freq
  • 91. PageRank on AMPLIFIED Graph: CALL algo.pageRank('Troll', 'AMPLIFIED'); MATCH (t:Troll) WITH t ORDER BY t.pagerank DESC LIMIT 1 MATCH path = (t)-[:AMPLIFIED*2]-() RETURN path
  • 94. Graph Visualizations: Centrality & community detection • AMPLIFIED relationships • Node size → PageRank • Color → community detection • Rel thickness → weight
  • 96. DIY?
  • 97. Neo4j Drivers & Integrations • Drivers for most programming languages (JavaScript, Java, .NET, Python, Go, ...) • Bolt: binary wire protocol • Out-of-the-box integrations for Spring Data, GraphQL, Kafka • Pluggable into rich data visualization frameworks neo4j.com/developer/language-guides/
  • 98. Minimal Example: JavaScript -> Visualization
      // Assumes `driver` is an existing neo4j-driver instance and ForceGraph3D is loaded on the page.
      const session = driver.session();
      session
        .run(`MATCH (n:Troll)-[:AMPLIFIED]->(m) RETURN id(n) as source, id(m) as target`)
        .then(function (result) {
          // One link per AMPLIFIED relationship; driver Integers are converted to JS numbers
          const links = result.records.map(r => {
            return {source: r.get('source').toNumber(), target: r.get('target').toNumber()};
          });
          session.close();
          // Collect the distinct node ids that appear in any link
          const ids = new Set();
          links.forEach(l => { ids.add(l.source); ids.add(l.target); });
          const nodes = Array.from(ids).map(id => { return {id: id}; });
          const graphData = {nodes: nodes, links: links};
          const elem = document.getElementById('3d-graph');
          ForceGraph3D()(elem).graphData(graphData);
        });
      medium.com/neo4j/tagged/data-visualization
  • 102. ICIJ Datashare • datashare.icij.org • Local installation • Collaboration • Text extraction • Entity Recognition github.com/ICIJ/datashare
  • 103. OCCRP Investigative Dashboard Browse / Search / Visualize
  • 104. OCCRP Aleph: 173M entries from 271 datasets aleph.occrp.org github.com/alephdata
  • 105. GraphCommons • Graph-based modeling for researchers and journalists • Intuitive, collaborative graph creation • Embedding in websites • Encourage sharing
  • 106. What will YOU connect? • User and Social Networks ? • Money, Accounts, Contracts ? • Products, Prices, Reviews, Tags ? • Software, Dependencies, Services ? • Machines, Devices, Sensors ? • Genes, Proteins, Reactions ? • Laws, Regulations ?
  • 107. Want to learn more? Free ebooks! neo4j.com/books
  • 108. Thank you! Questions? Learn more: neo4j.com/developer Me: @mesirii | @neo4j
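
Supplement to slides 90 and 91: the idea of aggregating retweets into weighted AMPLIFIED relationships and then ranking accounts by PageRank can be sketched in plain JavaScript, without a database. This is only an illustration under stated assumptions: the `retweets` sample data, the function names `amplifiedEdges` and `pageRank`, the damping factor and the iteration count are all made up here, and the ranking uses the edge weights, which the `algo.pageRank` call on slide 91 may not do by default.

```javascript
// Hypothetical sample data: each event means "retweeter retweeted a tweet posted by original".
const retweets = [
  { retweeter: 'b', original: 'a' },
  { retweeter: 'c', original: 'a' },
  { retweeter: 'c', original: 'a' },
  { retweeter: 'a', original: 'b' },
];

// Aggregate events into weighted edges retweeter -> original (the AMPLIFIED direction
// of slide 90), counting repeated retweets and skipping self-retweets.
function amplifiedEdges(events) {
  const weights = new Map();
  for (const { retweeter, original } of events) {
    if (retweeter === original) continue;
    const key = `${retweeter}\u0000${original}`;
    weights.set(key, (weights.get(key) || 0) + 1);
  }
  return Array.from(weights, ([key, weight]) => {
    const [source, target] = key.split('\u0000');
    return { source, target, weight };
  });
}

// Weighted PageRank by power iteration. Damping and iteration count are assumed defaults.
function pageRank(edges, damping = 0.85, iterations = 50) {
  const nodes = Array.from(new Set(edges.flatMap(e => [e.source, e.target])));
  const n = nodes.length;
  const outWeight = new Map(nodes.map(id => [id, 0]));
  for (const e of edges) outWeight.set(e.source, outWeight.get(e.source) + e.weight);

  let rank = new Map(nodes.map(id => [id, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    // Rank held by nodes without outgoing edges is redistributed evenly.
    let dangling = 0;
    for (const id of nodes) if (outWeight.get(id) === 0) dangling += rank.get(id);
    const base = (1 - damping) / n + (damping * dangling) / n;
    const next = new Map(nodes.map(id => [id, base]));
    // Each node passes its rank along outgoing edges in proportion to edge weight.
    for (const e of edges) {
      const share = (rank.get(e.source) * e.weight) / outWeight.get(e.source);
      next.set(e.target, next.get(e.target) + damping * share);
    }
    rank = next;
  }
  return rank;
}

const edges = amplifiedEdges(retweets);
const ranks = pageRank(edges);
// The account with the highest rank is the most amplified one in this toy graph.
const top = [...ranks.entries()].sort((x, y) => y[1] - x[1])[0][0];
console.log(edges, top);
```

In this toy graph, account `a` is retweeted by both other accounts and so comes out on top. The `edges` array already has the `source`/`target` shape that the visualization example on slide 98 builds for its links.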