Journalists who want to move beyond Excel have many tools to choose from when working with datasets. Usually the first step is a relational (SQL) database. While a relational database can be a good choice for some datasets, data analysts today are turning to newer tools to gain deeper insight. This talk will show how to use a graph database to analyze highly connected data, using examples from U.S. Congressional data and political email archives.
Using the U.S. Congress data, we’ll show you how to explore the dataset with Cypher, the Neo4j query language, to discover legislator activity, including bill sponsorship and voting. Building up our knowledge of Cypher as we progress, we’ll show how you can use principles from social network analysis to find influential legislators and discover which topics they have influence over.
Finally, we will examine how to draw insights from the Hillary Clinton email dataset, released as part of a FOIA request earlier this year. We will explore this dataset as a graph of interactions among users, answering questions like: Who communicates with Hillary Clinton the most? What are the topics of these emails? You’ll learn how to visualize the results in the Neo4j Browser to quickly make sense of the data while exploring it.
The goal of this talk is to demonstrate database tools that any journalist can use to explore connected datasets and draw insights from them.
23. Relational Versus Graph Models
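[Figure: the same friendship data shown in two models. Relational Model: Person, Person-Friend, and Friend tables holding ANDREAS, DELIA, TOBIAS, and MICA. Graph Model: ANDREAS, TOBIAS, MICA, and DELIA as nodes connected by KNOWS relationships.]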
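To make the contrast concrete, here is a minimal sketch in plain Python (not Neo4j). The names and the KNOWS type come from the slide; the specific friendship pairs, the numeric ids, and the helper function names are hypothetical, since the slide text does not say who knows whom.

```python
# Hypothetical data: the slide gives the names but not who knows whom.
person = {1: "ANDREAS", 2: "TOBIAS", 3: "MICA", 4: "DELIA"}

# Relational style: a Person table plus a Person-Friend join table of id pairs.
person_friend = [(1, 2), (1, 3), (1, 4)]


def friends_relational(name):
    """Find friends by scanning the join table and resolving each id."""
    ids = [pid for pid, n in person.items() if n == name]
    return sorted(person[f] for p, f in person_friend if p in ids)


# Graph style: each node keeps its own outgoing KNOWS relationships.
knows = {
    "ANDREAS": ["TOBIAS", "MICA", "DELIA"],
    "TOBIAS": [],
    "MICA": [],
    "DELIA": [],
}


def friends_graph(name):
    """Follow the node's own relationships; no join table is consulted."""
    return sorted(knows[name])


print(friends_relational("ANDREAS"))
print(friends_graph("ANDREAS"))
```

Both calls return the same friends. The difference is where the connection lives: in a separate table that has to be joined, or on the node itself.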
27. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
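[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, DRIVES, and OWNS relationships. One relationship carries the property since: Jan 10, 2011.]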
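The components above can be sketched as plain Python data structures. This is an illustration of the model, not Neo4j's API: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names. The property values come from the slide, but the direction of each relationship and the placement of `since` on DRIVES are assumptions, because the figure itself did not survive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"PERSON"}
    properties: dict = field(default_factory=dict)   # name-value pairs


@dataclass
class Relationship:
    start: Node                                      # direction runs start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumption: directions and the owner of "since" are guesses, not from the slide.
rels = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


print([n.properties["brand"] for n in outgoing(dan, "DRIVES")])
```

Because a relationship has a direction, `outgoing(car, "DRIVES")` is empty even though a DRIVES relationship touches the car.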
33. https://github.com/legis-graph/legis-graph
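LOAD CSV WITH HEADERS
FROM "file:///legislators.csv" AS line
// One Legislator node per thomasID; copy all CSV columns onto it
MERGE (l:Legislator {thomasID: line.thomasID})
SET l = line
// Merge the State on its own so legislators from one state share a node
MERGE (s:State {code: line.state})
MERGE (s)<-[:REPRESENTS]-(l)
…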
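MERGE in the load script acts as match-or-create: it reuses a node or relationship that already exists and creates one only when nothing matches. The sketch below imitates that behavior in plain Python, without the Neo4j driver. The sample rows and the `merge_node` helper are hypothetical; only the `thomasID` and `state` column names come from the slide.

```python
import csv
import io

# Hypothetical sample rows; the third row repeats the first on purpose.
SAMPLE = "thomasID,state\n1,CA\n2,CA\n1,CA\n"

nodes = {}             # (label, key value) -> property dict
relationships = set()  # (start node id, type, end node id)


def merge_node(label, key, props):
    """Reuse the node for (label, key) if present, else create it."""
    node = nodes.setdefault((label, key), {})
    node.update(props)  # copy the given properties onto the node
    return (label, key)


for line in csv.DictReader(io.StringIO(SAMPLE)):
    leg = merge_node("Legislator", line["thomasID"], line)
    state = merge_node("State", line["state"], {"code": line["state"]})
    # A set keeps each REPRESENTS relationship unique, like MERGE does.
    relationships.add((leg, "REPRESENTS", state))

print(len(nodes), len(relationships))
```

Loading the repeated row changes nothing, which is why MERGE-based imports can be re-run safely. Both legislators point at a single shared CA node.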
US Congress
59. Content mining
“Networks give structure to the conversation while content mining gives meaning.”
- Preriit Souda
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/