4. OK, graph databases
• Instead of tables and SQL
• Nodes and relationships
• Specialized queries
• Not everything is a graph
(and this is not sponsored)
6. Step 0 - installing
• Install Neo4j - neo4j.com/install
• brew on Mac
• DigitalOcean has Linux instructions
• change default password
• Trouble installing locally?
• heroku addons:add graphene
7. Who uses graphs?
• Panama Papers
• IMDb / Six Degrees of Kevin Bacon
• Especially:
• social networks, research data, maps
• anywhere the number of joins is large, indefinite, or unlimited
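The Six Degrees of Kevin Bacon example is a shortest-path question over a graph of people who appeared together. A minimal sketch in plain Python, assuming a hypothetical co-star mapping (the names are placeholders, not IMDb data); a graph database runs this kind of traversal natively:

```python
from collections import deque

def degrees_of_separation(costars, start, goal):
    """Return the number of hops between two people, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for other in costars.get(person, ()):
            if other == goal:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None

# Hypothetical placeholder data: each key lists people who shared a film.
costars = {
    "actor_a": ["actor_b"],
    "actor_b": ["actor_a", "actor_c"],
    "actor_c": ["actor_b"],
    "actor_d": [],
}
```

The breadth-first search never needs to know in advance how many hops the answer takes, which is the "indefinite number of joins" case from the list above.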
12. The trouble with tables
• Many joins to get people, titles, photos, and additional relationship info
• Speed of query
• Difficult to write new queries
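To make the join problem concrete, here is a small sketch using Python's built-in sqlite3 module. The person and knows tables and their rows are hypothetical, invented only for illustration. Every extra hop in the relationship adds two more joins to the SQL, so the query has to be rewritten whenever the depth changes:

```python
import sqlite3

def hop_query(hops):
    """Build SQL that follows the 'knows' table a fixed number of hops."""
    joins = []
    for i in range(1, hops + 1):
        joins.append(
            f"JOIN knows k{i} ON k{i}.from_id = p{i - 1}.id "
            f"JOIN person p{i} ON p{i}.id = k{i}.to_id"
        )
    return (
        f"SELECT p{hops}.name FROM person p0 "
        + " ".join(joins)
        + " WHERE p0.name = ?"
    )

def reachable(conn, name, hops):
    """Names reached from `name` after exactly `hops` steps."""
    rows = conn.execute(hop_query(hops), (name,)).fetchall()
    return [row[0] for row in rows]

# Hypothetical schema and rows, held in memory only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (from_id INTEGER, to_id INTEGER);
    INSERT INTO person VALUES (1, 'person_a'), (2, 'person_b'), (3, 'person_c');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")
```

Calling hop_query(3) produces six JOIN clauses, and an open-ended "any number of hops" question cannot be expressed with this fixed-depth pattern at all.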
13. Art Graph DB
• Did Picasso collaborate with other artists in his lifetime?
• Are any artists credited as painter, director, sculptor, etc.? (maybe an art EGOT)
14. Let’s build that graph
• Artists and artworks
• Basic bio data, MoMA ID -> Artist node
• Future DB: all people connected
• Title, date, MoMA ID -> Artwork node
• ARTIST_OF relationship (include order)
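The model above could be written to Neo4j with parameterized Cypher. This is only a sketch: the Artist and Artwork labels and the ARTIST_OF relationship come from the slide, while the property names (moma_id, name, title, date, credit_order) and the sample record are assumptions made for illustration, not the schema of the actual script.

```python
# Create (or reuse) an artist, an artwork, and the credit between them.
# Property names such as moma_id and credit_order are assumptions.
LINK_ARTIST_TO_ARTWORK = """
MERGE (a:Artist {moma_id: $artist_id})
MERGE (w:Artwork {moma_id: $artwork_id})
MERGE (a)-[r:ARTIST_OF]->(w)
SET a.name = $artist_name,
    w.title = $title,
    w.date = $date,
    r.credit_order = $credit_order
"""

# Artists who share at least one artwork with the named artist.
COLLABORATORS = """
MATCH (a:Artist {name: $artist_name})-[:ARTIST_OF]->(:Artwork)<-[:ARTIST_OF]-(other:Artist)
RETURN DISTINCT other.name AS collaborator
"""

def credit_params(artist, artwork, credit_order):
    """Flatten one artist/artwork pair into parameters for LINK_ARTIST_TO_ARTWORK."""
    return {
        "artist_id": artist["moma_id"],
        "artist_name": artist["name"],
        "artwork_id": artwork["moma_id"],
        "title": artwork["title"],
        "date": artwork["date"],
        "credit_order": credit_order,
    }

# Hypothetical placeholder record, not real MoMA data.
params = credit_params(
    {"moma_id": 1, "name": "Example Artist"},
    {"moma_id": 10, "title": "Example Work", "date": "1950"},
    credit_order=1,
)
```

Each parameter set could then be sent to the database, for example with the official Python driver's session.run(LINK_ARTIST_TO_ARTWORK, params). The COLLABORATORS query is one way the Picasso question might be phrased once the data is loaded.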
15. Let’s build that graph
• git clone https://github.com/mapmeld/graph
• Building a scraper for MoMA
21. If you’re interested
• Google: MapZen Extracts
• download a city
• for this script, download the OSM XML file
• if you like PostGIS, there is a download (no import script)
22. Benefits of OSM
• Open to use / full data
• Open to edit / choose tags
• HOT community
• Civil e-mail lists (Crimea)
24. Google on OSM
• "Our maps represent what you or I need to do on a day-to-day basis in the developed part of the world"
• Source: Google Maps Geospatial Technologist (quoted in Fast Company)
28. XML data
• Nodes, ways, and relations
• Ways made up of multiple nodes
• Relations contain nodes and ways
• Practically:
• Multiple ways connect / combine
• Tags are a community construct
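A short sketch of reading that structure with Python's standard library. The sample document is hand-written in the OSM XML layout, with made-up IDs and coordinates; relations are left out to keep it brief.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the OSM XML layout; IDs and coordinates are made up.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="0.001"/>
  <node id="3" lat="0.001" lon="0.001"/>
  <way id="100">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="101">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

def parse_osm(xml_text):
    """Return ({node_id: (lat, lon)}, {way_id: {"nodes": [...], "tags": {...}}})."""
    root = ET.fromstring(xml_text)
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.findall("node")
    }
    ways = {}
    for way in root.findall("way"):
        ways[way.get("id")] = {
            # A way is an ordered list of references to node IDs.
            "nodes": [nd.get("ref") for nd in way.findall("nd")],
            # Tags are free-form key/value pairs chosen by the community.
            "tags": {t.get("k"): t.get("v") for t in way.findall("tag")},
        }
    return nodes, ways

nodes, ways = parse_osm(SAMPLE_OSM)
```

Here ways 100 and 101 both reference node 2, which is how two roads connect. A city-sized extract may be too large to load in one call; ET.iterparse is the streaming alternative.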
29. Smart Renderer
• When is a <way> a line (cul-de-sac) or a polygon (river, lake, parking lot)?
• Has to support world’s fonts
• Tag for real life, not for the renderer
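The line-or-polygon question can be sketched as a heuristic. The rules below are a deliberate simplification chosen for illustration, not the logic of any particular renderer: a way must be closed to be a polygon, an explicit area tag wins, and closed roads or barriers are treated as loops.

```python
def is_closed(node_refs):
    """A way is closed when it ends on the node it started from."""
    return len(node_refs) >= 4 and node_refs[0] == node_refs[-1]

def looks_like_polygon(node_refs, tags):
    """Guess whether a way should be filled (polygon) or stroked (line)."""
    if not is_closed(node_refs):
        return False
    if tags.get("area") == "yes":
        return True
    if tags.get("area") == "no":
        return False
    # Assumption: closed roads and barriers are loops, not filled shapes.
    if "highway" in tags or "barrier" in tags:
        return False
    return True

loop = ["1", "2", "3", "1"]
```

With the same closed loop of nodes, a highway=residential tag gives a line (the cul-de-sac case) and natural=water gives a polygon (the lake case), so the geometry alone cannot answer the question.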
30. Building graph data
• Script adds all roads to Neo4j
• Includes an array of node IDs (can mix content types, similar to a document database)
• If two ways share a node with the same ID, link them in both directions (<->)
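The linking rule can be sketched without a database. This is an illustration of the idea, not the script itself; the way and node IDs are made up.

```python
from collections import defaultdict
from itertools import permutations

def link_ways(ways):
    """Return directed (way_a, way_b) pairs for ways that share a node ID."""
    ways_by_node = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            ways_by_node[node_id].add(way_id)

    links = set()
    for way_ids in ways_by_node.values():
        # permutations yields both (a, b) and (b, a): the link runs both ways.
        links.update(permutations(sorted(way_ids), 2))
    return links

# Made-up IDs: ways 100 and 101 meet at node 2, way 102 touches nothing.
ways = {"100": ["1", "2"], "101": ["2", "3"], "102": ["4", "5"]}
links = link_ways(ways)
```

Indexing ways by node ID first avoids comparing every pair of roads. In Neo4j, each resulting pair would become a relationship between two road nodes.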
33. Google Prediction API
• Prediction based on a CSV
• Categorization or numerical
• Google generates a model and estimates
accuracy
• Not allowed in Myanmar
34. Predicting Houses
• Format 60,000+ rows of database export
• Choose categories to predict 2-3 years
• Competing models determine how important
each column is
• Can it parse dates? Find patterns
• Edging up to ~74 percent accuracy
35. Network effect
• Adding network of streets
• Now tokens include not just my street and neighbors, but shared streets
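One way the street network might be folded into the CSV is as an extra text column of street tokens. This sketch assumes the label goes in the first column with no header row, which should be checked against the Prediction API documentation. The labels, street identifiers, and single text column are hypothetical, not the real export.

```python
import csv
import io

def street_tokens(street, connected):
    """Own street first, then streets that intersect it, as one text field."""
    return " ".join([street] + sorted(connected.get(street, ())))

def training_rows(houses, connected):
    """Yield [label, street text] rows, with the label in the first column."""
    for house in houses:
        yield [house["label"], street_tokens(house["street"], connected)]

# Hypothetical data: street_100 and street_101 intersect, street_102 is isolated.
houses = [
    {"label": "demolished", "street": "street_100"},
    {"label": "standing", "street": "street_102"},
]
connected = {"street_100": {"street_101"}, "street_101": {"street_100"}}

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(training_rows(houses, connected))
csv_text = buffer.getvalue()
```

A house on street_100 now shares a token with every house on street_101, so a pattern on one street can influence predictions on the streets it touches.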
36. Network effect
• For most demolitions, only one house on the street was demolished (the house itself)
38. Network effect
• Google Prediction API reported 81% accuracy
• But is it good?
• Early optimization studies moved fire stations and left neighborhoods vulnerable
• City can't maintain it, and hasn't continued to open its data
39. Looking forward
• Ideas for graph databases?
• Ways to release large graph data: as an API? As JSON files? As a Neo4j dump?
• Ideas for statisticians / future research?
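As a starting point for the JSON option, a graph can be written in a common node-link layout (a "nodes" list plus a "links" list of source/target pairs). This is one possible shape, offered as a sketch; the field names and sample roads are assumptions.

```python
import json

def to_node_link(nodes, links):
    """Serialize a graph as {"nodes": [...], "links": [...]} JSON text."""
    return json.dumps(
        {
            "nodes": [
                {"id": node_id, **props} for node_id, props in sorted(nodes.items())
            ],
            "links": [{"source": a, "target": b} for a, b in sorted(links)],
        },
        indent=2,
    )

# Hypothetical two-road graph linked in both directions.
nodes = {"100": {"name": "road_a"}, "101": {"name": "road_b"}}
links = {("100", "101"), ("101", "100")}
payload = to_node_link(nodes, links)
```

A flat file like this is easy to host and diff, but it pushes traversal work onto the reader, which is part of the trade-off against an API or a database dump.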