Neo4j is a graph database (nodes and relationships) and is the perfect fit for some types of problem. Within that domain, Neo4j can be much, much faster than a SQL database and easier to query. Py2neo is a Python binding to Neo4j. The live presentation showed how to create word transformation puzzles, e.g. getting from "stores" to "slaked" by one-letter transformations where each intermediate step is a valid word. One solution is "stores"->"stored"->"stared"->"staked"->"slaked".
2. Overview
● Brief look at graph databases & Neo4j
● Introduction to word transformation game
● Getting suitable words
● Adding words and relationships into Neo4j
● Querying graph data to generate puzzles
3. Graph Databases – a NoSQL option
http://neo4j.com/books/graph-databases/
4. NoSQL – when is it a good fit?
● SQL has its origins in the 1970s and may not be fresh and shiny any more but ...
● … we shouldn't choose NoSQL for reasons of fashion.
● Venerable SQL is often a better choice for standard hierarchies, e.g. countries that have cities that have suburbs etc.
6. Graph Databases
● Graph databases are much, much better for related data with:
– lots of different links between the same nodes
– different numbers of links between nodes, e.g. 3 hops to one peer and 7 hops to another
– lots of peer-to-peer links
7. Substantial Benefits
● Massive performance benefits (the gap widens sharply as the number of links grows)
● Structural harmony
– between the structure of the data and the structure of the data storage (what you draw on the whiteboard might look very similar to how your data is actually structured)
– between the questions asked of the data and the query language used to answer them
8. Word transformations
● Start with one word and get to the other by single-letter transformations, word by word
● E.g. starting with “stores” get to “slaked”
– BTW there are 96 alternative ways in 5 moves or fewer
stores
stored
stared
staked
slaked
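The rule of the game can be stated in a few lines of Python. This is a minimal sketch rather than code from the presentation; the function names `differs_by_one_letter` and `is_valid_chain` are made up for illustration.

```python
def differs_by_one_letter(a: str, b: str) -> bool:
    """True when a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_valid_chain(chain, vocabulary):
    """Check that every word is known and each step changes a single letter."""
    if any(word not in vocabulary for word in chain):
        return False
    return all(differs_by_one_letter(a, b) for a, b in zip(chain, chain[1:]))


# The example chain from this slide; the vocabulary here is just the chain itself.
chain = ["stores", "stored", "stared", "staked", "slaked"]
print(is_valid_chain(chain, set(chain)))
```

In a real puzzle the vocabulary would be the full filtered word list, not just the words in the chain.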
9. Puzzle taster
Get from 'sloven' to 'closed' in no more than 5 steps (there are 10 unique solutions)
sloven
?
closed
10. Getting a simple word list
● How hard could it be?
● Lesson #1 – Scrabble lists and similar are useless – only want lists with standard words, otherwise the puzzles are too hard
● Lesson #2 – have to decide about taboo/profane words
● Lesson #3 – the number of words affects the number of ONE_LETTER_DIFF relationships a lot
● Lesson #4 – clever optimisation is not needed if restricting yourself to ordinary words
SCOWL (Spell Checker Oriented Word Lists) http://wordlist.aspell.net/
11. Filtering words
● Needed to turn é into e
● Needed to eliminate possessives e.g. cat's (as used in the phrase “the cat's whiskers”)
● Needed to leave out capitalised words
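The three filtering steps above might look roughly like this in Python. It is a sketch under assumptions, not the presenter's code: the helper names and the optional `length` parameter are hypothetical, and accents are stripped with the standard library's `unicodedata` module.

```python
import unicodedata


def strip_accents(word):
    """Turn accented letters into plain ones, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_words(raw_words, length=None):
    """Keep ordinary lower-case words; optionally keep a single word length."""
    kept = set()
    for raw in raw_words:
        word = strip_accents(raw.strip())
        if "'" in word or "\u2019" in word:
            continue  # possessives such as cat's
        if word != word.lower():
            continue  # capitalised words (names, acronyms)
        if not word.isalpha():
            continue  # anything left with digits, hyphens etc.
        if length is not None and len(word) != length:
            continue
        kept.add(word)
    return kept


print(sorted(clean_words(["café", "cat's", "Paris", "stores", "stored\n"])))
```

Passing `length=6` would keep only six-letter words such as those in the "stores" to "slaked" example.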
12. For each word, identifying words that differ by one letter only
Disclaimer: the code worked, but some super-smart optimisations would probably be possible, involving n-dimensional space or something
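One common way to find such pairs without comparing every word against every other word is to bucket words by "wildcard" patterns. This is a swapped-in technique shown as a sketch, not the code used in the talk. The function name is hypothetical, and it assumes the words contain only letters (no underscores).

```python
from collections import defaultdict
from itertools import combinations


def one_letter_diff_pairs(words):
    """Return sorted (a, b) pairs, a < b, that differ in exactly one position."""
    buckets = defaultdict(list)
    for word in sorted(set(words)):
        for i in range(len(word)):
            # Blank out one position: 'stores' -> '_tores', 's_ores', ...
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    pairs = set()
    for group in buckets.values():
        # Words sharing a pattern differ only at the blanked position.
        pairs.update(combinations(group, 2))
    return sorted(pairs)


words = ["stores", "stored", "stared", "staked", "slaked"]
for a, b in one_letter_diff_pairs(words):
    print(a, b)
```

Each pair found this way corresponds to one ONE_LETTER_DIFF relationship in the graph.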
13. Adding data to Neo4j
● Create nodes and relationships
● Lots of room for optimisations
● Only need to build the database once, so the 15 minutes it takes is not worth reducing
● My Neo4j and Py2neo knowledge is beginner level but I was able to solve my problem
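As an illustration of creating the nodes and relationships, the sketch below sends word pairs to Neo4j in batches using a Cypher `UNWIND ... MERGE` statement. It is not the presenter's code. The `Word` label, the `name` property, the batch size and the function names are assumptions; only the ONE_LETTER_DIFF relationship name comes from the slides. The `graph` argument is expected to be a Py2neo `Graph` (or anything with a compatible `run` method).

```python
# Cypher statement: create each word node once and link the pair.
MERGE_PAIRS = """
UNWIND $pairs AS pair
MERGE (a:Word {name: pair[0]})
MERGE (b:Word {name: pair[1]})
MERGE (a)-[:ONE_LETTER_DIFF]->(b)
"""


def batches(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_pairs(graph, pairs, batch_size=1000):
    """Send pairs to the database in batches; return how many were sent."""
    sent = 0
    for batch in batches([list(pair) for pair in pairs], batch_size):
        graph.run(MERGE_PAIRS, pairs=batch)
        sent += len(batch)
    return sent


# Hypothetical usage; the URI and credentials are placeholders:
# from py2neo import Graph
# graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
# load_pairs(graph, [("stored", "stores"), ("stared", "stored")])
```

The relationship is stored in one direction only; a query can still ignore direction when matching, so each pair needs to be stored once.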