This document discusses document classification using graphs and Neo4j. It introduces hierarchical pattern recognition (HPR) for graph-based document classification. HPR learns deep feature representations in a hierarchy using finite state machines. The features are mapped to a vector space model for classification. The document demonstrates HPR by classifying US presidential speeches by political affiliation, achieving over 70% similarity for predicted vs actual labels. It encourages attendees to get involved in the Neo4j community.
When we think about data, we tend to think about how things are connected. This is a natural part of how we talk about things, and also of the graph model.
“This is also a graph, but with some data attached. Here, we’ve attached names to the nodes and described the type of each relationship.”
“We can take this further and attach arbitrary key/value pairs.”
This is the Property Graph Model, which has the following characteristics:
It contains Nodes and Relationships, both of which can contain properties (key-value pairs).
Relationships are always between exactly 2 nodes. They have a type, and they are directed.
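As a rough illustration of these characteristics, here is a minimal Python sketch of the property graph model. It is not how Neo4j stores data internally; the class names, the `KNOWS` type, and the sample properties are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A node with arbitrary key/value properties."""
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed relationship between exactly two nodes."""
    rel_type: str
    start: Node  # the relationship points from start...
    end: Node    # ...to end
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: two people and one relationship between them.
alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
knows = Relationship("KNOWS", alice, bob, {"since": 2012})

print(knows.start.properties["name"], knows.rel_type, knows.end.properties["name"])
```

Because a relationship holds exactly one start node and one end node, direction and the "exactly 2 nodes" rule fall out of the structure itself.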
“There are other graph models; however, everyone in the industry has converged on the idea that this model is the most obvious and the most useful for real humans and the applications we’re building.”
Let’s review the relational table model to see how it differs from the property graph model.
Start with Customers and Accounts
“We have a customer, Alice.”
“She’s got 3 accounts”
“To keep track of which accounts Alice owns, we need a third table to store the mapping, typically called a join table.”
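The join-table idea can be sketched with Python's built-in `sqlite3` module. The table names, column names, and account labels below are assumptions for illustration; the slides only say that Alice has three accounts.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account  (id INTEGER PRIMARY KEY, label TEXT);
-- The join table: one row per (customer, account) ownership pair.
CREATE TABLE customer_account (
    customer_id INTEGER REFERENCES customer(id),
    account_id  INTEGER REFERENCES account(id)
);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO account VALUES (10, 'checking'), (11, 'savings'), (12, 'credit');
INSERT INTO customer_account VALUES (1, 10), (1, 11), (1, 12);
""")

# Finding Alice's accounts takes two joins through the mapping table.
rows = conn.execute("""
SELECT a.label
FROM customer c
JOIN customer_account ca ON ca.customer_id = c.id
JOIN account a ON a.id = ca.account_id
WHERE c.name = 'Alice'
ORDER BY a.id
""").fetchall()

alice_accounts = [label for (label,) in rows]
print(alice_accounts)
```

In the property graph model the same ownership would be a relationship attached directly to the two nodes, with no separate mapping table to maintain.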
Dashboard, for monitoring of key stats
Node, relationship, and property “counts” are only estimates: they actually reflect the allocated ID space for each kind of graph entity.
“The Console is where you can run graph queries, written in Cypher.”
We’ll be using this starting... now.
Disclaimer: This is a graph-based approach to text classification and pattern recognition. The same task can be done in many other ways, including support vector machines (SVMs), Bayesian networks, and belief networks. I chose to build this on top of Neo4j because, first, it’s a database and, second, its data is already structured as a network. This gives me the advantage of not having to worry about data storage.
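The summary above mentions that features are mapped to a vector space model for classification. The following is a minimal sketch of that general idea, not Graphify's or HPR's actual implementation: it assumes plain word counts stand in for the learned features, cosine similarity as the comparison, and two made-up labels with toy training text.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse word-count vector (a stand-in for learned features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, class_vectors):
    """Return the label whose vector is most similar to the document's vector."""
    doc = vectorize(text)
    return max(class_vectors, key=lambda label: cosine(doc, class_vectors[label]))


# Hypothetical labels and training text, for illustration only.
class_vectors = {
    "label_a": vectorize("graph node edge graph"),
    "label_b": vectorize("table row column table"),
}

predicted = classify("a graph has node and edge", class_vectors)
print(predicted)
```

In a graph-backed version, the per-label vectors would be derived from the patterns stored in the database rather than from raw word counts.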
Explain how the genetic algorithm works.
I chose this example project because it’s easy to get presidential speeches online and it seemed like a good example to get others going with Graphify.
“Get involved with the community, attend meetups, browse our open source code libraries, including Neo4j, by visiting us on GitHub.”
“Visit stackoverflow.com and use the Neo4j tag to get fast answers to your questions. We have a very active community of contributors who provide thorough answers 24/7. If you get stuck, make sure you head there.”
“The same goes for Google Groups, if you prefer that format over Stack Overflow.”
“You can visit us on GitHub to submit or browse issues.”
“Finally, I urge you to check out our website’s meetup page to find out where meetups are happening all around the world. We also encourage you to share your experience with Neo4j, your applications, and your use cases by speaking at a local meetup. If you’re interested, please reach out to me; my contact details are on the next slide.”
“Thank you for spending some time with me and learning about Neo4j and Cypher.”
“Get in touch with me about meetups and Neo4j community events happening around the world.”
“I’ll now open up the floor to questions.”