Serendipity-neo4j

THANK YOU FOR BEING
my guinea pigs today

SERENDIPITY
How do I find what I am not looking for?

Serendipity: the occurrence and development of
events by chance in a happy or beneficial way

HOW SERENDIPITY HELPS

• Many new inventions occur because related information
crosses conventional boundaries, escaping its silo.

• Our lives are made richer by discovering ideas and
experiences outside our comfort zones and habitual patterns.

• Serendipity accelerates information discovery by making new
and unexpected connections.

Serendipity has led to an incredible number of discoveries.

A PRACTICAL EXERCISE
Please fill in the forms I handed out.

WHO NEEDS SERENDIPITY?
• B2B sites - encourage businesses to find ways of collaborating they may
never have thought of.

• Social sites - let people discover new friends and new interests.

• Collaborative software - find projects that could work together in
unexpected ways.

• Document management - find documents that help you look at your work
in a different way.

• Contact management - find new people who you could do business with
that might not be in a narrowly defined field.

SEMANTICALLY ISOLATED
[Diagram: Science Books, Art Books, Cookery Books]

CONTEXTUALLY ISOLATED
[Diagram: User, Blog Posts, Documents]

CONTENT CONNECTED
[Diagram: Documents]

SOCIALLY CONNECTED
[Diagram: Users, Documents, Blog Posts]

HIGHLY CONNECTED
[Diagram: Documents, Users, Blog Posts]

GET CONNECTED
• Contextually isolated systems only show us information regarding a closed set of
data and activities.

• Semantically isolated systems only show us information which is similar to other
information.

• Content connected systems show us pieces of data that relate to each other;
these links can cross, and so weaken, contextual and semantic boundaries.

• Socially connected systems show us information regarding our friends and their
activities, weakening contextual and semantic boundaries.

• Highly connected systems show us information with n-degrees of separation and
multiple paths across contextual and semantic boundaries.
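To make "n-degrees of separation" and "multiple paths" concrete, here is a minimal sketch. It assumes the graph is held in memory as a plain adjacency dict rather than in a real graph database, and `degrees_and_paths` is a hypothetical helper name. It runs a breadth-first search that records how many hops away each reachable node is and how many distinct shortest routes lead to it.

```python
from collections import deque


def degrees_and_paths(graph: dict[str, list[str]], start: str,
                      max_hops: int) -> dict[str, tuple[int, int]]:
    """Breadth-first search from `start`, up to `max_hops`.

    Returns {node: (hops, number_of_shortest_paths)} for every node reached,
    excluding `start` itself.
    """
    hops = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                paths[neighbour] = paths[node]
                queue.append(neighbour)
            elif hops[neighbour] == hops[node] + 1:
                paths[neighbour] += paths[node]  # another equally short route
    return {n: (hops[n], paths[n]) for n in hops if n != start}


# Assumption: an in-memory adjacency dict stands in for a graph database.
toy_graph = {
    "alice": ["post", "doc"],
    "post": ["alice", "bob"],
    "doc": ["alice", "bob"],
    "bob": ["post", "doc", "carol"],
    "carol": ["bob"],
}
print(degrees_and_paths(toy_graph, "alice", 2))
# {'post': (1, 1), 'doc': (1, 1), 'bob': (2, 2)}
```

In this toy graph, `bob` is two hops from `alice` by two separate routes (via the blog post and via the document), while `carol` only appears once the hop limit is raised to 3.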

OUR STORAGE SYSTEMS
AFFECT HOW CONNECTED
WE MAKE THE WORLD

FILE BASED STORAGE SEES THE
WORLD AS A SET OF NESTED
COLLECTIONS OF ISOLATED
INFORMATION

RELATIONAL DATABASES AS HIGHLY
ORGANISED COLLECTIONS OF
INFORMATION
WHICH INTERSECT

AND GRAPH DATABASES AS
DISORGANISED BUT HIGHLY
INTERCONNECTED DATA

RDBMS VS GRAPH

• Highly connected systems can be modelled relatively easily on
an RDBMS, but adding new relationships creates complexity
and must be planned in advance.

• Querying is easier for semantically and contextually isolated
models on an RDBMS.

• Querying is extremely messy (indeed!) for highly connected
models.

RECOMMEND A FRIEND
[Diagram: User, User, Documents]

YOU MIGHT ALSO LIKE
[Diagram: User, Documents]

RDBMS VS GRAPH

• Multiple-hop queries are horrific on an RDBMS, in terms of both
performance pitfalls and the legibility of the queries.

• Graph databases love multiple-hop logic; one could say they
thrive on it. It’s much easier to find related items
across arbitrary degrees of separation and semantic barriers.
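The "YOU MIGHT ALSO LIKE" pattern is a three-hop walk: user → document → other user → document. On an RDBMS each hop would typically be another self-join on a viewing table; walked as a graph, it is just a nested loop. The sketch below assumes viewing history is held as a Python dict of sets, and `you_might_also_like` is a hypothetical name. Each shared document counts as a separate route, so documents reachable by more routes rank higher.

```python
from collections import Counter


def you_might_also_like(viewed: dict[str, set[str]], user: str) -> list[tuple[str, int]]:
    """Three hops: user -> document -> other user -> document.

    Returns (document, routes) pairs for documents `user` has not viewed,
    most-connected first; ties are broken alphabetically.
    """
    mine = viewed.get(user, set())
    scores = Counter()
    for other, docs in viewed.items():
        if other == user:
            continue
        shared = len(mine & docs)  # number of routes user -> doc -> other
        if shared == 0:
            continue
        for doc in docs - mine:
            scores[doc] += shared  # every shared document is a separate route
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Assumption: viewing history held as a plain dict of sets.
history = {
    "ann": {"graphs", "cooking"},
    "ben": {"graphs", "teflon"},
    "cat": {"cooking", "teflon", "fishing"},
    "dan": {"poetry"},
}
print(you_might_also_like(history, "ann"))  # [('teflon', 2), ('fishing', 1)]
```

Here `teflon` ranks first for Ann because two different people she overlaps with (one via graphs, one via cooking) both point to it.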

WEIGHT & FILTER

• Proximity still matters: information should be closely
connected, even if it is not semantically or contextually related.

• Relevancy should relate to frequency.

• Filtering can be done manually, by users choosing what to
recommend or pass on.

• If possible use customer feedback to adjust weighting.
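One way to combine proximity and frequency is a truncated Katz-style score: count the walks of each length that reach a node and discount each hop by a factor `decay`. The sketch below is an illustration under assumptions (an in-memory adjacency dict, a hypothetical `relevance` function, an arbitrary default `decay` of 0.5), not a prescribed formula. It counts walks rather than simple paths, so in an undirected graph it also counts routes that double back.

```python
def relevance(graph: dict[str, list[str]], start: str, max_hops: int,
              decay: float = 0.5) -> dict[str, float]:
    """Truncated Katz-style score: sum over k of (walks of length k) * decay**k."""
    scores: dict[str, float] = {}
    frontier = {start: 1}  # walks of the current length, keyed by end node
    for hop in range(1, max_hops + 1):
        nxt: dict[str, int] = {}
        for node, count in frontier.items():
            for neighbour in graph.get(node, []):
                nxt[neighbour] = nxt.get(neighbour, 0) + count
        for node, count in nxt.items():
            if node != start:  # the starting node is not its own recommendation
                scores[node] = scores.get(node, 0.0) + count * decay ** hop
        frontier = nxt
    return scores


# Assumption: a small directed adjacency dict stands in for the graph.
small_graph = {"a": ["b", "c"], "b": ["d", "e"], "c": ["d"]}
print(relevance(small_graph, "a", 2))
# {'b': 0.5, 'c': 0.5, 'd': 0.5, 'e': 0.25}
```

`d` is twice as far from `a` as `b`, but it is reached by two routes, so it scores as highly as `b`; `e`, at the same distance as `d` but reached only once, scores half as much.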

RDBMS VS GRAPH

• An RDBMS cannot categorise relationships independently of the
content, for example ‘like’, ‘owns’, ‘has viewed’.

• An RDBMS cannot add metadata to a relationship to help
rank relevancy.

• Graph databases can do both, and can quickly calculate
the cost of traversing to an item of content.
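The sketch below illustrates typed relationships that carry a property, and the cost of traversing to an item of content. It is plain Python, not the Neo4j API: `Rel`, `traversal_cost` and the relationship type names are hypothetical, and each relationship's `weight` is assumed to be a cost, so lower totals mean closer content. It runs Dijkstra's shortest-path algorithm, following only the relationship types the caller allows.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    """Hypothetical typed relationship with a 'weight' property, treated as a cost."""
    target: str
    rel_type: str
    weight: float


def traversal_cost(graph: dict[str, list[Rel]], start: str,
                   allowed: set[str]) -> dict[str, float]:
    """Dijkstra's algorithm over relationships whose type is in `allowed`."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale entry: a cheaper route was already found
        for rel in graph.get(node, []):
            if rel.rel_type not in allowed:
                continue  # filter on relationship type
            new_cost = cost + rel.weight
            if new_cost < best.get(rel.target, float("inf")):
                best[rel.target] = new_cost
                heapq.heappush(heap, (new_cost, rel.target))
    return best


# Each relationship is listed from both ends so it can be walked either way.
typed_graph = {
    "ann": [Rel("doc1", "LIKES", 1.0), Rel("doc2", "HAS_VIEWED", 3.0)],
    "doc1": [Rel("ann", "LIKES", 1.0), Rel("ben", "OWNS", 1.0)],
    "ben": [Rel("doc1", "OWNS", 1.0), Rel("doc2", "LIKES", 0.5)],
    "doc2": [Rel("ann", "HAS_VIEWED", 3.0), Rel("ben", "LIKES", 0.5)],
}
print(traversal_cost(typed_graph, "ann", {"LIKES", "OWNS", "HAS_VIEWED"}))
# {'ann': 0.0, 'doc1': 1.0, 'doc2': 2.5, 'ben': 2.0}
```

With all three types allowed, `doc2` is cheaper to reach via Ben (2.5) than directly (3.0); excluding `OWNS` cuts that route, and Ben is then reachable only through `doc2`.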

TEFLON FRYING PANS:
SERENDIPITY IN ACTION

INVENTED BY MARC GRÉGOIRE
AT THE BEHEST OF HIS WIFE
[Diagram: Marc Grégoire, Mme. Grégoire]

MARC USED PTFE ON HIS TACKLE
[Diagram: Marc Grégoire, PTFE]

HIS WIFE WANTED PANS THAT DIDN’T STICK
[Diagram: Mme. Grégoire]

SEMANTICALLY ISOLATED
[Diagram: PTFE]

CONTEXTUALLY ISOLATED
[Diagram: PTFE]

SOCIALLY CONNECTED
[Diagram: PTFE]

MULTIPLE HOPS
[Diagram: PTFE]

SERENDIPITY
[Diagram: Mme. Grégoire, PTFE]

HIGHLY CONNECTED SYSTEM
[Diagram: PTFE]

RE-TWEETS

• Re-tweets allow rapid dissemination of information beyond a
limited social group; they cross semantic and contextual
boundaries.

• Re-tweets can be (and are often) re-tweeted, allowing multiple
hops.

• Other Twitter users act as the filters, and we further weight by
reputation.

HAVE YOU FILLED IN YOUR
FORMS?

WHAT SERENDIPITY ISN’T!

• Random: random combinations of information are just noise.
Putting Teflon on a dolphin’s nose would not be a useful
contribution to society. Don’t confuse unexpected with random!

• Accidental: serendipity comes from an attentive, and often
intuitive, mind receiving diverse information.

• Luck: serendipity is a cognitive process that creates new
connections between previously unrelated concepts and realises
the value in them.

THREE STEPS TO SERENDIPITY

• Remove Isolation. Relationships are low cost and can be
added to data at any point, so create them, as many
as possible, ignoring contextual or semantic boundaries.

• Use Multiple Hops. Cross semantic and contextual
boundaries when calculating relevancy.

• Weight and Filter. The value of the information found
should relate to the route traversed. Allow users to manually
pass on information to others.

CODING SERENDIPITY
How can we add serendipity into our systems?

• Information must be able to travel freely between users.

• Information should be able to travel multiple levels of
indirection with ease.

• Information should have the maximum number of
interconnections across semantic boundaries.

• Information relationships should be categorised and potentially
contain the metadata required for weighting.

HOW NEO4J HELPS

• Relationships can be created trivially, at low cost and at any time,
with no regard for semantic boundaries.

• Connected information over many hops can be retrieved
quickly using Node#traverse or the Traversal framework.

• Relationships can have both types and properties, making
weight and filter calculations easy.

TAKE AWAY

• Create more relationships.

• Let information cross contextual and semantic boundaries.

• Make sure relevancy is probabilistic, not deterministic.

• Serendipity is not accidental, random or lucky!

• The more heterogeneous and connected your data becomes,
the more you should consider Neo4j.

@neilellis
neilellis@cazcade.com

AUTOMATIC WEIGHT & FILTER

• Sum the ‘weight’ of each relationship traversed to the node.

• Find a random number between 0 and that weight.

• Order the discovered nodes by this random value.

• Choose the n nodes with the lowest values.

• By using random numbers we increase serendipity without
sacrificing relevance.
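The four steps above translate almost line for line into code. In this minimal sketch, `serendipitous_pick` is a hypothetical name, and the summed path weights are assumed to have been computed already (for instance by a traversal that adds up each relationship's `weight`). Because each node's key is drawn uniformly between 0 and its summed weight, a node with a low total usually sorts first, but a more distant node can still win occasionally.

```python
import random
from typing import Optional


def serendipitous_pick(path_weights: dict[str, float], n: int,
                       rng: Optional[random.Random] = None) -> list[str]:
    """Return up to n nodes, favouring (not guaranteeing) low summed weights.

    `path_weights` maps each discovered node to the summed 'weight' of the
    relationships traversed to reach it (step 1), assumed precomputed.
    """
    if rng is None:
        rng = random.Random()
    # Step 2: a random number between 0 and each node's summed weight.
    keyed = [(rng.uniform(0.0, weight), node) for node, weight in path_weights.items()]
    # Step 3: order the discovered nodes by that random value.
    keyed.sort()
    # Step 4: keep the n nodes with the lowest values.
    return [node for _, node in keyed[:n]]


# Hypothetical summed path weights for three discovered nodes.
weighted = {"near": 0.0, "mid": 2.0, "far": 10.0}
print(serendipitous_pick(weighted, 2, random.Random(7)))
```

Passing a seeded `random.Random` makes results reproducible in tests; leaving `rng` unset lets each request surface something different. With weights of 2 and 10, the nearer node wins roughly nine times out of ten, which is the "probabilistic, not deterministic" relevancy the deck calls for.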

MANUAL WEIGHT & FILTER

• Re-Tweeting or forwarding.

• Tell a friend.

• Like.

• etc.
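Manual filtering can feed back into the automatic weights. The sketch below is purely illustrative: the action names mirror the list above, but the discount factors in `ACTION_DISCOUNT`, the `floor` value and the `apply_feedback` function are hypothetical. Since the automatic scheme treats lower weights as more relevant, each endorsement lowers the weight of the relationship it travelled along, but never below a floor, so a heavily endorsed route still leaves room for chance.

```python
# Hypothetical discount factors: each manual endorsement multiplies the
# relationship's weight (a cost, lower = more relevant) by this factor.
ACTION_DISCOUNT = {"retweet": 0.8, "tell_a_friend": 0.7, "like": 0.9}


def apply_feedback(weights: dict[tuple[str, str], float], edge: tuple[str, str],
                   action: str, floor: float = 0.1) -> float:
    """Lower an edge's weight after a manual endorsement and return the new weight."""
    factor = ACTION_DISCOUNT.get(action, 1.0)  # unknown actions apply no discount
    # The floor keeps weights above zero so chance still plays a part.
    new_weight = max(floor, weights[edge] * factor)
    weights[edge] = new_weight
    return new_weight


edge_weights = {("ann", "doc1"): 1.0}
apply_feedback(edge_weights, ("ann", "doc1"), "like")
print(edge_weights)  # {('ann', 'doc1'): 0.9}
```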

OTHER EXAMPLES

• Research papers are a semantically arranged collection of information and
therefore create semantically isolated areas of information.

• A lending library is another semantically isolated collection of information.

• A project management website creates a contextually isolated set of
information.

• The internet is a highly connected, disorganised information storage system,
which leads to a fair amount of serendipity: how many interesting things
have you ‘stumbled upon’ on the internet? But it still tends to have
semantic or contextual silos, so there’s still a lot of room for improvement.
