6. HOW SERENDIPITY HELPS
• Many new inventions occur because related information
crosses conventional boundaries, leaving its ghetto.
• Our lives are made richer by discovering ideas and
experiences outside our comfort zones and habitual patterns.
• Serendipity accelerates information discovery by making new
and unexpected connections.
9. WHO NEEDS SERENDIPITY?
• B2B sites - encourage businesses to find ways of collaborating they may
never have thought of.
• Social sites - let people discover new friends and new interests.
• Collaborative software - find projects that could work together in
unexpected ways.
• Document management - find documents that help you look at your work
in a different way.
• Contact management - find new people you could do business with
who might not be in a narrowly defined field.
20. GET CONNECTED
• Contextually isolated systems only show us information regarding a closed set of
data and activities.
• Semantically isolated systems only show us information which is similar to other
information.
• Content connected systems show us data that relates to each other, which can
cross and weaken contextual and semantic boundaries.
• Socially connected systems show us information regarding our friends and their
activities, weakening contextual and semantic boundaries.
• Highly connected systems show us information with n-degrees of separation and
multiple paths across contextual and semantic boundaries.
37. RDBMS VS GRAPH
• Highly connected systems can be modelled relatively easily on
an RDBMS, but adding new relationships creates complexity
and must be planned in advance.
• Querying is easier for semantically and contextually isolated
models on an RDBMS.
• Querying is extremely messy (indeed!) for highly connected
models.
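To make the "messy" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `knows` table, its columns and the sample names are hypothetical, chosen only for illustration. Each extra hop needs another self-join, so the query text has to change with the number of hops.

```python
import sqlite3

# Hypothetical schema: one row per directed "knows" relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("ann", "bob"), ("bob", "carol"), ("carol", "dave")],
)

# Two hops: the table is joined to itself once.
TWO_HOPS = """
SELECT DISTINCT k2.target
FROM knows AS k1
JOIN knows AS k2 ON k2.source = k1.target
WHERE k1.source = ?
"""

# Three hops: one more self-join, and so on for every further hop.
THREE_HOPS = """
SELECT DISTINCT k3.target
FROM knows AS k1
JOIN knows AS k2 ON k2.source = k1.target
JOIN knows AS k3 ON k3.source = k2.target
WHERE k1.source = ?
"""


def targets(query, start):
    """Run one of the hop queries and return the names it reaches."""
    return [row[0] for row in conn.execute(query, (start,))]


print(targets(TWO_HOPS, "ann"))
print(targets(THREE_HOPS, "ann"))
```

The sketch only illustrates legibility; it says nothing about performance, which depends on the database and its indexes.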
41. RDBMS VS GRAPH
• Multiple hop queries are horrific under an RDBMS, both in
performance and in the legibility of the queries.
• Graph databases love multiple hop logic and, one could say,
thrive upon it. It’s much easier to find related items
across arbitrary degrees of separation and semantic barriers.
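As a rough illustration of multiple hop logic on a graph, the sketch below runs a breadth-first traversal over a plain Python adjacency dictionary. It is not Neo4j's API; the function name `within_hops` and the sample graph are assumptions made for this example. The number of hops is just a parameter, so the same code covers any degree of separation.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hops} for every node reachable from start in at most max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # the starting node is not a discovery
    return hops


# Hypothetical graph mixing people and topics in one structure.
GRAPH = {
    "alice": ["bob", "fishing"],
    "bob": ["carol"],
    "fishing": ["teflon"],
    "teflon": ["frying pans"],
}

print(within_hops(GRAPH, "alice", 2))
print(within_hops(GRAPH, "alice", 3))
```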
43. WEIGHT & FILTER
• Proximity still matters; information should be closely
connected even if it is not semantically or contextually related.
• Relevancy should relate to frequency.
• Filtering can be done manually by users choosing what to
recommend or pass on.
• If possible, use customer feedback to adjust weighting.
44. RDBMS VS GRAPH
• RDBMS cannot categorise relationships independently of the
content, for example ‘like’, ‘owns’, ‘has viewed’.
• RDBMS cannot add meta-data to the relationship to help
ranking of the relevancy.
• Graph databases can do both of these and can quickly calculate
the cost of traversing to an item of content.
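The sketch below shows one way typed relationships with a weight property could be used to calculate the cost of traversing to an item. It is plain Python, not a graph database API: the `Relationship` class, the `weight` property and the sample data are hypothetical, and Dijkstra's algorithm is used here as one reasonable way to find the cheapest route.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str      # relationship type, e.g. 'likes', 'owns', 'has viewed'
    weight: float  # hypothetical meta-data: cost of crossing this relationship


def cheapest_costs(relationships, start, allowed_kinds=None):
    """Return {node: lowest total weight} from start, optionally filtered by type."""
    adjacency = {}
    for rel in relationships:
        if allowed_kinds is None or rel.kind in allowed_kinds:
            adjacency.setdefault(rel.source, []).append(rel)

    costs = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > costs.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for rel in adjacency.get(node, []):
            new_cost = cost + rel.weight
            if new_cost < costs.get(rel.target, float("inf")):
                costs[rel.target] = new_cost
                heapq.heappush(heap, (new_cost, rel.target))
    return costs


RELS = [
    Relationship("ann", "bob", "likes", 1.0),
    Relationship("bob", "report", "owns", 1.0),
    Relationship("ann", "report", "has viewed", 5.0),
]

print(cheapest_costs(RELS, "ann"))                  # all relationship types
print(cheapest_costs(RELS, "ann", {"has viewed"}))  # only 'has viewed'
```

Because the type and the weight live on the relationship rather than on the content, the same data can be ranked differently simply by changing which types are allowed.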
57. RE-TWEETS
• Re-tweets allow rapid dissemination of information beyond a
limited social group; they cross semantic and contextual
boundaries.
• Re-tweets can be (and often are) re-tweeted, allowing multiple
hops.
• Other Twitter users act as the filters, and we further weight by
reputation.
59. WHAT SERENDIPITY ISN’T!
• Random; random combinations of information are just noise.
Putting Teflon on a dolphin’s nose would not be a useful
contribution to society. Don’t confuse unexpected with random!
• Accidental; serendipity comes from an attentive, and often
intuitive, mind receiving diverse information.
• Luck; serendipity is a cognitive process that creates new
connections between previously unrelated concepts and realises
the value in them.
60. THREE STEPS TO SERENDIPITY
• Remove Isolation. Relationships are low cost and can be
added to data at any point, so create them and create as many
as possible ignoring contextual or semantic boundaries.
• Use Multiple Hops. Cross semantic and contextual
boundaries when providing relevancy.
• Weight and Filter. The value of the information found
should relate to the route traversed. Allow users to manually
pass on information to others.
61. CODING SERENDIPITY
How can we add serendipity into our systems?
• Information must be able to travel freely between users.
• Information should be able to travel multiple levels of
indirection with ease.
• Information should have the maximum number of
interconnections across semantic boundaries.
• Information relationships should be categorised and potentially
contain the meta-data required for weighting.
62. HOW NEO4J HELPS
• Relationships are created trivially at low cost at any time with
no regard to semantic boundaries.
• Connected information over many hops can be retrieved
quickly using Node#traverse or the Traversal framework.
• Relationships can have both types and properties, making
weight and filter calculations easy.
63. TAKE AWAY
• Create more relationships.
• Let information cross contextual and semantic boundaries.
• Make sure relevancy is probabilistic, not deterministic.
• Serendipity is not accidental, random or lucky!
• The more heterogeneous and connected your data becomes,
the more you should consider Neo4j.
65. AUTOMATIC WEIGHT & FILTER
• Sum the ‘weight’ of each relationship traversed to the node.
• Find a random number between 0 and that weight.
• Order the discovered nodes by this random value.
• Choose the n nodes with the lowest values.
• By using random numbers we increase serendipity without
sacrificing relevance.
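The steps above can be sketched in a few lines of Python. This is one reading of the slide, not a definitive implementation: the function name `serendipitous_pick`, the uniform random draw and the sample weights are assumptions. Here a lower summed weight means a closer node, so close nodes usually win, but a distant node can occasionally draw a low value and surface.

```python
import random


def serendipitous_pick(path_weights, n, rng=None):
    """Pick n nodes, favouring low total path weight but leaving room for surprise.

    path_weights maps each discovered node to the list of relationship
    weights traversed to reach it.
    """
    if rng is None:
        rng = random.Random()
    scored = []
    for node, weights in path_weights.items():
        total = sum(weights)             # step 1: sum the weights on the route
        draw = rng.uniform(0.0, total)   # step 2: random number between 0 and that sum
        scored.append((draw, node))
    scored.sort()                        # step 3: order nodes by the random value
    return [node for _, node in scored[:n]]  # step 4: keep the n lowest


# Hypothetical discoveries with the weights of the relationships leading to them.
PATHS = {
    "friend post": [1.0],
    "friend of friend project": [1.0, 2.0],
    "distant document": [2.0, 3.0, 4.0],
}

print(serendipitous_pick(PATHS, 2, random.Random(42)))
```

Passing a seeded `random.Random` makes a run repeatable for testing; in normal use the default generator gives a different selection each time.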
67. OTHER EXAMPLES
• Research papers are a semantically arranged collection of information and
therefore create semantically isolated areas of information.
• A lending library is another semantically isolated collection of information.
• A project management website creates a contextually isolated set of
information.
• The internet is a highly connected, disorganised information storage system
- which leads to a fair amount of serendipity. How many interesting things
have you ‘stumbled upon’ on the internet? But it still has a tendency towards
semantic or contextual silos. There’s still a lot of room for improvement.
Editor's notes
We’ll come back to these forms a little later.
So how can we encourage serendipity?
Information grouped by subject, such as science books, art books and cookery books, rarely refers to the other groups, so it stays isolated by its semantics. When these boundaries are crossed we get some of the inventions we saw earlier.
Contextually isolated information is separated by the context the information was created in; i.e. it belongs to a single user, a single team, company or project: anything that links information together into a closed network. When scientists, companies, teams and people communicate their work or interests, great things also happen.
The internet broke away from these two information ghettos by joining documents together, so our information could be connected.
We’ve now moved forward into the socially connected era, where our systems encourage the spread of information by users: we share, recommend and forward.
But we can go a stage further: highly connected systems need to connect not just information but people and information in arbitrary combinations. Furthermore, we need to allow this information to travel in real time across these links.
History shows that when we allow information to flow fast and freely in society we see revolutions in science and spirituality. As our collective understanding increases, so does the welfare of the individual and society. So it is with information systems: by increasing the flow of information we increase the value to all those using it.
Whenever information doesn’t flow, ignorance takes over, and clearly we all suffer for that.
So recommendation number one: increase connectivity.
But our storage systems affect how connected we make the world.
File based systems basically encourage us to dump stuff together, but don’t encourage us to think about how it interconnects. So we end up seeing the world as ....
Relational databases help us to organise and connect related information in a highly organised, formal manner, like ....
Whereas graph databases are more like ....
We also need data to escape its ghettos, and one way to do this is to allow information to potentially travel arbitrary degrees of separation, as emails or tweets do. Not just manually, as in viral marketing, but also automatically: in status updates, suggested content etc.
We see this already in recommend a friend ....
Or related documents, but the key here is to allow multiple hops across all boundaries, semantic and contextual.
Multiple hop queries are horrific under an RDBMS, both in performance and in the legibility of the queries. This is the main reason relational systems rarely help the spread of information by automatic means and rely on users passing on information instead.
But we don’t want just any old information; we still need to filter according to relevancy.
But the key, I believe, when automating relevancy is not to use relevancy as a fixed, one-off judgement on whether something is visible or not, but rather as an indicator of the likelihood that the information will be visible.
In a semantically isolated example, books would be written about how Teflon helps in fishing. Meanwhile frying pans are only of interest to the catering industry and would not have references to fishing equipment.
In a contextually isolated system Marc would have been busy using Teflon for his fishing equipment and never mentioned it to his wife.
Luckily they talked to each other.
Now in this system, which is not at this point highly connected, information was able to travel multiple hops as Marc discussed his fishing equipment and his wife saw the potential application.
Now we have a highly connected system that has crossed social and semantic boundaries. How long did it take before we had Teflon baking trays, cake tins etc.? Once a semantic boundary has been broken the process accelerates, and the speed at which other boundaries are broken increases.
Re-tweets traverse a graph with ‘n’ degrees of separation. I can be looking at how to increase the viral nature of my new startup when I notice a tweet about the use of landing pages, which leads me to write a viral landing page. Such a collaboration is serendipitous: it is unintentional but beneficial and rewarding.
Re-tweets allow rapid dissemination of information beyond a limited social group. Because of the 5 degrees of separation on Twitter, a single tweet can reach the entire 200 million user base within minutes, as shown by the news of Osama Bin Laden’s death.
Please can you swap forms with one other person .... Now the information on those forms is closely related to you, because most of the people in the room have similar backgrounds. However, it’s outside your pre-defined social group and the common semantic links between people here. For your homework I’d like you to watch that movie, listen to that music and take a look at that technology!
-- Weight and Filter -> Whether they recommend, make favourite lists or send as a message. Maintain the source of the information for future automatic recommendations. Keep it connected.