Graphs, Edges & Nodes
  Untangling the social web.
What’s a graph?
Graph
[Figure: an example graph built up over several slides: points, then edges between them, then directed edges, then numeric weights on the edges and on the nodes.]
Simple




At most one edge between any pair of nodes.
Multigraph




Multiple edges between vertices allowed.
Pseudograph




Self-loops are permitted.
G = (V, E)
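The three graph types above and the G = (V, E) notation can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the sample vertex names are assumptions, and the graph is treated as undirected.

```python
from collections import Counter

def classify(vertices, edges):
    """Classify an undirected graph G = (V, E) given as a vertex set and an edge list."""
    # Every edge endpoint must be a member of V.
    assert all(u in vertices and v in vertices for u, v in edges)
    # A self-loop is an edge whose two endpoints are the same node.
    has_loop = any(u == v for u, v in edges)
    # Undirected: (a, b) and (b, a) count as the same pair of nodes.
    counts = Counter(frozenset((u, v)) for u, v in edges)
    has_parallel = any(n > 1 for n in counts.values())
    if has_loop:
        return "pseudograph"
    if has_parallel:
        return "multigraph"
    return "simple"

V = {"alice", "bob", "carol"}
print(classify(V, [("alice", "bob"), ("bob", "carol")]))      # simple
print(classify(V, [("alice", "bob"), ("bob", "alice")]))      # multigraph
print(classify(V, [("alice", "bob"), ("carol", "carol")]))    # pseudograph
```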
What’s a node?
       vertex
        point
      junction
     0-simplex
What’s an edge?
        arc
      branch
        line
        link
     1-simplex
Directed
Undirected
Undirected
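One way to picture the difference, as a rough sketch: a directed edge is an ordered pair, an undirected edge is an unordered pair. The names `follows` and `friends` and the helper functions are made up for this example.

```python
# Directed: each edge is an ordered pair (a tuple), e.g. "alice follows bob".
follows = {("alice", "bob"), ("bob", "carol")}

# Undirected: each edge is an unordered pair (a frozenset), e.g. a mutual friendship.
friends = {frozenset(("alice", "bob")), frozenset(("bob", "carol"))}

def has_directed_edge(edges, u, v):
    """True only if there is an edge from u to v, in that direction."""
    return (u, v) in edges

def has_undirected_edge(edges, u, v):
    """True if u and v are joined, regardless of the order given."""
    return frozenset((u, v)) in edges

print(has_directed_edge(follows, "alice", "bob"))    # True
print(has_directed_edge(follows, "bob", "alice"))    # False
print(has_undirected_edge(friends, "bob", "alice"))  # True
```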
Visualizations
You are here.
(Graph does not include Justin Bieber)
Social Graphs
Find the band that is most often co-listened with the given one.
[Figure: bipartite graph with People on one side and Bands on the other.]
Basically, most kinds of simple
content/co-occurrence similarity.
That’s a 2-step path on a bipartite graph.

There are many of these ‘fundamental’
graph units:

 - tripartite
 - folksonomies (tripartite 3-graph + 2-step path)
 - multicolor-multiparity graph
 - etc.
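The co-listening question can be sketched as a 2-step walk on the people/bands bipartite graph: from the given band to everyone who listens to it, then from those people to their other bands. The listening data and the function name `most_co_listened` below are invented for illustration.

```python
from collections import Counter

# Hypothetical listening data: person -> set of bands they listen to.
listens = {
    "alice": {"Radiohead", "Portishead", "Massive Attack"},
    "bob": {"Radiohead", "Portishead"},
    "carol": {"Radiohead", "Muse"},
    "dave": {"Muse"},
}

def most_co_listened(listens, band):
    """Return (other_band, shared_listeners) for the band most often
    co-listened with `band`, or None if nobody listens to anything else."""
    counts = Counter()
    for bands in listens.values():
        if band in bands:                  # step 1: band -> a person who listens to it
            counts.update(bands - {band})  # step 2: person -> their other bands
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically so the result is stable.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(most_co_listened(listens, "Radiohead"))  # ('Portishead', 2)
```

Scanning every person works for a toy example; with real data you would keep a band -> listeners index as well, so step 1 does not touch every user.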
Graph Storage
   Engines
Neo4j
“An embedded, disk-based, fully transactional Java persistence engine
    that stores data structured in graphs rather than in tables.”

                         http://neo4j.org
HypergraphDB
  “A general purpose, extensible, portable, distributed, embeddable,
open-source data storage mechanism. It is a graph database designed
   specifically for artificial intelligence and semantic web projects.”

                   http://kobrix.org/hgdb.jsp
Special Purpose
Storage Engines
FlockDB
 “FlockDB is a database that stores graph data, but it isn't a database optimized for
  graph-traversal operations. Instead, it's optimized for very large adjacency lists,
              fast reads and writes, and page-able set arithmetic queries.”



http://engineering.twitter.com/2010/05/introducing-flockdb.html
Redis
“Redis is an advanced key-value store. [...] the dataset is not volatile, and values can be strings,
exactly like in memcached, but also lists, sets, and ordered sets. All this data types can be
manipulated with atomic operations to push/pop elements, add/remove elements, perform
server side union, intersection, difference between sets, etc.”



                       http://code.google.com/p/redis
A Redis Friends/
Followers Example
Redis makes you think in terms of data structures,
       and operations on those structures.
Set:
 Finite (for our cases) collection of objects in which
 order has no significance and multiplicity is generally
 ignored.
                  S = { Alice, Bob, Carol }

List:
  Finite (for our cases) collection of objects in which
  order *is* significant and multiplicity is allowed.
                    L = [ X, Y, X, Z, Q]
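The same two definitions, shown with Python's built-in set and list as a quick illustration (this is not Redis itself, just the same ideas):

```python
# Set: order has no significance, duplicates are ignored.
S = {"Alice", "Bob", "Carol"}
S.add("Alice")                             # already present, so nothing changes
print(len(S))                              # 3
print(S == {"Carol", "Alice", "Bob"})      # True: order does not matter

# List: order is significant, duplicates are allowed.
L = ["X", "Y", "X", "Z", "Q"]
print(L.count("X"))                        # 2: multiplicity is kept
print(L == ["Y", "X", "X", "Z", "Q"])      # False: same items, different order
```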
Store a user's details as string keys (SET stores a string value, not a set)

SET uid:1000:username jperras
SET uid:1000:password bazinga!
Use sets for denoting my followers/people
                 I follow.


uid:1000:followers => Set of uids of all users who follow user 1000
uid:1000:following => Set of uids of all users whom user 1000 follows
Adding a new follower

SADD uid:1000:following 1001
SADD uid:1001:followers 1000
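The two SADD commands always go together: one records the edge on the follower's side, the other on the followee's side. Here is a small in-memory sketch of that bookkeeping, where a Python dict of sets stands in for Redis and the function name `follow` is an assumption:

```python
from collections import defaultdict

# key -> set of uids; an in-memory stand-in for Redis sets, not a Redis client.
store = defaultdict(set)

def follow(store, follower, followee):
    """Record that `follower` follows `followee` on both sides of the edge."""
    store[f"uid:{follower}:following"].add(followee)   # like SADD uid:<follower>:following <followee>
    store[f"uid:{followee}:followers"].add(follower)   # like SADD uid:<followee>:followers <follower>

follow(store, 1000, 1001)
follow(store, 1000, 1001)   # adding to a set twice has no further effect

print(store["uid:1000:following"])  # {1001}
print(store["uid:1001:followers"])  # {1000}
```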
Posting Updates

$r = new Redis(); /* connection setup omitted; it depends on the client library */
/* Allocate a new post id and store the post as "userid|timestamp|status". */
$postid = $r->incr("global:nextPostId");
$post = $User['id'] ."|". time() ."|". $status;
$r->set("post:$postid", $post);
$followers = $r->smembers("uid:".$User['id'].":followers");

if ($followers === false) $followers = Array();
$followers[] = $User['id']; /* Add the post to our own posts too */

/* Fan out: push the post id onto the posts list of every follower. */
foreach($followers as $fid) {
    $r->push("uid:$fid:posts", $postid, false);
}
# Push the post on the timeline, and trim the timeline to the
# newest 1000 elements (indices 0 through 999).
$r->push("global:timeline", $postid, false);
$r->ltrim("global:timeline",0,999);
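The same fan-out-on-write idea, as a self-contained Python sketch that runs without a Redis server. The `Store` class, its field names and `post_update` are all assumptions made for illustration; a bounded deque plays the role of the push-then-trim pair on the global timeline.

```python
import time
from collections import defaultdict, deque

TIMELINE_LEN = 1000  # keep only the newest 1000 post ids on the global timeline

class Store:
    """In-memory stand-in for the Redis keys used in the slide (hypothetical)."""
    def __init__(self):
        self.next_post_id = 0                        # global:nextPostId
        self.posts = {}                              # post:<id>
        self.followers = defaultdict(set)            # uid:<id>:followers
        self.user_posts = defaultdict(deque)         # uid:<id>:posts
        self.timeline = deque(maxlen=TIMELINE_LEN)   # global:timeline

def post_update(store, user_id, status):
    """Store a post, then push its id to the author and to every follower."""
    store.next_post_id += 1
    post_id = store.next_post_id
    store.posts[post_id] = (user_id, time.time(), status)
    # Fan out on write: newest post id goes to the head of each list.
    for fid in store.followers[user_id] | {user_id}:
        store.user_posts[fid].appendleft(post_id)
    # With maxlen set, appendleft drops the oldest id from the other end.
    store.timeline.appendleft(post_id)
    return post_id

store = Store()
store.followers[1000] = {1001, 1002}
post_update(store, 1000, "hello graphs")
print(list(store.user_posts[1001]))  # [1]
print(list(store.timeline))          # [1]
```

Writing to every follower's list makes posting more expensive but keeps reading a user's feed a single list read, which is the trade-off the slide's code makes.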
Common followers? - Set intersections!




SINTER uid:1000:followers uid:1001:followers
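A set intersection returns the members present in both sets. A minimal Python sketch of the same query, with made-up follower data and a hypothetical helper `common_followers`:

```python
# Hypothetical data: uid -> set of follower uids.
followers = {
    1000: {1, 2, 3, 4},
    1001: {3, 4, 5},
}

def common_followers(followers, a, b):
    """Uids that follow both a and b (empty set if either user is unknown)."""
    return followers.get(a, set()) & followers.get(b, set())

print(sorted(common_followers(followers, 1000, 1001)))  # [3, 4]
```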
Let’s compare that
     to MySQL
Can be Painful
Even More Pain
Relational databases can work for the simplest
of cases, but fail horribly at nearly all graph-related
               operations/algorithms.
Graphs and graph-databases are only
  going to be more and more useful.
However, graph algorithms are hard.

            So don’t write your own.

And make sure you use a persistent storage engine
   that is best suited for the type of queries
             you will be performing.
Resources
The Algorithm Design Manual, Steven S. Skiena
Programming Collective Intelligence, Toby Segaran
Introduction to Algorithms, Cormen, Leiserson, Rivest
@jperras
Photo Credits


Graph of the internet, circa 2003: http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/ (built from a partial trawl of public servers using traceroute)

My real friends for letting me use their Facebook profile images.
References

Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of
Mathematics at St. Petersburg

http://mathworld.wolfram.com/Set.html

Programming Collective Intelligence, Toby Segaran

The Algorithm Design Manual, Steven S. Skiena


Editor's Notes

  1. Many of the most popular web applications today deal with highly organized and structured data that represent entities, and the relationships between these entities. LinkedIn can tell you how many degrees of separation there are between yourself and the CEO of Samsung, Facebook can figure out people that you might already know, Digg can recommend article submissions that you might like, and LastFM suggests music based on your current listening habits. We’ll take a look at the basic theory behind how some of these features can be implemented (no computer science degree required!), and take a quick look at the current landscape of graph-based datastores that simplify many of these operations.
  2. Start with some definitions.
  3. Collection of points - e.g. Users (Twitter/Facebook), songs (iTunes)
  4. Add relationships between data points
  5. Some relations are not symmetric - e.g. `friend` is mutual, whereas following/follower is asymmetric.
  6. Your relationships might have a weight - e.g. the number of Scrabulous games two users have played together.
  7. Data points can also have weight - e.g. `reputation` score on social news sites like Digg, Reddit.
  8. Simple graph - at most one edge between any pair of vertices.
  9. Multigraph - multiple edges between the same pair of vertices are allowed.
  10. Pseudograph - self-loops are allowed, e.g. if your application needs the ability for you to be your own ‘follower’ or ‘friend’.
  11. Notation that you might see - G is the ‘name’ of the graph, which is composed of a set V of vertices (nodes) and a set E of edges.
  12. "Vertex" is a synonym for a node of a graph, i.e., one of the points on which the graph is defined and which may be connected by graph edges.
  13. An edge is an ordered (or unordered) pair of nodes. Different types of edges: directed and undirected. In geometry, a simplex (plural simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimension. Specifically, an n-simplex is an n-dimensional polytope with n + 1 vertices, of which the simplex is the convex hull. For example, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and a 4-simplex is a pentachoron. A single point may be considered a 0-simplex, and a line segment may be viewed as a 1-simplex. A simplex may be defined as the smallest convex set which contains the given vertices.
  14. The edge is an ordered pair of nodes. The terms "arc", "branch", "line", "link" and "1-simplex" are sometimes used instead of edge.
  15. Edge highlight on next slide.
  16. An unordered pair of nodes, specifying a line that joins the two nodes, is said to form an edge.
  17. An unordered pair of nodes, specifying a line that joins the two nodes, is said to form an edge.
  18. Partial map of the internet, culled in 2003 using traceroute.
  19. Graph visualizations have also become quite important - displaying information on billions of points and edges in a useful manner is quite difficult. The graph is projected inside a 3D sphere using a special kind of space based on hyperbolic geometry. This is a non-Euclidean space, which has the useful distorting property of making elements at the center of the display much larger than those on the periphery. Hyperbolic space projection is commonly known as “focus+context” in the field of information visualization and has been used to display all kinds of data that can be represented as large graphs in either two or three dimensions.
  20. This is a graph representation of the similarity relationships derived from the database of Last.fm. The circles (vertices) on the left hand side figure are bands, musicians, composers, whatever you will find in the Music section of the site. Lines (edges) connect similar artists. Vertex sizes vary according to the popularity of the artists. Vertex colors correspond to musical genres, identified by tags attached to the artists by the users of Last.fm.
  21. You are already a part of and use several social graphs.
  22. Twitter is one giant graph (users, followers, following) + timeline attached to users
  23. LinkedIn is another giant graph. It’s basically in their name!
  24. Me
  25. I’m the center of the world.
  26. Relationships with my friends
  27. My friends also have relationships between themselves
  28. Let’s get rid of the pictures for a second
  29. My friends also have friends, and those friends can be friends with my other immediate friends.
  30. Important problems: max-intersection + strongest connection problem.
  31. From Twitter - solves their problems
  32. Sets are great when the order of your data doesn’t matter, and when you know that the objects need to be unique. Example: USERS. Lists are best for things that need to be displayed in a given order, e.g. POST TIMELINE.
  33. Sets are great when the order of your data doesn’t matter, and when you know that the objects need to be unique. Example: USERS. Lists are best for things that need to be displayed in a given order, e.g. POST TIMELINE.