Neo4j at Seth Godin’s Squidoo
with Chief Engineer Gil Hildebrand
What’s Squidoo?

Passionate people sharing the ideas they care about
Social publishing platform with over 3 million users
More than 100 million pageviews per month; Quantcast rank #35 in the US
Introducing Postcards
A brand-new product from Squidoo
Currently in private beta (not yet public)
Single-page, beautifully designed personal recommendations of books, movies, music albums, quotes, and other products and media types
Semantic Web

A group of methods and technologies that allow machines to understand the meaning, or "semantics", of information
Postcards get better with the Semantic Web
 We parse web pages and external APIs to extract meaning.
 Web pages: Meta and Open Graph tags
   Title, Description, Photo, and Video
 External APIs
   Amazon, IMDB, Freebase, Google, YouTube, Bing, and more
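A minimal sketch of the web-page half of this step, assuming only Python's standard-library `html.parser`. The class `MetaTagParser`, the function `extract_metadata`, and the returned field names are hypothetical, not Squidoo's code. Open Graph properties are preferred, with fallbacks to the plain `<title>` and `<meta name="description">` tags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> tags (Open Graph and plain) and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph uses property="og:...", classic tags use name="..."
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                self.meta.setdefault(key.lower(), content.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def extract_metadata(html):
    """Return title, description, photo, video, and og:type, preferring Open Graph."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    m = parser.meta
    return {
        "title": m.get("og:title") or parser.title_text.strip() or None,
        "description": m.get("og:description") or m.get("description"),
        "photo": m.get("og:image"),
        "video": m.get("og:video"),
        "type": m.get("og:type"),
    }
```

The `type` field (`og:type`) is kept because it is the cheapest hint for the normalization problem that follows.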
The Problem Is Normalization

 The meta tag “Hotel California” on a web page is not particularly useful unless I know the tag is music-related; then I can search for music albums containing Hotel California.
 This is not easy, but the web as a whole is becoming more structured.
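One way to attach a noun to a bare tag is to read the page's `og:type` first and fall back to the site it came from. A sketch under those assumptions: both lookup tables and the `normalize` function are hypothetical (the `og:type` keys are standard Open Graph types; the noun values match the nouns the deck's initialization script creates). A tag that matches neither table stays ambiguous, with its noun left as `None`.

```python
from urllib.parse import urlparse

# Hypothetical mapping from Open Graph og:type values to Postcards nouns.
OG_TYPE_TO_NOUN = {
    "music.song": "Music",
    "music.album": "Music",
    "music.playlist": "Music",
    "video.movie": "Movie",
    "video.tv_show": "TV",
    "video.episode": "TV",
    "video.other": "Video",
    "book": "Book",
    "article": "Article",
}

# Hypothetical fallback: infer the noun from the site the tag came from.
DOMAIN_TO_NOUN = {
    "imdb.com": "Movie",
    "youtube.com": "Video",
}


def normalize(tag, og_type=None, url=None):
    """Return (subject, noun); noun is None when the tag stays ambiguous."""
    subject = " ".join(tag.split())  # collapse stray whitespace in the raw tag
    noun = OG_TYPE_TO_NOUN.get((og_type or "").lower())
    if noun is None and url:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        noun = DOMAIN_TO_NOUN.get(host)
    return subject, noun
```

With this, `normalize("Hotel California", og_type="music.album")` yields `("Hotel California", "Music")`, which is enough to search music albums.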
Connecting the Dots

Crawl a web page or API to extract metadata
Store subjects, nouns, adjectives, and possessives in Neo
Query Neo to organize subjects into Stacks based on nouns, adjectives, and possessives
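The storage step works because each metadata value becomes a shared node, so two Postcards about the same director connect through one "Martin Scorsese" node. A minimal in-memory sketch of that idea, not the neo4jphp code: `Graph` and `store_postcard` are hypothetical. The `POSTED`, `POST`, and `BY` relationship names and the `subject.type` property come from the deck's Cypher queries. `DESCRIBED_AS` is an invented name for the subject-to-adjective edge, which the deck leaves untyped.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph: node properties plus typed outgoing edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.out = defaultdict(list)    # node id -> [(rel_type, target id)]
        self._ids = {}                  # (kind, title) -> node id

    def node(self, kind, title):
        # Reuse one node per (kind, title) so Postcards that share metadata
        # (e.g. the same director) end up connected through that node.
        key = (kind, title)
        if key not in self._ids:
            nid = len(self.nodes)
            self.nodes[nid] = {"kind": kind, "title": title}
            self._ids[key] = nid
        return self._ids[key]

    def relate(self, src, rel, dst):
        if (rel, dst) not in self.out[src]:
            self.out[src].append((rel, dst))


def store_postcard(g, user, post_id, subject, noun, adjectives=(), possessives=()):
    user_node = g.node("user", user)
    post = g.node("post", post_id)
    subj = g.node("subject", subject)
    g.nodes[subj]["type"] = noun  # noun kept as subject.type, as in the deck's Cypher
    g.relate(user_node, "POSTED", post)
    g.relate(post, "POST", subj)
    for adj in adjectives:
        g.relate(subj, "DESCRIBED_AS", g.node("adjective", adj))  # hypothetical rel name
    for who in possessives:
        g.relate(subj, "BY", g.node("possessive", who))
    return post
```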
Stacking Up
Postcards are organized into Stacks. Stacks are a taxonomy based on media type and other common factors, for example:
  Books Stack
  Crime Novel Books Stack
  Tom Clancy Books Stack
Stacks are created automatically from the metadata associated with each Postcard.
A Stack exists only when it contains at least three Postcards.
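A sketch of automatic Stack creation, assuming a Stack is either a noun on its own ("Books") or a noun qualified by one adjective or possessive ("Crime Novel Books", "Tom Clancy Books"). The three-Postcard minimum comes from the slide. The `(qualifier_kind, qualifier, noun)` key shape and the `build_stacks` function are hypothetical.

```python
from collections import defaultdict

MIN_STACK_SIZE = 3  # from the slide: a Stack needs at least three Postcards


def build_stacks(postcards):
    """Group Postcards into candidate Stacks and keep those with enough members.

    Each postcard is a dict with 'id', 'noun', and optional 'adjectives' and
    'possessives' lists. Stack keys are (qualifier_kind, qualifier, noun).
    """
    groups = defaultdict(set)
    for card in postcards:
        noun = card["noun"]
        groups[(None, None, noun)].add(card["id"])             # e.g. Books Stack
        for adj in card.get("adjectives", []):
            groups[("adjective", adj, noun)].add(card["id"])   # e.g. Crime Novel Books Stack
        for who in card.get("possessives", []):
            groups[("possessive", who, noun)].add(card["id"])  # e.g. Tom Clancy Books Stack
    return {key: ids for key, ids in groups.items() if len(ids) >= MIN_STACK_SIZE}
```

Keeping the qualifier kind in the key stops an adjective and a possessive that happen to share a spelling from merging into one Stack.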
Modeling Taxonomy
We found that the “Parts of Speech” are a great way to model the Postcards taxonomy.
All Postcards have:
  Name of the item (subject)
  Domains or media types (nouns)
  Descriptors (adjectives)
  Owners or creators (possessives)
Parts of Speech
Modeling with our existing DB platforms
Very familiar with MySQL.
Extremely reliable.
The relational model makes normalization possible, but scaling is a concern as joins get larger and larger.
Schema

-- One row per Postcard. VARCHAR lengths are assumptions; MySQL requires one.
CREATE TABLE post_meta (
   post_id BIGINT PRIMARY KEY,
   user_id VARCHAR(64),
   date_created SMALLINT,    -- stored as a year; the "90s" query filters on it
   subject VARCHAR(255),
   noun VARCHAR(32),
   KEY (user_id),
   KEY (date_created),
   KEY (subject),
   KEY (noun)
);

-- Adjectives and possessives describe a single Postcard, so they are keyed by post_id.
CREATE TABLE adjectives (
   post_id BIGINT,
   user_id VARCHAR(64),
   adjective VARCHAR(255),
   PRIMARY KEY (post_id, adjective),
   KEY (adjective)
);

CREATE TABLE possessives (
   post_id BIGINT,
   user_id VARCHAR(64),
   possessive VARCHAR(255),
   PRIMARY KEY (post_id, possessive),
   KEY (possessive)
);

Queries

-- Seth Godin’s Business Books (join on post_id, not user_id, so metadata stays per Postcard)
SELECT m.post_id FROM post_meta m
JOIN possessives USING (post_id)
JOIN adjectives USING (post_id)
WHERE possessive = 'Seth Godin'
  AND adjective = 'Business'
  AND noun = 'Book';

-- 90s Rock Music Albums
SELECT m.post_id FROM post_meta m
JOIN adjectives USING (post_id)
WHERE adjective = 'Rock'
  AND noun = 'Music'
  AND date_created BETWEEN 1990 AND 1999;
At Squidoo, used primarily for analytics.
Massively scalable, but no relational model or
aggregation features. Heavy denormalization required.
Many operations have to be performed asynchronously
using queues or batch processes.
Truly Relational
Our data model is very much a graph problem
Recommendation systems are one query away (easy!)
Meets all our tech requirements
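"One query away" refers to the classic two-hop pattern: from a Postcard, step back to the users who liked it, then forward to everything else those users liked. A minimal in-memory sketch, assuming likes are available as a user-to-posts map. The function `also_liked` is hypothetical. The `LIKED` relationship name appears later in the deck. In Neo4j, this traversal could be written as a single Cypher MATCH over `LIKED` relationships.

```python
from collections import Counter


def also_liked(likes, post, limit=5):
    """Posts liked by users who also liked `post`, most shared first.

    likes maps user -> set of liked post ids. This is the two-hop pattern
    post <-LIKED- user -LIKED-> other_post, counted per other_post.
    """
    fans = [user for user, posts in likes.items() if post in posts]
    counts = Counter(other for user in fans for other in likes[user] if other != post)
    return [other for other, _ in counts.most_common(limit)]
```

In SQL, the same question needs a self-join on a likes table. Each additional hop adds another join, which is the scaling worry raised earlier.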
Week One with Neo
Evaluating Tech Requirements

 High availability
 Great administrative tools
 Great PHP wrapper
   https://github.com/jadell/neo4jphp
 Commercial support
Learning to think in graphs was HARD, but now feels NATURAL

Should it be a node or a property?

Which direction should the relationship point?

More so than any other type of database I’ve encountered, graph DBs require you to know in advance exactly what queries you’ll need to perform.
Reviewing Sample Graphs (It Helps)

Official Examples: http://bit.ly/RzCDY9
5 Common Graphs: http://slidesha.re/cnomwz
Movies: http://bitly.com/QZbGw0
Designing with paper or a flow chart
Learning the PHP wrapper
First Prototype

Basic HTML
REST API only
  Easy to get started, but the real power comes from Cypher
Extending the Prototype with Cypher

 Implement Cypher for recommendations and other traversals.
 Cypher looks intimidating at first, and the “it’s like SQL” analogy was not particularly helpful for me.
 However, Cypher is essential for using Neo’s most powerful features and is worth learning. Once you get past the strange (but necessary) arrow syntax, it does start to feel like SQL.
3 Graph Design Tips
Tip #1: Use reference nodes




   START ref=node:Meta(title = "Actor")
   MATCH ref<-[:IS]-actor
   RETURN actor;
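A reference node answers "give me every Actor" by following the `IS` edges into one well-known node, with no need to scan all nodes. A minimal in-memory sketch of that lookup, assuming edges are indexed by their target. `RefGraph` and `members_of` are hypothetical names, and the sketch makes no claims about Neo4j's internals.

```python
from collections import defaultdict


class RefGraph:
    """Minimal graph that indexes edges by their target node."""

    def __init__(self):
        self.props = {}
        self.incoming = defaultdict(list)  # target id -> [(rel_type, source id)]

    def add_node(self, nid, **props):
        self.props[nid] = props
        return nid

    def relate(self, src, rel, dst):
        self.incoming[dst].append((rel, src))


def members_of(g, ref):
    """Rough analogue of: MATCH ref<-[:IS]-member RETURN member."""
    return [src for rel, src in g.incoming[ref] if rel == "IS"]
```

The cost of the lookup grows with the number of members rather than the size of the graph. Tip #2 trades this traversal for a property check, which is simpler when you already hold the nodes.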
Tip #2: Use reference properties




    foreach ($posts as $post) {
      // Tip #2: branch on the node's 'type' reference property
      if ($post->getProperty('type') == 'Actor') {
        // do something special for actors
      }
    }
Tip #3: Schema Changes
At first, there were a lot of schema changes during
development
Neo4j has no equivalent to MySQL’s ALTER TABLE or TRUNCATE TABLE
Two options:
  Shut down Neo, rm -rf data/graph.db/*, and restart
  Or use this plugin: http://bitly.com/rHFSu6
    With the plugin, node IDs do not restart from zero
Tip #3.1: Schema Changes
Wiped your DB and need to start over? Use an initialization script to set things up.


function initialize() {
    // Assumes Neo4j 1.x, where node 0 is the built-in reference node
    $master = $this->client->getNode(0);
    $master->setProperty('title', 'Master')->setProperty('parent', '')->save();

    // should be node 1
    $user_master = $this->client->makeNode();
    $user_master->save();

    // neo4jphp legacy indexes; save() creates each index on the server
    $user_index = new \Everyman\Neo4j\Index\NodeIndex($this->client, 'users');
    $user_index->save();

    $post_index = new \Everyman\Neo4j\Index\NodeIndex($this->client, 'post');
    $post_index->save();

    $index = new \Everyman\Neo4j\Index\NodeIndex($this->client, 'master');
    $index->save(); // create once, before the loop adds entries

    $nouns = array('Movie', 'Music', 'TV', 'Book', 'Video', 'Article', 'Photo', 'Product', 'Game', 'Squidoo');

    foreach ($nouns as $noun) {
        // One reference node per noun, linked to the master node with IS
        $node = $this->client->makeNode();
        $node->setProperty('title', $noun)->setProperty('type', 'master')->save();
        $index->add($node, 'noun', $noun);
        $node->relateTo($master, 'IS')->save();

        // A separate index named after each noun
        $noun_index = new \Everyman\Neo4j\Index\NodeIndex($this->client, $noun);
        $noun_index->save();
    }
}
Postcards Demo
Homepage
A Single Postcard
Nouns

“Noun” is our word for the domain or media type associated with a Postcard
Movie Noun
Just one example. We have books, music albums, products, and many others!
Single User’s Stack about Director Martin Scorsese




    START user=node({user_id})
    MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive
    WHERE possessive.title={meta} AND subject.type={noun}
    RETURN DISTINCT post, COLLECT(subject) as subject;

    {user_id} = 123
    {meta} = 'Martin Scorsese'
    {noun} = 'Movie'
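To make the shape of this query concrete, here is a rough in-memory analogue. It walks POSTED, then POST, filters on `subject.type`, then checks for a `BY` edge to a possessive with the requested title, and groups subjects by post the way `RETURN DISTINCT post, COLLECT(subject)` does. The node and edge layout (`nodes`, `out`) and the `user_stack` function are hypothetical, and the sketch illustrates the traversal rather than how Neo4j executes it.

```python
def user_stack(nodes, out, user, meta, noun):
    """Rough analogue of the Cypher above.

    nodes: node id -> properties; out: node id -> [(rel_type, target id)].
    Returns {post: [subjects]}, mirroring RETURN DISTINCT post, COLLECT(subject).
    """
    result = {}
    for rel, post in out.get(user, []):
        if rel != "POSTED":
            continue
        for rel2, subject in out.get(post, []):
            if rel2 != "POST" or nodes[subject].get("type") != noun:
                continue
            by_meta = any(
                r == "BY" and nodes[p].get("title") == meta
                for r, p in out.get(subject, [])
            )
            if by_meta:
                result.setdefault(post, []).append(subject)
    return result
```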
Finding Stacks for a Postcard




   START post=node:post(post_id={post_id})
   MATCH post-[:POST]->subject-->adjective-[:IS]->parent
   RETURN subject, adjective, parent;
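The same kind of in-memory analogue for this query shows one subtlety: `subject-->adjective` has no relationship type, so any outgoing edge from the subject is followed. Only nodes that themselves have an `IS` edge to a parent produce rows. The `out` layout and the `stacks_for_post` function are hypothetical, as is the `DESCRIBED_AS` edge name used in the test data.

```python
def stacks_for_post(out, post):
    """Rough analogue of: MATCH post-[:POST]->subject-->adjective-[:IS]->parent.

    out maps node id -> [(rel_type, target id)]. The subject-->adjective hop
    accepts any relationship type, just as the untyped arrow does in Cypher.
    """
    rows = []
    for rel, subject in out.get(post, []):
        if rel != "POST":
            continue
        for _any_rel, adjective in out.get(subject, []):
            for rel3, parent in out.get(adjective, []):
                if rel3 == "IS":
                    rows.append((subject, adjective, parent))
    return rows
```

Because the hop is untyped, a possessive node with its own `IS` edge would also match. Naming the relationship type in the pattern avoids that if it is not wanted.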
Finding a user’s “Liked” Postcards




     START user=node({user_id})
     MATCH user-[:LIKED]->post-[:POST]->subject
     RETURN DISTINCT post, COLLECT(subject) as subject;
Popularity Sorting

 Popularity is based on Likes, Comments, and other social signals, using a time decay factor to favor newer Postcards.
 It was difficult to find an algorithm that let us support time decay without constantly re-scoring all Postcards.
 Long story short, we use Cypher’s ORDER BY for sorting, with a calculation based on the pop_score and pop_date properties stored on each Postcard node.
 An individual Postcard’s pop_score and pop_date are updated in real time when someone interacts with it.
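The talk does not give the formula. One scheme with the properties described is exponential decay, which updates a single Postcard per interaction, never re-scores globally, and sorts at query time. On each interaction, decay the stored `pop_score` to the current time, add the new signal, and set `pop_date` to now. Ordering by `log(pop_score) + rate * pop_date` then ranks cards exactly as their decayed scores would at any moment, so the sort key never needs refreshing. A sketch under those assumptions; the half-life, the `Postcard` dataclass, and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass

HALF_LIFE_SECONDS = 24 * 3600             # assumption: tuning constant, not from the talk
DECAY = math.log(2) / HALF_LIFE_SECONDS   # exponential decay rate per second


@dataclass
class Postcard:
    post_id: int
    pop_score: float = 0.0
    pop_date: float = 0.0  # Unix time of the last interaction


def current_score(card, now):
    """Stored score decayed from pop_date to `now`."""
    return card.pop_score * math.exp(-DECAY * (now - card.pop_date))


def record_interaction(card, weight, now):
    """Fold a new Like/Comment into the stored score; only this card is written."""
    card.pop_score = current_score(card, now) + weight
    card.pop_date = now


def sort_key(card):
    """Time-independent key with the same ordering as current_score at any `now`.

    log(current_score) = log(pop_score) + DECAY * pop_date - DECAY * now, and the
    last term is shared by every card, so it can be dropped.
    """
    if card.pop_score <= 0:
        return float("-inf")
    return math.log(card.pop_score) + DECAY * card.pop_date
```

An expression of this shape is the kind of calculation an ORDER BY over `pop_score` and `pop_date` would compute.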
Next Steps


Follow Users and Stacks (Activity Stream)
Load Balancing
Disambiguation
The End


          Gil Hildebrand
          gil@squidoo.com

When Relational Isn't Enough: Neo4j at Squidoo
