SlideShare una empresa de Scribd logo
1 de 40
Descargar para leer sin conexión
Tagging and Folksonomy
  Schema Design for Scalability and Performance
                                     Jay Pipes
                     Community Relations Manager, North America
                                 (jay@mysql.com)
                                    MySQL, Inc.
   TELECONFERENCE: please dial a number to hear the audio portion of this presentation.

   Toll-free US/Canada: 866-469-3239        Direct US/Canada: 650-429-3300 :
   0800-295-240 Austria                     0800-71083 Belgium
   80-884912 Denmark                        0-800-1-12585 Finland
   0800-90-5571 France                      0800-101-6943 Germany
   00800-12-6759 Greece                     1-800-882019 Ireland
   800-780-632 Italy                        0-800-9214652 Israel
   800-2498 Luxembourg                      0800-022-6826 Netherlands
   800-15888 Norway                         900-97-1417 Spain
   020-79-7251 Sweden                       0800-561-201 Switzerland
   0800-028-8023 UK                         1800-093-897 Australia
   800-90-3575 Hong Kong                    00531-12-1688 Japan

                                   EVENT NUMBER/ACCESS CODE:

Copyright MySQL AB                                            The World’s Most Popular Open Source Database   1
Communicate ...

Connect ...

Share ...

Play ...

Trade ...                                            craigslist


Search & Look Up



Copyright MySQL AB   The World’s Most Popular Open Source Database   2
Software Industry Evolution
                                   WEB 1.0                     WEB 2.0




                                                                FREE & OPEN
                                                                SOURCE
                                                                SOFTWARE


                                    CLOSED SOURCE SOFTWARE



                           1990          2000
                                  1995             2005

Copyright MySQL AB                              The World’s Most Popular Open Source Database   3
How MySQL Enables Web 2.0

                                   ubiquitous with developers
             lower tco
                                                           large objects
                                  tag databases
                        lamp                      XML
  full-text search                                                language support
                        ease of use
     world-wide
                                                  performance
     community
                                                                                       scale out
                           interoperable
                                                  connectors
      open source                                                         bundled

                                reliability          session management
             replication
                                                                      query cache
   partitioning (5.1)
                      pluggable storage engines




Copyright MySQL AB                                      The World’s Most Popular Open Source Database   4
Agenda
     •   Web 2.0: more than just rounded corners
     •   Tagging Concepts in SQL
     •   Folksonomy Concepts in SQL
     •   Scaling out sensibly




Copyright MySQL AB                        The World’s Most Popular Open Source Database   5
More than just rounded corners
   • What is Web 2.0?
          – Participation
          – Interaction
          – Connection
   • A changing of design patterns
          –   AJAX and XML-RPC are changing the way data is queried
          –   Rich web-based clients
          –   Everyone's got an API
          –   API leads to increased requests per second. Must deal with
              growth!




Copyright MySQL AB                                  The World’s Most Popular Open Source Database   6
AJAX and XML-RPC change the game
   • Traditional (Web 1.0) page request builds entire page
          – Lots of HTML, style and data in each page request
          – Lots of data processed or queried in each request
          – Complex page requests and application logic
   • AJAX/XML-RPC request returns small data set, or
     page fragment
          – Smaller amount of data being passed on each request
          – But, often many more requests compared to Web 1.0!
          – Exposed APIs mean data services distribute your data to
            syndicated sites
          – RSS feeds supply data, not full web page




Copyright MySQL AB                                The World’s Most Popular Open Source Database   7
flickr Tag Cloud(s)




Copyright MySQL AB                  The World’s Most Popular Open Source Database   8
Participation, Interaction and Connection
   • Multiple users editing the same content
          – Content explosion
          – Lots of textual data to manage. How do we organize it?
   • User-driven data
          – “Tracked” pages/items
          – Allows interlinking of content via the user to user relationship
            (folksonomy)
   • Tags becoming the new categorization/filing of this
     content
          – Anyone can tag the data
          – Tags connect one thing to another, similar to the way that the
            folksonomy relationships link users together
   • So, the common concept here is “linking”...

Copyright MySQL AB                                    The World’s Most Popular Open Source Database   9
What The User Dimension Gives You
   • Folksonomy adds the “user dimension”
   • Tags often provide the “glue” between user and item
     dimensions
   • Answers questions such as:
          – Who shares my interests
          – What are the other interests of users who share my interests
          – Someone tagged my bookmark (or any other item) with
            something. What other items did that person tag with the
            same thing?
          – What is the user's immediate “ecosystem”? What are the
            fringes of that ecosystem?
                • Ecosystem radius expansion (like zipcode radius expansion...)
   • Marks a new aspect of web applications into the world
     of data warehousing and analysis
Copyright MySQL AB                                       The World’s Most Popular Open Source Database   10
Linking Is the “R” in RDBMS
   • The schema becomes the absolute driving force behind
     Web 2.0 applications
   • Understanding of many-to-many relationships is critical
   • Take advantage of MySQL's architecture so that our
     schema is as efficient as possible
   • Ensure your data store is normalized, standardized,
     and consistent
   • So, let's digg in! (pun intended)




Copyright MySQL AB                      The World’s Most Popular Open Source Database   11
Comparison of Tag Schema Designs
   • “MySQLicious”
          –   Entirely denormalized
          –   One main fact table with one field storing delimited list of tags
          –   Unless using FULLTEXT indexing, does not scale well at all
          –   Very inflexible
   • “Scuttle” solution
          – Two tables. Lookup one-way from item table to tag table
          – Somewhat inflexible
          – Doesn't represent a many-to-many relationship correctly
   • “Toxi” solution
          – Almost gets it right, except uses surrogate keys in mapping
            tables ...
          – Flexible, normalized approach
          – Closest to our recommended architecture ...
Copyright MySQL AB                                     The World’s Most Popular Open Source Database   12
Tagging Concepts in SQL




Copyright MySQL AB                    The World’s Most Popular Open Source Database   13
The Tags Table
   • The “Tags” table is the foundational table upon which
     all tag links are built
   • Lean and mean
   • Make primary key an INT!

    CREATE TABLE Tags (
    tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    , tag_text VARCHAR(50) NOT NULL
    , PRIMARY KEY pk_Tags (tag_id)
    , UNIQUE INDEX uix_TagText (tag_text)
    ) ENGINE=InnoDB;



Copyright MySQL AB                       The World’s Most Popular Open Source Database   14
Example Tag2Post Mapping Table
   • The mapping table creates the link between a tag and
     anything else
   • In other terms, it maps a many-to-many relationship
   • Important to index from both “sides”

    CREATE TABLE Tag2Post (
    tag_id INT UNSIGNED NOT NULL
    , post_id INT UNSIGNED NOT NULL
    , PRIMARY KEY pk_Tag2Post (tag_id, post_id)
    , INDEX (post_id)
    ) ENGINE=InnoDB;



Copyright MySQL AB                     The World’s Most Popular Open Source Database   15
The Tag Cloud
   • Tag density typically represented by larger fonts or
     different colors

      SELECT tag_text
      , COUNT(*) as num_tags
      FROM Tag2Post t2p
      INNER JOIN Tags t
      ON t2p.tag_id = t.tag_id
      GROUP BY tag_text;




Copyright MySQL AB                        The World’s Most Popular Open Source Database   16
Efficiency Issues with the Tag Cloud
   • With InnoDB tables, you don't want to be issuing
     COUNT() queries, even on an indexed field
       CREATE TABLE TagStat
       tag_id INT UNSIGNED NOT NULL
       , num_posts INT UNSIGNED NOT NULL
       // ... num_xxx INT UNSIGNED NOT NULL 
       , PRIMARY KEY (tag_id)
       ) ENGINE=InnoDB;

        SELECT tag_text, ts.num_posts
        FROM Tag2Post t2p
        INNER JOIN Tags t
        ON t2p.tag_id = t.tag_id
        INNER JOIN TagStat ts
        ON t.tag_id = ts.tag_id
        GROUP BY tag_text;

Copyright MySQL AB                      The World’s Most Popular Open Source Database   17
The Typical Related Items Query
   • Get all posts tagged with any tag attached to Post #6
   • In other words “Get me all posts related to post #6”


       SELECT p2.post_id 
       FROM Tag2Post p1
       INNER JOIN Tag2Post p2
       ON p1.tag_id = p2.tag_id
       WHERE p1.post_id = 6
       GROUP BY p2.post_id;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   18
Problems With Related Items Query
   • Joining small to medium sized “tag sets” works great
   • But... when you've got a large tag set on either “side” of
     the join, problems can occur with scalablity
   • One way to solve is via derived tables

       SELECT p2.post_id 
       FROM (
       SELECT tag_id FROM Tag2Post
       WHERE post_id = 6 LIMIT 10
       ) AS p1
       INNER JOIN Tag2Post p2
       ON p1.tag_id = p2.tag_id
       GROUP BY p2.post_id LIMIT 10;

Copyright MySQL AB                        The World’s Most Popular Open Source Database   19
The Typical Related Tags Query
   • Get all tags related to a particular tag via an item
   • The “reverse” of the related items query; we want a set
     of related tags, not related posts

       SELECT t2p2.tag_id, t2.tag_text  
       FROM (SELECT post_id FROM Tags t1
       INNER JOIN Tag2Post 
       ON t1.tag_id = Tag2Post.tag_id
       WHERE t1.tag_text = 'beach' LIMIT 10
       ) AS t2p1
       INNER JOIN Tag2Post t2p2
       ON t2p1.post_id = t2p2.post_id
       INNER JOIN Tags t2
       ON t2p2.tag_id = t2.tag_id
       GROUP BY t2p2.tag_id LIMIT 10;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   20
Dealing With More Than One Tag
   • What if we want only items related to each of a set of
     tags?
   • Here is the typical way of dealing with this problem

       SELECT t2p.post_id
       FROM Tags t1 
       INNER JOIN Tag2Post t2p
       ON t1.tag_id = t2p.tag_id
       WHERE t1.tag_text IN ('beach','cloud')
       GROUP BY t2p.post_id 
       HAVING COUNT(DISTINCT t2p.tag_id) = 2;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   21
Dealing With More Than One Tag (cont'd)
   • The GROUP BY and the HAVING COUNT(DISTINCT
     ...) can be eliminated through joins
   • Thus, you eliminate the Using temporary, using filesort
     in the query execution

        SELECT t2p2.post_id
        FROM Tags t1 CROSS JOIN Tags t2
        INNER JOIN Tag2Post t2p1
        ON t1.tag_id = t2p1.tag_id
        INNER JOIN Tag2Post t2p2
        ON t2p1.post_id = t2p2.post_id
        AND t2p2.tag_id = t2.tag_id
        WHERE t1.tag_text = 'beach'
        AND t2.tag_text = 'cloud';


Copyright MySQL AB                        The World’s Most Popular Open Source Database   22
Excluding Tags From A Resultset
   • Here, we want have a search query like “Give me all
     posts tagged with “beach” and “cloud” but not tagged
     with “flower”. Typical solution you will see:

        SELECT t2p1.post_id
        FROM Tag2Post t2p1
        INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id
        WHERE t1.tag_text IN ('beach','cloud')
        AND t2p1.post_id NOT IN
        (SELECT post_id FROM Tag2Post t2p2
        INNER JOIN Tags t2 
        ON t2p2.tag_id = t2.tag_id
        WHERE t2.tag_text = 'flower')
        GROUP BY t2p1.post_id 
        HAVING COUNT(DISTINCT t2p1.tag_id) = 2;


Copyright MySQL AB                      The World’s Most Popular Open Source Database   23
Excluding Tags From A Resultset (cont'd)
   • More efficient to use an outer join to filter out the
     “minus” operator, plus get rid of the GROUP BY...
       SELECT t2p2.post_id
       FROM Tags t1 CROSS JOIN Tags t2 CROSS JOIN Tags t3
       INNER JOIN Tag2Post t2p1
       ON t1.tag_id = t2p1.tag_id
       INNER JOIN Tag2Post t2p2
       ON t2p1.post_id = t2p2.post_id
       AND t2p2.tag_id = t2.tag_id
       LEFT JOIN Tag2Post t2p3
       ON t2p2.post_id = t2p3.post_id
       AND t2p3.tag_id = t3.tag_id
       WHERE t1.tag_text = 'beach'
       AND t2.tag_text = 'cloud'
       AND t3.tag_text = 'flower'
       AND t2p3.post_id IS NULL;

Copyright MySQL AB                         The World’s Most Popular Open Source Database   24
Summary of SQL Tips for Tagging
   • Index fields in mapping tables properly. Ensure that
     each GROUP BY can access an index from the left
     side of the index
   • Use summary or statistic tables to eliminate the use of
     COUNT(*) expressions in tag clouding
   • Get rid of GROUP BY and HAVING COUNT(...) by
     using standard join techniques
   • Get rid of NOT IN expressions via a standard outer join
   • Use derived tables with an internal LIMIT expression to
     prevent wild relation queries from breaking scalability




Copyright MySQL AB                       The World’s Most Popular Open Source Database   25
Folksonomy Concepts in SQL




Copyright MySQL AB                     The World’s Most Popular Open Source Database   26
Here we see
                        tagging and
                        folksonomy
                     together: the user
                        dimension...




Copyright MySQL AB                        The World’s Most Popular Open Source Database   27
Folksonomy Adds The User Dimension
   • Adding the user dimension to our schema
   • The tag is the relationship glue between the user and
     item dimensions

       CREATE TABLE UserTagPost (
       user_id INT UNSIGNED NOT NULL
       , tag_id INT UNSIGNED NOT NULL
       , post_id INT UNSIGNED NOT NULL
       , PRIMARY KEY (user_id, tag_id, post_id)
       , INDEX (tag_id)
       , INDEX (post_id)
       ) ENGINE=InnoDB;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   28
Who Shares My Interest Directly?
   • Find out the users who have linked to the same item I
     have
   • Direct link; we don't go through the tag glue


    SELECT user_id 
    FROM UserTagPost
    WHERE post_id = @my_post_id;




Copyright MySQL AB                      The World’s Most Popular Open Source Database   29
Who Shares My Interests Indirectly?
   • Find out the users who have similar tag sets...
   • But, how much matching do we want to do? In other
     words, what radius do we want to match on...
   • The first step is to find my tags that are within the
     search radius... this yields my “top” or most popular
     tags

           SELECT tag_id 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius;


Copyright MySQL AB                       The World’s Most Popular Open Source Database   30
Who Shares My Interests (cont'd)
   • Now that we have our “top” tag set, we want to find
     users who match all of our top tags:


           SELECT others.user_id 
           FROM UserTagPost others 
           INNER JOIN (SELECT tag_id 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius) AS my_tags
           ON others.tag_id = my_tags.tag_id
           GROUP BY others.user_id;




Copyright MySQL AB                       The World’s Most Popular Open Source Database   31
Ranking Ecosystem Matches
   • What about finding our “closest” ecosystem matches
   • We can “rank” other users based on whether they have
     tagged items a number of times similar to ourselves
           SELECT others.user_id
           , (COUNT(*) ­ my_tags.num_tags) AS rank
           FROM UserTagPost others 
           INNER JOIN (
           SELECT tag_id, COUNT(*) AS num_tags 
           FROM UserTagPost
           WHERE user_id = @my_user_id
           GROUP BY tag_id
           HAVING COUNT(tag_id) >= @radius
           ) AS my_tags
           ON others.tag_id = my_tags.tag_id
           GROUP BY others.user_id
           ORDER BY rank DESC;

Copyright MySQL AB                       The World’s Most Popular Open Source Database   32
Ranking Ecosystem Matches Efficiently
   • But... we've still got our COUNT(*) problem
   • How about another summary table ...

                 CREATE TABLE UserTagStat (
                 user_id INT UNSIGNED NOT NULL
                 , tag_id INT UNSIGNED NOT NUL
                 , num_posts INT UNSIGNED NOT NULL
                 , PRIMARY KEY (user_id, tag_id)
                 , INDEX (tag_id)
                 ) ENGINE=InnoDB;




Copyright MySQL AB                          The World’s Most Popular Open Source Database   33
Ranking Ecosystem Matches Efficiently 2
   • Hey, we've eliminated the aggregation!


        SELECT others.user_id
        , (others.num_posts ­ my_tags.num_posts) AS rank
        FROM UserTagStat others 
        INNER JOIN (
        SELECT tag_id, num_posts
        FROM UserTagStat
        WHERE user_id = @my_user_id
        AND  num_posts >= @radius
        ) AS my_tags
        ON others.tag_id = my_tags.tag_id
        ORDER BY rank DESC;




Copyright MySQL AB                      The World’s Most Popular Open Source Database   34
Scaling Out Sensibly




Copyright MySQL AB                   The World’s Most Popular Open Source Database   35
MySQL Replication (Scale Out)
                                                             • Write to one master
                      Web/App       Web/App     Web/App
                                                             • Read from many slaves
                       Server        Server      Server
                                                             • Excellent for read
                                                               intensive apps
                            Load Balancer


                                        Reads             Reads
     Writes & Reads




                                                                         …
             Master                       Slave            Slave
             MySQL                        MySQL            MySQL
             Server                       Server           Server



                      Replication




Copyright MySQL AB                                         The World’s Most Popular Open Source Database   36
Scale Out Using Replication
     • Master DB stores all writes
     • Master has InnoDB tables
     • Slaves handle aggregate reads, non-realtime reads
     • Web servers can be load balanced (directed) to one or
       more slaves
     • Just plug in another slave to increase read performance
       (that's scaling out...)
     • Slave can provide hot standby as well as backup server




Copyright MySQL AB                       The World’s Most Popular Open Source Database   37
Scale Out Strategies
                     • Slave storage engine can differ from
                       Master!
                        – InnoDB on Master (great update/insert/delete
                          performance)
                        – MyISAM on Slave (fantastic read performance
                          and well as excellent concurrent insert
                          performance, plus can use FULLTEXT
                          indexing)
                     • Push aggregated summary data in
                       batches onto slaves for excellent read
                       performance of semi-static data
                        – Example: “this week's popular tags”
                           • Generate the data via cron job on each slave.
                             No need to burden the master server
                           • Truncate every week

Copyright MySQL AB                               The World’s Most Popular Open Source Database   38
Scale Out Strategies (cont'd)
                        • Offload FULLTEXT indexing onto a FT
                          indexer such as Apache Lucene,
                          Mnogosearch, Sphinx FT Engine, etc.
                        • Use Partitioning feature of 5.1 to
                          segment tag data across multiple
                          partitions, allowing you to spread disk
                          load sensibly based on your tag text
                          density
                        • Use the MySQL Query Cache effectively
                           – Use SQL_NO_CACHE when selecting from
                             frequently updated tables (ex: TagStat)
                           – Very effective for high-read environments:
                             can yield 200-250% performance
                             improvement


Copyright MySQL AB                                The World’s Most Popular Open Source Database   39
Questions?
 MySQL Forge
 http://forge.mysql.com/

 MySQL Forge Tag Schema Wiki pages
 http://forge.mysql.com/wiki/TagSchema

 PlanetMySQL
 http://www.planetmysql.org

                                   Jay Pipes
                               (jay@mysql.com)
                                  MySQL AB




Copyright MySQL AB                               The World’s Most Popular Open Source Database   40

Más contenido relacionado

La actualidad más candente

情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。
情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。
情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。Narichika Kajihara
 
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけ
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけRDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけ
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけRecruit Technologies
 
5分で出来る!イケてるconfluenceページ
5分で出来る!イケてるconfluenceページ5分で出来る!イケてるconfluenceページ
5分で出来る!イケてるconfluenceページCLARA ONLINE, Inc.
 
BigQuery Query Optimization クエリ高速化編
BigQuery Query Optimization クエリ高速化編BigQuery Query Optimization クエリ高速化編
BigQuery Query Optimization クエリ高速化編sutepoi
 
フロー効率性とリソース効率性、再入門 #devlove #devkan
フロー効率性とリソース効率性、再入門 #devlove #devkanフロー効率性とリソース効率性、再入門 #devlove #devkan
フロー効率性とリソース効率性、再入門 #devlove #devkanItsuki Kuroda
 
AWSではじめるMLOps
AWSではじめるMLOpsAWSではじめるMLOps
AWSではじめるMLOpsMariOhbuchi
 
暗号技術の実装と数学
暗号技術の実装と数学暗号技術の実装と数学
暗号技術の実装と数学MITSUNARI Shigeo
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことマルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことAmazon Web Services Japan
 
アジャイルメトリクス実践ガイド
アジャイルメトリクス実践ガイドアジャイルメトリクス実践ガイド
アジャイルメトリクス実践ガイドHiroyuki Ito
 
AWSで作る分析基盤
AWSで作る分析基盤AWSで作る分析基盤
AWSで作る分析基盤Yu Otsubo
 
20200714 AWS Black Belt Online Seminar Amazon Neptune
20200714 AWS Black Belt Online Seminar Amazon Neptune20200714 AWS Black Belt Online Seminar Amazon Neptune
20200714 AWS Black Belt Online Seminar Amazon NeptuneAmazon Web Services Japan
 
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)Takuto Wada
 
JJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みJJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みRecruit Technologies
 
RDB開発者のためのApache Cassandra データモデリング入門
RDB開発者のためのApache Cassandra データモデリング入門RDB開発者のためのApache Cassandra データモデリング入門
RDB開発者のためのApache Cassandra データモデリング入門Yuki Morishita
 
BigQuery で 150万円 使ったときの話
BigQuery で 150万円 使ったときの話BigQuery で 150万円 使ったときの話
BigQuery で 150万円 使ったときの話itkr
 
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)Amazon Web Services Japan
 
ユニットテストの保守性を作りこむ, xpjugkansai2011
ユニットテストの保守性を作りこむ, xpjugkansai2011ユニットテストの保守性を作りこむ, xpjugkansai2011
ユニットテストの保守性を作りこむ, xpjugkansai2011H Iseri
 
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいことMLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいことRakuten Group, Inc.
 
フロー効率性とリソース効率性について #xpjug
フロー効率性とリソース効率性について #xpjugフロー効率性とリソース効率性について #xpjug
フロー効率性とリソース効率性について #xpjugItsuki Kuroda
 

La actualidad más candente (20)

情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。
情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。
情報共有は、なぜGoogle Docsじゃなく、 Confluenceなのか。
 
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけ
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけRDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけ
RDB技術者のためのNoSQLガイド NoSQLの必要性と位置づけ
 
5分で出来る!イケてるconfluenceページ
5分で出来る!イケてるconfluenceページ5分で出来る!イケてるconfluenceページ
5分で出来る!イケてるconfluenceページ
 
BigQuery Query Optimization クエリ高速化編
BigQuery Query Optimization クエリ高速化編BigQuery Query Optimization クエリ高速化編
BigQuery Query Optimization クエリ高速化編
 
フロー効率性とリソース効率性、再入門 #devlove #devkan
フロー効率性とリソース効率性、再入門 #devlove #devkanフロー効率性とリソース効率性、再入門 #devlove #devkan
フロー効率性とリソース効率性、再入門 #devlove #devkan
 
show innodb status
show innodb statusshow innodb status
show innodb status
 
AWSではじめるMLOps
AWSではじめるMLOpsAWSではじめるMLOps
AWSではじめるMLOps
 
暗号技術の実装と数学
暗号技術の実装と数学暗号技術の実装と数学
暗号技術の実装と数学
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのことマルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのこと
 
アジャイルメトリクス実践ガイド
アジャイルメトリクス実践ガイドアジャイルメトリクス実践ガイド
アジャイルメトリクス実践ガイド
 
AWSで作る分析基盤
AWSで作る分析基盤AWSで作る分析基盤
AWSで作る分析基盤
 
20200714 AWS Black Belt Online Seminar Amazon Neptune
20200714 AWS Black Belt Online Seminar Amazon Neptune20200714 AWS Black Belt Online Seminar Amazon Neptune
20200714 AWS Black Belt Online Seminar Amazon Neptune
 
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
SQLアンチパターン - 開発者を待ち受ける25の落とし穴 (拡大版)
 
JJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みJJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組み
 
RDB開発者のためのApache Cassandra データモデリング入門
RDB開発者のためのApache Cassandra データモデリング入門RDB開発者のためのApache Cassandra データモデリング入門
RDB開発者のためのApache Cassandra データモデリング入門
 
BigQuery で 150万円 使ったときの話
BigQuery で 150万円 使ったときの話BigQuery で 150万円 使ったときの話
BigQuery で 150万円 使ったときの話
 
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)
Amazon Elastic MapReduce with Hive/Presto ハンズオン(講義)
 
ユニットテストの保守性を作りこむ, xpjugkansai2011
ユニットテストの保守性を作りこむ, xpjugkansai2011ユニットテストの保守性を作りこむ, xpjugkansai2011
ユニットテストの保守性を作りこむ, xpjugkansai2011
 
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいことMLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
 
フロー効率性とリソース効率性について #xpjug
フロー効率性とリソース効率性について #xpjugフロー効率性とリソース効率性について #xpjug
フロー効率性とリソース効率性について #xpjug
 

Destacado

Tagging and Folksonomies
Tagging and FolksonomiesTagging and Folksonomies
Tagging and Folksonomieshhunt75
 
Tagging & Folksonomies poster
Tagging & Folksonomies posterTagging & Folksonomies poster
Tagging & Folksonomies posterPrattSILS
 
Understanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCUnderstanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCThomas Vander Wal
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentationamit_nathwani
 
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Freddy Limpens
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy PresentationVanessa Yau
 
Next Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingNext Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingThomas Vander Wal
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and FolksonomiesHeather Hedden
 
Introduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllIntroduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllMarc Grabanski
 
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesSocial Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesFreddy Limpens
 

Destacado (11)

Tagging and Folksonomies
Tagging and FolksonomiesTagging and Folksonomies
Tagging and Folksonomies
 
Tagging & Folksonomies poster
Tagging & Folksonomies posterTagging & Folksonomies poster
Tagging & Folksonomies poster
 
Understanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DCUnderstanding Tagging and Folksonomy - SharePoint Saturday DC
Understanding Tagging and Folksonomy - SharePoint Saturday DC
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentation
 
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
Bridging Social Web and Sem Web : 2 application cases in the field of sustain...
 
Folksonomy
FolksonomyFolksonomy
Folksonomy
 
Folksonomy Presentation
Folksonomy PresentationFolksonomy Presentation
Folksonomy Presentation
 
Next Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and TaggingNext Generation Search: Social Bookmarking and Tagging
Next Generation Search: Social Bookmarking and Tagging
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and Folksonomies
 
Introduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for AllIntroduction to jQuery Mobile - Web Deliver for All
Introduction to jQuery Mobile - Web Deliver for All
 
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologiesSocial Web and Semantic Web: towards synergy between folksonomies and ontologies
Social Web and Semantic Web: towards synergy between folksonomies and ontologies
 

Similar a Tagging and Folksonomy Schema Design for Scalability and Performance

Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQLUlf Wendel
 
Why no sql_ibm_cloudant
Why no sql_ibm_cloudantWhy no sql_ibm_cloudant
Why no sql_ibm_cloudantPeter Tutty
 
Oracle no sql database bigdata
Oracle no sql database   bigdataOracle no sql database   bigdata
Oracle no sql database bigdataJoão Gabriel Lima
 
MySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeMySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeArnab Ray
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020Alkin Tezuysal
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesBernd Ocklin
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
My sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlMy sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlsqlhjalp
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
MySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityMySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityFrederic Descamps
 
20090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp0220090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp02Vinamra Mittal
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementClusterpoint
 

Similar a Tagging and Folksonomy Schema Design for Scalability and Performance (20)

Mysql
MysqlMysql
Mysql
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 
MySQL Cluster
MySQL ClusterMySQL Cluster
MySQL Cluster
 
Brandon
BrandonBrandon
Brandon
 
Why no sql_ibm_cloudant
Why no sql_ibm_cloudantWhy no sql_ibm_cloudant
Why no sql_ibm_cloudant
 
Oracle no sql database bigdata
Oracle no sql database   bigdataOracle no sql database   bigdata
Oracle no sql database bigdata
 
MySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime TimeMySQL 8: Ready for Prime Time
MySQL 8: Ready for Prime Time
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020
 
The NoSQL Movement
The NoSQL MovementThe NoSQL Movement
The NoSQL Movement
 
MySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion QueriesMySQL Cluster Scaling to a Billion Queries
MySQL Cluster Scaling to a Billion Queries
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Introduction to Mysql
Introduction to MysqlIntroduction to Mysql
Introduction to Mysql
 
My sql crashcourse_intro_kdl
My sql crashcourse_intro_kdlMy sql crashcourse_intro_kdl
My sql crashcourse_intro_kdl
 
Apache ignite v1.3
Apache ignite v1.3Apache ignite v1.3
Apache ignite v1.3
 
No sql3 rmoug
No sql3 rmougNo sql3 rmoug
No sql3 rmoug
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
MySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the CommunityMySQL Community Meetup in China : Innovation driven by the Community
MySQL Community Meetup in China : Innovation driven by the Community
 
20090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp0220090425mysqlslides 12593434194072-phpapp02
20090425mysqlslides 12593434194072-phpapp02
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data Management
 

Último

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 

Último (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Tagging and Folksonomy Schema Design for Scalability and Performance

  • 1. Tagging and Folksonomy Schema Design for Scalability and Performance Jay Pipes Community Relations Manager, North America (jay@mysql.com) MySQL, Inc. TELECONFERENCE: please dial a number to hear the audio portion of this presentation. Toll-free US/Canada: 866-469-3239 Direct US/Canada: 650-429-3300 : 0800-295-240 Austria 0800-71083 Belgium 80-884912 Denmark 0-800-1-12585 Finland 0800-90-5571 France 0800-101-6943 Germany 00800-12-6759 Greece 1-800-882019 Ireland 800-780-632 Italy 0-800-9214652 Israel 800-2498 Luxembourg 0800-022-6826 Netherlands 800-15888 Norway 900-97-1417 Spain 020-79-7251 Sweden 0800-561-201 Switzerland 0800-028-8023 UK 1800-093-897 Australia 800-90-3575 Hong Kong 00531-12-1688 Japan EVENT NUMBER/ACCESS CODE: Copyright MySQL AB The World’s Most Popular Open Source Database 1
  • 2. Communicate ... Connect ... Share ... Play ... Trade ... craigslist Search & Look Up Copyright MySQL AB The World’s Most Popular Open Source Database 2
  • 3. Software Industry Evolution WEB 1.0 WEB 2.0 FREE & OPEN SOURCE SOFTWARE CLOSED SOURCE SOFTWARE 1990 2000 1995 2005 Copyright MySQL AB The World’s Most Popular Open Source Database 3
  • 4. How MySQL Enables Web 2.0 ubiquitous with developers lower tco large objects tag databases lamp XML full-text search language support ease of use world-wide performance community scale out interoperable connectors open source bundled reliability session management replication query cache partitioning (5.1) pluggable storage engines Copyright MySQL AB The World’s Most Popular Open Source Database 4
  • 5. Agenda • Web 2.0: more than just rounded corners • Tagging Concepts in SQL • Folksonomy Concepts in SQL • Scaling out sensibly Copyright MySQL AB The World’s Most Popular Open Source Database 5
  • 6. More than just rounded corners • What is Web 2.0? – Participation – Interaction – Connection • A changing of design patterns – AJAX and XML-RPC are changing the way data is queried – Rich web-based clients – Everyone's got an API – API leads to increased requests per second. Must deal with growth! Copyright MySQL AB The World’s Most Popular Open Source Database 6
  • 7. AJAX and XML-RPC change the game • Traditional (Web 1.0) page request builds entire page – Lots of HTML, style and data in each page request – Lots of data processed or queried in each request – Complex page requests and application logic • AJAX/XML-RPC request returns small data set, or page fragment – Smaller amount of data being passed on each request – But, often many more requests compared to Web 1.0! – Exposed APIs mean data services distribute your data to syndicated sites – RSS feeds supply data, not full web page Copyright MySQL AB The World’s Most Popular Open Source Database 7
  • 8. flickr Tag Cloud(s) Copyright MySQL AB The World’s Most Popular Open Source Database 8
  • 9. Participation, Interaction and Connection • Multiple users editing the same content – Content explosion – Lots of textual data to manage. How do we organize it? • User-driven data – “Tracked” pages/items – Allows interlinking of content via the user to user relationship (folksonomy) • Tags becoming the new categorization/filing of this content – Anyone can tag the data – Tags connect one thing to another, similar to the way that the folksonomy relationships link users together • So, the common concept here is “linking”... Copyright MySQL AB The World’s Most Popular Open Source Database 9
  • 10. What The User Dimension Gives You • Folksonomy adds the “user dimension” • Tags often provide the “glue” between user and item dimensions • Answers questions such as: – Who shares my interests – What are the other interests of users who share my interests – Someone tagged my bookmark (or any other item) with something. What other items did that person tag with the same thing? – What is the user's immediate “ecosystem”? What are the fringes of that ecosystem? • Ecosystem radius expansion (like zipcode radius expansion...) • Marks a new aspect of web applications into the world of data warehousing and analysis Copyright MySQL AB The World’s Most Popular Open Source Database 10
  • 11. Linking Is the “R” in RDBMS • The schema becomes the absolute driving force behind Web 2.0 applications • Understanding of many-to-many relationships is critical • Take advantage of MySQL's architecture so that our schema is as efficient as possible • Ensure your data store is normalized, standardized, and consistent • So, let's digg in! (pun intended) Copyright MySQL AB The World’s Most Popular Open Source Database 11
  • 12. Comparison of Tag Schema Designs • “MySQLicious” – Entirely denormalized – One main fact table with one field storing delimited list of tags – Unless using FULLTEXT indexing, does not scale well at all – Very inflexible • “Scuttle” solution – Two tables. Lookup one-way from item table to tag table – Somewhat inflexible – Doesn't represent a many-to-many relationship correctly • “Toxi” solution – Almost gets it right, except uses surrogate keys in mapping tables ... – Flexible, normalized approach – Closest to our recommended architecture ... Copyright MySQL AB The World’s Most Popular Open Source Database 12
  • 13. Tagging Concepts in SQL Copyright MySQL AB The World’s Most Popular Open Source Database 13
  • 14. The Tags Table • The “Tags” table is the foundational table upon which all tag links are built • Lean and mean • Make primary key an INT! CREATE TABLE Tags ( tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT , tag_text VARCHAR(50) NOT NULL , PRIMARY KEY pk_Tags (tag_id) , UNIQUE INDEX uix_TagText (tag_text) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 14
  • 15. Example Tag2Post Mapping Table • The mapping table creates the link between a tag and anything else • In other terms, it maps a many-to-many relationship • Important to index from both “sides” CREATE TABLE Tag2Post ( tag_id INT UNSIGNED NOT NULL , post_id INT UNSIGNED NOT NULL , PRIMARY KEY pk_Tag2Post (tag_id, post_id) , INDEX (post_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 15
  • 16. The Tag Cloud • Tag density typically represented by larger fonts or different colors SELECT tag_text , COUNT(*) as num_tags FROM Tag2Post t2p INNER JOIN Tags t ON t2p.tag_id = t.tag_id GROUP BY tag_text; Copyright MySQL AB The World’s Most Popular Open Source Database 16
  • 17. Efficiency Issues with the Tag Cloud • With InnoDB tables, you don't want to be issuing COUNT() queries, even on an indexed field CREATE TABLE TagStat tag_id INT UNSIGNED NOT NULL , num_posts INT UNSIGNED NOT NULL // ... num_xxx INT UNSIGNED NOT NULL  , PRIMARY KEY (tag_id) ) ENGINE=InnoDB; SELECT tag_text, ts.num_posts FROM Tag2Post t2p INNER JOIN Tags t ON t2p.tag_id = t.tag_id INNER JOIN TagStat ts ON t.tag_id = ts.tag_id GROUP BY tag_text; Copyright MySQL AB The World’s Most Popular Open Source Database 17
  • 18. The Typical Related Items Query • Get all posts tagged with any tag attached to Post #6 • In other words “Get me all posts related to post #6” SELECT p2.post_id  FROM Tag2Post p1 INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id WHERE p1.post_id = 6 GROUP BY p2.post_id; Copyright MySQL AB The World’s Most Popular Open Source Database 18
  • 19. Problems With Related Items Query • Joining small to medium sized “tag sets” works great • But... when you've got a large tag set on either “side” of the join, problems can occur with scalablity • One way to solve is via derived tables SELECT p2.post_id  FROM ( SELECT tag_id FROM Tag2Post WHERE post_id = 6 LIMIT 10 ) AS p1 INNER JOIN Tag2Post p2 ON p1.tag_id = p2.tag_id GROUP BY p2.post_id LIMIT 10; Copyright MySQL AB The World’s Most Popular Open Source Database 19
  • 20. The Typical Related Tags Query • Get all tags related to a particular tag via an item • The “reverse” of the related items query; we want a set of related tags, not related posts SELECT t2p2.tag_id, t2.tag_text   FROM (SELECT post_id FROM Tags t1 INNER JOIN Tag2Post  ON t1.tag_id = Tag2Post.tag_id WHERE t1.tag_text = 'beach' LIMIT 10 ) AS t2p1 INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id INNER JOIN Tags t2 ON t2p2.tag_id = t2.tag_id GROUP BY t2p2.tag_id LIMIT 10; Copyright MySQL AB The World’s Most Popular Open Source Database 20
  • 21. Dealing With More Than One Tag • What if we want only items related to each of a set of tags? • Here is the typical way of dealing with this problem SELECT t2p.post_id FROM Tags t1  INNER JOIN Tag2Post t2p ON t1.tag_id = t2p.tag_id WHERE t1.tag_text IN ('beach','cloud') GROUP BY t2p.post_id  HAVING COUNT(DISTINCT t2p.tag_id) = 2; Copyright MySQL AB The World’s Most Popular Open Source Database 21
  • 22. Dealing With More Than One Tag (cont'd) • The GROUP BY and the HAVING COUNT(DISTINCT ...) can be eliminated through joins • Thus, you eliminate the Using temporary, using filesort in the query execution SELECT t2p2.post_id FROM Tags t1 CROSS JOIN Tags t2 INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id AND t2p2.tag_id = t2.tag_id WHERE t1.tag_text = 'beach' AND t2.tag_text = 'cloud'; Copyright MySQL AB The World’s Most Popular Open Source Database 22
  • 23. Excluding Tags From A Resultset • Here, we want have a search query like “Give me all posts tagged with “beach” and “cloud” but not tagged with “flower”. Typical solution you will see: SELECT t2p1.post_id FROM Tag2Post t2p1 INNER JOIN Tags t1 ON t2p1.tag_id = t1.tag_id WHERE t1.tag_text IN ('beach','cloud') AND t2p1.post_id NOT IN (SELECT post_id FROM Tag2Post t2p2 INNER JOIN Tags t2  ON t2p2.tag_id = t2.tag_id WHERE t2.tag_text = 'flower') GROUP BY t2p1.post_id  HAVING COUNT(DISTINCT t2p1.tag_id) = 2; Copyright MySQL AB The World’s Most Popular Open Source Database 23
  • 24. Excluding Tags From A Resultset (cont'd) • More efficient to use an outer join to filter out the “minus” operator, plus get rid of the GROUP BY... SELECT t2p2.post_id FROM Tags t1 CROSS JOIN Tags t2 CROSS JOIN Tags t3 INNER JOIN Tag2Post t2p1 ON t1.tag_id = t2p1.tag_id INNER JOIN Tag2Post t2p2 ON t2p1.post_id = t2p2.post_id AND t2p2.tag_id = t2.tag_id LEFT JOIN Tag2Post t2p3 ON t2p2.post_id = t2p3.post_id AND t2p3.tag_id = t3.tag_id WHERE t1.tag_text = 'beach' AND t2.tag_text = 'cloud' AND t3.tag_text = 'flower' AND t2p3.post_id IS NULL; Copyright MySQL AB The World’s Most Popular Open Source Database 24
  • 25. Summary of SQL Tips for Tagging • Index fields in mapping tables properly. Ensure that each GROUP BY can access an index from the left side of the index • Use summary or statistic tables to eliminate the use of COUNT(*) expressions in tag clouding • Get rid of GROUP BY and HAVING COUNT(...) by using standard join techniques • Get rid of NOT IN expressions via a standard outer join • Use derived tables with an internal LIMIT expression to prevent wild relation queries from breaking scalability Copyright MySQL AB The World’s Most Popular Open Source Database 25
  • 26. Folksonomy Concepts in SQL Copyright MySQL AB The World’s Most Popular Open Source Database 26
  • 27. Here we see tagging and folksonomy together: the user dimension... Copyright MySQL AB The World’s Most Popular Open Source Database 27
  • 28. Folksonomy Adds The User Dimension • Adding the user dimension to our schema • The tag is the relationship glue between the user and item dimensions CREATE TABLE UserTagPost ( user_id INT UNSIGNED NOT NULL , tag_id INT UNSIGNED NOT NULL , post_id INT UNSIGNED NOT NULL , PRIMARY KEY (user_id, tag_id, post_id) , INDEX (tag_id) , INDEX (post_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 28
  • 29. Who Shares My Interest Directly? • Find out the users who have linked to the same item I have • Direct link; we don't go through the tag glue SELECT user_id  FROM UserTagPost WHERE post_id = @my_post_id; Copyright MySQL AB The World’s Most Popular Open Source Database 29
  • 30. Who Shares My Interests Indirectly? • Find out the users who have similar tag sets... • But, how much matching do we want to do? In other words, what radius do we want to match on... • The first step is to find my tags that are within the search radius... this yields my “top” or most popular tags SELECT tag_id  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius; Copyright MySQL AB The World’s Most Popular Open Source Database 30
  • 31. Who Shares My Interests (cont'd) • Now that we have our “top” tag set, we want to find users who match all of our top tags: SELECT others.user_id  FROM UserTagPost others  INNER JOIN (SELECT tag_id  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius) AS my_tags ON others.tag_id = my_tags.tag_id GROUP BY others.user_id; Copyright MySQL AB The World’s Most Popular Open Source Database 31
  • 32. Ranking Ecosystem Matches • What about finding our “closest” ecosystem matches • We can “rank” other users based on whether they have tagged items a number of times similar to ourselves SELECT others.user_id , (COUNT(*) ­ my_tags.num_tags) AS rank FROM UserTagPost others  INNER JOIN ( SELECT tag_id, COUNT(*) AS num_tags  FROM UserTagPost WHERE user_id = @my_user_id GROUP BY tag_id HAVING COUNT(tag_id) >= @radius ) AS my_tags ON others.tag_id = my_tags.tag_id GROUP BY others.user_id ORDER BY rank DESC; Copyright MySQL AB The World’s Most Popular Open Source Database 32
  • 33. Ranking Ecosystem Matches Efficiently • But... we've still got our COUNT(*) problem • How about another summary table ... CREATE TABLE UserTagStat ( user_id INT UNSIGNED NOT NULL , tag_id INT UNSIGNED NOT NUL , num_posts INT UNSIGNED NOT NULL , PRIMARY KEY (user_id, tag_id) , INDEX (tag_id) ) ENGINE=InnoDB; Copyright MySQL AB The World’s Most Popular Open Source Database 33
  • 34. Ranking Ecosystem Matches Efficiently 2 • Hey, we've eliminated the aggregation! SELECT others.user_id , (others.num_posts ­ my_tags.num_posts) AS rank FROM UserTagStat others  INNER JOIN ( SELECT tag_id, num_posts FROM UserTagStat WHERE user_id = @my_user_id AND  num_posts >= @radius ) AS my_tags ON others.tag_id = my_tags.tag_id ORDER BY rank DESC; Copyright MySQL AB The World’s Most Popular Open Source Database 34
  • 35. Scaling Out Sensibly Copyright MySQL AB The World’s Most Popular Open Source Database 35
  • 36. MySQL Replication (Scale Out) • Write to one master Web/App Web/App Web/App • Read from many slaves Server Server Server • Excellent for read intensive apps Load Balancer Reads Reads Writes & Reads … Master Slave Slave MySQL MySQL MySQL Server Server Server Replication Copyright MySQL AB The World’s Most Popular Open Source Database 36
  • 37. Scale Out Using Replication • Master DB stores all writes • Master has InnoDB tables • Slaves handle aggregate reads, non-realtime reads • Web servers can be load balanced (directed) to one or more slaves • Just plug in another slave to increase read performance (that's scaling out...) • Slave can provide hot standby as well as backup server Copyright MySQL AB The World’s Most Popular Open Source Database 37
  • 38. Scale Out Strategies • Slave storage engine can differ from Master! – InnoDB on Master (great update/insert/delete performance) – MyISAM on Slave (fantastic read performance and well as excellent concurrent insert performance, plus can use FULLTEXT indexing) • Push aggregated summary data in batches onto slaves for excellent read performance of semi-static data – Example: “this week's popular tags” • Generate the data via cron job on each slave. No need to burden the master server • Truncate every week Copyright MySQL AB The World’s Most Popular Open Source Database 38
  • 39. Scale Out Strategies (cont'd) • Offload FULLTEXT indexing onto a FT indexer such as Apache Lucene, Mnogosearch, Sphinx FT Engine, etc. • Use Partitioning feature of 5.1 to segment tag data across multiple partitions, allowing you to spread disk load sensibly based on your tag text density • Use the MySQL Query Cache effectively – Use SQL_NO_CACHE when selecting from frequently updated tables (ex: TagStat) – Very effective for high-read environments: can yield 200-250% performance improvement Copyright MySQL AB The World’s Most Popular Open Source Database 39
  • 40. Questions? MySQL Forge http://forge.mysql.com/ MySQL Forge Tag Schema Wiki pages http://forge.mysql.com/wiki/TagSchema PlanetMySQL http://www.planetmysql.org Jay Pipes (jay@mysql.com) MySQL AB Copyright MySQL AB The World’s Most Popular Open Source Database 40