Creating social features at BranchOut using MongoDB

Published in: Software

  1. Building Social Features with MongoDB
     Nathan Smith, BranchOut.com
     January 22, 2013
  2. BranchOut: A more social professional network
     • Connect with your colleagues (follow)
     • Activity feed of their professional activity
     • Timeline of an individual’s posts
  3. BranchOut: A more social professional network
     • 30 million installed users
     • 750 million total user records
     • Average of 300 connections per installed user
  4-7. MongoDB @ BranchOut
     • 100% MySQL until ~July 2012
     • Much of our data fits well into a document model
     • Our data design avoids RDBMS features
  8-12. Follow System: Business logic
     • Limit of 2000 followees (people you follow)
     • Unlimited followers
     • Both lists reflect updates in near-real time
  13-15. Follow System: Traditional RDBMS (e.g. MySQL)

     follower_uid  followee_uid  follow_time
     123           456           2013-01-22 15:43:00
     456           123           2013-01-22 15:52:00

     Advantage: Easy inserts, deletes
     Disadvantages: Poor data locality, large index size
  16-20. Follow System: MongoDB (first pass)

     followee: {
       _id: 123,
       uids: [456, 567, 678]
     }

     Advantage: Compact data, read locality
     Disadvantage: Can’t display a user’s followers (easily)

     Listing a user’s followers takes a query with a multi-key index on uids:

     db.follow.find({uids: 456}, {_id: 1});

     Expensive! Also, no guarantee of order.
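The asymmetry on this slide can be sketched in memory. The snippet below is only an illustration, assuming a plain Python dict stands in for the `follow` collection; the sample documents and the helper names `followees_of` and `followers_of` are hypothetical and not from the talk.

```python
# Illustrative stand-in for the first-pass `follow` collection: each document
# lists the users that `_id` follows. The data below is made up.
follow = {
    123: {"_id": 123, "uids": [456, 567, 678]},
    234: {"_id": 234, "uids": [456]},
    456: {"_id": 456, "uids": [123]},
}

def followees_of(uid):
    """Cheap direction: a single document lookup by _id."""
    doc = follow.get(uid)
    return list(doc["uids"]) if doc else []

def followers_of(uid):
    """Costly direction, the analogue of find({uids: uid}, {_id: 1}).

    Here every document is scanned. A multi-key index avoids the scan, but
    the matching _ids still come back in no particular follow-time order.
    """
    return [doc["_id"] for doc in follow.values() if uid in doc["uids"]]

print(followees_of(123))
print(followers_of(456))
```

Reading who a user follows touches one document, while reading who follows a user has to consult every document (or an index over every array element), which is the cost the slide flags.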
  21-23. Follow System: MongoDB (second pass)

     followee: { _id: 1, uids: [2, 3] },
     followee: { _id: 2, uids: [1, 3] }

     follower: { _id: 1, uids: [2] },
     follower: { _id: 2, uids: [1] },
     follower: { _id: 3, uids: [1, 2] }

     Advantages: Local data, fast selects
     Disadvantage: Follower doc size
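A minimal sketch of the second-pass write path, assuming two in-memory dicts stand in for the followee and follower collections. The `follow` helper is hypothetical; the 2000 limit comes from the business-logic slide. In a real deployment the two updates would be separate writes, so the two lists could briefly disagree.

```python
MAX_FOLLOWEES = 2000  # limit stated on the business-logic slide

followee = {}  # _id -> {"_id": uid, "uids": [users that uid follows]}
follower = {}  # _id -> {"_id": uid, "uids": [users that follow uid]}

def follow(src, dst):
    """Record that `src` follows `dst` in both collections."""
    out = followee.setdefault(src, {"_id": src, "uids": []})
    if dst in out["uids"]:
        return False  # already following, nothing to write
    if len(out["uids"]) >= MAX_FOLLOWEES:
        raise ValueError("followee limit reached")
    out["uids"].append(dst)
    # Mirror the edge so that followers can be read with one lookup too.
    follower.setdefault(dst, {"_id": dst, "uids": []})["uids"].append(src)
    return True

# Rebuild the slide's example: 1 follows 2 and 3, 2 follows 1 and 3.
for src, dst in [(1, 2), (1, 3), (2, 1), (2, 3)]:
    follow(src, dst)

print(followee[1], follower[3])
```

Every follow costs two writes, but both directions become single-document reads. The follower side is unbounded, which is the document-size concern raised next.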
  24-28. Follow System: Follower document size
     • Max Mongo doc size: 16 MB
     • Number of people who follow our community manager: 30 million
     • 30 million uids × 8 bytes/uid = 240 MB
     • Max followers per doc: ~2 million
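The arithmetic on this slide can be checked directly. The sketch below uses the slide's figure of 8 bytes per uid and takes 16 MB as 16 × 1024 × 1024 bytes; the variable names are illustrative.

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB document size limit (16 MB)
BYTES_PER_UID = 8                 # slide's figure: one 64-bit uid
FOLLOWERS = 30_000_000            # followers of the community manager

needed_bytes = FOLLOWERS * BYTES_PER_UID
max_uids_per_doc = MAX_DOC_BYTES // BYTES_PER_UID
docs_needed = -(-needed_bytes // MAX_DOC_BYTES)  # ceiling division

print(needed_bytes)      # 240000000 bytes, i.e. 240 MB
print(max_uids_per_doc)  # 2097152, i.e. roughly 2 million
print(docs_needed)       # 15 documents at the very least

# A BSON array also stores a type byte and a key string for each element,
# so the practical ceiling per document is lower than this raw estimate.
```

Even on the optimistic 8-bytes-per-uid estimate, one follower list of this size cannot fit in a single document, which motivates splitting it across several.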
  29-31. Follow System: MongoDB (final pass)

     follower: { _id: "1", uids: [2,3,4,...], count: 20001, next_page: 2 },
     follower: { _id: "1_p2", uids: [23,24,25,...], count: 10000 }

     followee: { _id: 1, uids: [2, 3] },
     followee: { _id: 2, uids: [1, 3] }

     Asynchronous thread manages follower documents. A second state of the same follower documents is also shown:

     follower: { _id: "1", uids: [2,3,4,...], count: 10001, next_page: 3 },
     follower: { _id: "1_p2", uids: [23,24,25,...], count: 10000 }
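One possible reading of the paged follower documents, sketched in memory. Everything beyond the field names is an assumption: the page size of 10,000 and the spill threshold of 20,000 are inferred from the example counts, `next_page` is taken to be the number of the newest overflow page, and the helpers `add_follower`, `compact` and `all_followers` are hypothetical. The talk does not spell out the actual rules.

```python
PAGE_SIZE = 10_000        # assumed from the example counts on the slide
SPILL_THRESHOLD = 20_000  # assumed: head document is split once it exceeds this

followers = {}  # _id -> document; stand-in for the follower collection

def add_follower(uid, new_follower):
    """Request path: always append to the head document."""
    head = followers.setdefault(
        str(uid), {"_id": str(uid), "uids": [], "count": 0, "next_page": 1}
    )
    head["uids"].append(new_follower)
    head["count"] += 1

def compact(uid):
    """Background job: move the oldest PAGE_SIZE uids from the head to a new page."""
    head = followers[str(uid)]
    while head["count"] > SPILL_THRESHOLD:
        page_no = head["next_page"] + 1
        page_id = f"{uid}_p{page_no}"
        followers[page_id] = {
            "_id": page_id,
            "uids": head["uids"][:PAGE_SIZE],
            "count": PAGE_SIZE,
        }
        head["uids"] = head["uids"][PAGE_SIZE:]
        head["count"] -= PAGE_SIZE
        head["next_page"] = page_no

def all_followers(uid):
    """Yield followers oldest first: overflow pages in order, then the head."""
    head = followers.get(str(uid))
    if head is None:
        return
    for page_no in range(2, head["next_page"] + 1):
        yield from followers[f"{uid}_p{page_no}"]["uids"]
    yield from head["uids"]

for n in range(20_001):
    add_follower(1, n + 2)
compact(1)
print(followers["1"]["count"], followers["1"]["next_page"], followers["1_p2"]["count"])
```

Under these assumptions a follow is a single append to the head document, and the background job keeps each document well below the size limit by splitting off full pages.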
  32. Activity Feed
  33-35. Activity Feed: Push vs Pull architecture
  36-41. Activity Feed: Business logic
     • All connections and followees appear in your feed
     • Reverse chronological sort order (but should support other rankings)
     • Support for an evolving set of feed event types
     • Tagging creates multiple feed events for the same underlying object
     • Feed events are not ephemeral (they also appear in the Timeline)
  42-44. Activity Feed: Traditional RDBMS (e.g. MySQL)

     activity_id  uid  event_time           type    oid1    oid2
     1            123  2013-01-22 15:43:00  photo   123abc  789ghi
     2            345  2013-01-22 15:52:00  status  456def  foobar

     Advantage: Easy inserts
     Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale
  45. Activity Feed: MongoDB

     user_feed_card
     ufc: {
       _id: 123, // UID
       total_events: 18,
       2013_01_total: 4,
       2012_12_total: 8,
       2012_11_total: 6,
       ...other counts...
     }

     user_feed_month
     ufm: {
       _id: "123_2013_01",
       events: [
         {
           uid: 123,
           type: "photo_upload",
           content_id: "abcd9876",
           timestamp: 1358824502,
           ...more metadata...
         },
         ...more events...
       ]
     }
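A rough sketch of how an event might be written into these two document shapes, assuming in-memory dicts for the collections and UTC month boundaries. The `record_event` helper is hypothetical; only the field names and the `_id` format come from the slide.

```python
from datetime import datetime, timezone

user_feed_card = {}   # uid -> per-user counters (the "ufc" documents)
user_feed_month = {}  # "<uid>_<YYYY>_<MM>" -> one month of events ("ufm")

def record_event(uid, event_type, content_id, timestamp):
    """Append an event to its month document and bump the card counters."""
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)  # UTC is an assumption
    month_key = f"{when:%Y_%m}"

    card = user_feed_card.setdefault(uid, {"_id": uid, "total_events": 0})
    card["total_events"] += 1
    counter = f"{month_key}_total"
    card[counter] = card.get(counter, 0) + 1

    month_id = f"{uid}_{month_key}"
    month = user_feed_month.setdefault(month_id, {"_id": month_id, "events": []})
    month["events"].append(
        {"uid": uid, "type": event_type, "content_id": content_id, "timestamp": timestamp}
    )
    return month_id

print(record_event(123, "photo_upload", "abcd9876", 1358824502))
```

The card stays small because it holds only counters, so it can be read for every connection cheaply, while the bulky event lists are split by month and loaded only when needed.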
  46-52. Activity Feed: Algorithm
     1. Load user_feed_cards for all connections
     2. Calculate which user_feed_months to load
     3. Load user_feed_months
     4. Aggregate events that refer to the same story
     5. Sort (reverse chronological)
     6. Load content, comments, etc. and build stories
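Steps 1 to 5 above can be sketched against in-memory dicts shaped like the user_feed_card and user_feed_month documents. This is one plausible interpretation, not the talk's implementation: the `limit` parameter, the rule for choosing months (newest first until the counters cover `limit` events) and the use of `content_id` to identify a story are all assumptions, and step 6 is left out.

```python
from collections import defaultdict

def months_to_load(cards, limit):
    """Step 2: pick the newest months whose combined counters cover `limit` events."""
    uids_by_month = defaultdict(list)  # "YYYY_MM" -> uids with events that month
    totals = defaultdict(int)          # "YYYY_MM" -> event count across all cards
    for card in cards:
        for key, count in card.items():
            if key.endswith("_total") and key[:4].isdigit() and count > 0:
                month = key[: -len("_total")]
                uids_by_month[month].append(card["_id"])
                totals[month] += count
    wanted, covered = [], 0
    for month in sorted(uids_by_month, reverse=True):  # "YYYY_MM" sorts by date
        wanted += [f"{uid}_{month}" for uid in uids_by_month[month]]
        covered += totals[month]
        if covered >= limit:
            break
    return wanted

def build_feed(connection_uids, user_feed_card, user_feed_month, limit=20):
    # Step 1: load the small counter documents for every connection.
    cards = [user_feed_card[u] for u in connection_uids if u in user_feed_card]
    # Steps 2 and 3: decide which month documents matter, then load them.
    month_ids = months_to_load(cards, limit)
    events = [
        event
        for month_id in month_ids
        for event in user_feed_month.get(month_id, {"events": []})["events"]
    ]
    # Step 4: group events that point at the same underlying content.
    stories = defaultdict(list)
    for event in events:
        stories[event["content_id"]].append(event)
    # Step 5: newest story first, judged by its most recent event.
    ranked = sorted(
        stories.values(),
        key=lambda evs: max(e["timestamp"] for e in evs),
        reverse=True,
    )
    return ranked[:limit]
```

Because the counters are compared against raw event counts, grouping can leave fewer than `limit` stories; a fuller version would keep loading older months until enough stories are found.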
  53-56. Activity Feed: Performance
     • Response times average under 500 ms (98th percentile under 1 sec)
     • Design expected to scale well horizontally
     • Need to continue to optimize
  57. Building Social Features with MongoDB
     Nathan Smith
     BrO: http://branchout.com/nate
     FB: http://facebook.com/neocortica
     Twitter: @nate510
     Email: nate@branchout.com

     Aditya Agarwal on Facebook’s architecture:
     http://www.infoq.com/presentations/Facebook-Software-Stack

     Dan McKinley on Etsy’s activity feed:
     http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

     Good Quora questions on activity feeds:
     http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
     http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
