Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×

Social Network Analysis at LinkedIn

Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio

Eche un vistazo a continuación

1 de 73 Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

Anuncio

Similares a Social Network Analysis at LinkedIn (20)

Anuncio

Más reciente (20)

Social Network Analysis at LinkedIn

  1. 1. Social Network Analysis at Linkedin Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn 1
  2. 2. Who am I? 2
  3. 3. Outline About Me About LinkedIn Analytics products at LinkedIn Fun Facts: Sequencing the Startup DNA 3
  4. 4. LinkedIn by the numbers 120,000,000+ users (August 4, 2011) 2+ new user registrations per second 81+ Million monthly unique users* (Comscore) 2 Billion People Searches in 2010 7.1 Billion page views in Q2 2010 2+ Million companies with LinkedIn Company Pages 2+ M LinkedIn Groups 4
  5. 5. Broad Range of Products 5
  6. 6. User Profile 6
  7. 7. Connections 7
  8. 8. Communications Communications 10 8
  9. 9. Hiring Solutions 9
  10. 10. Job Search 10
  11. 11. Company Pages 11
  12. 12. Outline About Me About LinkedIn Analytics products at LinkedIn Fun Facts: Sequencing the Startup DNA 12
  13. 13. What do I mean by Analytics Products? 13
  14. 14. People You May Know 14
  15. 15. Profile Stats: WVMP 15
  16. 16. Viewers of this profile also ... 16
  17. 17. Skills 17
  18. 18. InMaps 18
  19. 19. Related Searches Millions of Searches everyday Help users to explore and refine their queries 19
  20. 20. Analytics Products: Key Ideas Recommendations People You May Know, Related Searches, Viewers of this profile ... Insight Profile Stats: Who Viewed My Profile, Skills Visualization InMaps 20
  21. 21. Analytics Products: Challenges LinkedIn: largest professional network 120+ million members on LinkedIn Billions of pageviews Terabytes of data to process 21
  22. 22. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 22
  23. 23. Connection Strength Let’s build “Connection Strength” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 23
  24. 24. Connection Strength How well do people know each other? Connection Strength Application: reorder updates to show updates from close connections 24
  25. 25. Connection Strength How well do Alice people know each other? Bob Carol 25
  26. 26. Connection Strength How well do Alice people know each other? Bob Carol 26
  27. 27. Connection Strength How well do Alice people know each other? Bob Carol Triangle closing 27
  28. 28. Connection Strength How well do Alice people know each other? Bob Carol Triangle closing Prob(Bob knows Carol) ~ the # of common connections 28
  29. 29. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 29
  30. 30. Pig Overview Load: load data, specify format Store: store data, specify format Foreach, Generate: Projections, similar to select Group by: group by column(s) Join, Filter, Limit, Order, ... User Defined Functions (UDFs) 30
  31. 31. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 31
  32. 32. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 32
  33. 33. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 33
  34. 34. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 34
  35. 35. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 35
  36. 36. Triangle Closing Example Alice Bob Carol connections = LOAD `connections` USING 1.(A,B),(B,A),(A,C),(C,A) PigStorage(); 2.(A,{B,C}),(B,{A}),(C,{A}) 3.(A,{B,C}),(A,{C,B}) 4.(B,C,1), (C,B,1) 36
  37. 37. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) group_conn = GROUP connections BY 2.(A,{B,C}),(B,{A}),(C,{A}) source_id; 3.(A,{B,C}),(A,{C,B}) 4.(B,C,1), (C,B,1) 37
  38. 38. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) 2.(A,{B,C}),(B,{A}),(C,{A}) pairs = FOREACH group_conn GENERATE 3.(A,{B,C}),(A,{C,B}) generatePair(connections.dest_id) as (id1, id2); 4.(B,C,1), (C,B,1) 38
  39. 39. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) 2.(A,{B,C}),(B,{A}),(C,{A}) common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn 3.(A,{B,C}),(A,{C,B}) GENERATE flatten(group) as (source_id, dest_id), 4.(B,C,1), (C,B,1) COUNT(pairs) as common_connections; 39
  40. 40. Connection Strength Age Company Overlap School Overlap 40
  41. 41. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 41
  42. 42. Systems and Tools Kafka (LinkedIn) Hadoop (Apache) Azkaban (LinkedIn) Voldemort (LinkedIn) 42
  43. 43. Data Cycle 43
  44. 44. Systems and Tools Kafka publish-subscribe messaging system transfer data from production to HDFS Hadoop Azkaban Voldemort 44
  45. 45. Systems and Tools Kafka Hadoop Java MapReduce and Pig process data Azkaban Voldemort 45
  46. 46. Systems and Tools Kafka Hadoop Azkaban Hadoop workflow management tool to manage hundreds of Hadoop jobs Voldemort 46
  47. 47. Systems and Tools Kafka Hadoop Azkaban Voldemort Key-value store store output of Hadoop jobs and serve in production 47
  48. 48. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 48
  49. 49. Hadoop MapReduce Large-scale distributed data processing Map phase: partition the problem in smaller steps Reduce phase: aggregate the output of smaller steps Allows parallelism and fault-tolerance 49
  50. 50. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 50
  51. 51. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 51
  52. 52. Our Workflow Triangle Age School Overlap Closing Union push-to-qa push-to-prod 52
  53. 53. Sample Workflow 53
  54. 54. Azkaban Workflow Management Dependency management Regular Scheduling Monitoring Diverse jobs: Java, Pig, Clojure Configuration/Parameters Resource control/locking Restart/Stop/Retry Visualization History Logs 54
  55. 55. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 55
  56. 56. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 56
  57. 57. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 57
  58. 58. Voldemort Large amount of data/Scalable Quick lookup/low latency Versioning and Rollback Fault tolerance through replication Read only Offline index building 58
  59. 59. Voldemort RO Store 59
  60. 60. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 60
  61. 61. Related Searches Millions of Searches everyday Help users to explore and refine their queries 61
  62. 62. How to build Related Searches? Collaborative Filtering: searches done in the same session Q1 Q2 Q3 Q4 Time 62
  63. 63. How to build Related Searches? Searches correlated by results clicks Q1 R1 Qn Rm 63
  64. 64. How to build Related Searches? Searches with overlapping terms Q1 Jeff LinkedIn Q2 LinkedIn CEO 64
  65. 65. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 65
  66. 66. Sequencing the Startup DNA Done by Monica Rogati at LinkedIn 66
  67. 67. Sequencing the Startup DNA •Top majors: Entrepreneurship, Computer Engineering •Bottom: Nursing, Administration 67
  68. 68. Sequencing the Startup DNA •Most founders age at first startup between 20 and 40 68
  69. 69. Sequencing the Startup DNA •Top regions: San Francisco, NYC 69
  70. 70. Sequencing the Startup DNA •Top Business Schools: Stanford, Harvard, MIT Sloan 70
  71. 71. Things Covered Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 71
  72. 72. SNA Team Thanks to SNA Team at LinkedIn http://sna-projects.com We are hiring! 72
  73. 73. Questions? 73

×