Social Network Analysis at LinkedIn

Social Network Analysis at
Linkedin
Mitul Tiwari
Search, Network, and Analytics (SNA)
LinkedIn
1

Outline

About Me

About LinkedIn

Analytics products at LinkedIn

Fun Facts: Sequencing the Startup DNA

3

LinkedIn by the numbers

120,000,000+ users (August 4, 2011)

2+ new user registrations per second

81+ Million monthly unique users* (Comscore)

2 Billion People Searches in 2010

7.1 Billion page views in Q2 2010

2+ Million companies with LinkedIn Company Pages

2+ M LinkedIn Groups

4

Broad Range of Products

5

Communications

Communications

10

8

Outline

About Me

About LinkedIn



12

What do I mean by Analytics Products?

13

People You May Know

14

Proﬁle Stats: WVMP

15

Viewers of this proﬁle also ...

16

Related Searches

Millions of Searches everyday

Help users to explore and reﬁne their queries

19

Analytics Products: Key Ideas

Recommendations
People You May Know, Related Searches, Viewers of
this profile ...

Insight
Profile Stats: Who Viewed My Profile, Skills

Visualization
InMaps
20

Analytics Products: Challenges

LinkedIn: largest professional network

120+ million members on LinkedIn

Billions of pageviews

Terabytes of data to process

21

Outline


Deep Dive - Connection Strength

Deep Dive - Related Searches


22

Connection Strength

Let’s build “Connection Strength”

Systems and Tools we use

Hadoop MapReduce

Managing workﬂow

Serving data in production

23

Connection Strength

How well do people know each other?

Connection Strength

Application: reorder updates to show updates
from close connections

24

Connection Strength
How well do
Alice
people know each
other?

Bob Carol

25

Connection Strength
How well do
Alice
people know each
other?

Bob Carol

26

Connection Strength
How well do
Alice
people know each
other?

Bob Carol

Triangle closing

27

Connection Strength
How well do
Alice
people know each
other?

Bob Carol

Triangle closing
Prob(Bob knows Carol) ~ the # of common connections

28

Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
ﬂatten(group) as (source_id, dest_id),
COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();

29

Pig Overview
Load: load data, specify format

Store: store data, specify format

Foreach, Generate: Projections, similar to select

Group by: group by column(s)

Join, Filter, Limit, Order, ...

User Deﬁned Functions (UDFs)
30



31



32



33



34



35

Triangle Closing Example
Alice

Bob Carol

connections = LOAD `connections` USING
1.(A,B),(B,A),(A,C),(C,A) PigStorage();
2.(A,{B,C}),(B,{A}),(C,{A})
3.(A,{B,C}),(A,{C,B})
4.(B,C,1), (C,B,1)
36

Alice

Bob Carol

1.(A,B),(B,A),(A,C),(C,A)
group_conn = GROUP connections BY
2.(A,{B,C}),(B,{A}),(C,{A}) source_id;
3.(A,{B,C}),(A,{C,B})
4.(B,C,1), (C,B,1)
37

Alice

Bob Carol

1.(A,B),(B,A),(A,C),(C,A)
2.(A,{B,C}),(B,{A}),(C,{A})
3.(A,{B,C}),(A,{C,B}) generatePair(connections.dest_id) as (id1, id2);
4.(B,C,1), (C,B,1)
38

Alice

Bob Carol

1.(A,B),(B,A),(A,C),(C,A)
2.(A,{B,C}),(B,{A}),(C,{A}) common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn
3.(A,{B,C}),(A,{C,B}) GENERATE ﬂatten(group) as (source_id, dest_id),
4.(B,C,1), (C,B,1) COUNT(pairs) as common_connections;
39

Connection Strength

Age

Company Overlap

School Overlap

40

Related Searches

Let’s build “Related Searches”


Hadoop MapReduce

Managing workﬂow


41

Systems and Tools

Kafka (LinkedIn)

Hadoop (Apache)

Azkaban (LinkedIn)

Voldemort (LinkedIn)

42

Systems and Tools

Kafka
publish-subscribe messaging system

transfer data from production to HDFS

Hadoop

Azkaban

Voldemort

44

Systems and Tools

Kafka

Hadoop
Java MapReduce and Pig

process data

Azkaban

Voldemort

45

Systems and Tools

Kafka

Hadoop

Azkaban
Hadoop workﬂow management tool

to manage hundreds of Hadoop jobs

Voldemort

46

Systems and Tools

Kafka

Hadoop

Azkaban

Voldemort
Key-value store

store output of Hadoop jobs and serve in production

47

Related Searches



Hadoop MapReduce

Managing workﬂow


48

Hadoop MapReduce

Large-scale distributed data processing

Map phase: partition the problem in smaller
steps

Reduce phase: aggregate the output of smaller
steps

Allows parallelism and fault-tolerance

49

Related Searches



Hadoop MapReduce

Managing workﬂow


50

Our Workﬂow

Triangle
Age School Overlap
Closing

Union

push-to-prod

51

Our Workﬂow

Triangle
Age School Overlap
Closing

Union

push-to-qa push-to-prod

52

Azkaban Workﬂow Management

Dependency management
Regular Scheduling
Monitoring
Diverse jobs: Java, Pig, Clojure
Conﬁguration/Parameters
Resource control/locking
Restart/Stop/Retry
Visualization
History
Logs
54

Our Workﬂow

Triangle
Age School Overlap
Closing

Union

push-to-prod

55

Our Workﬂow

Triangle
Age School Overlap
Closing

Union

push-to-prod

56

Related Searches



Hadoop MapReduce

Managing workﬂow


57

Voldemort

Large amount of data/Scalable

Quick lookup/low latency

Versioning and Rollback

Fault tolerance through replication

Read only

Ofﬂine index building

58

Voldemort RO Store

59

Outline





60

Related Searches

Millions of Searches everyday

Help users to explore and reﬁne their queries

61

How to build Related Searches?

Collaborative Filtering: searches done in the
same session

Q1 Q2 Q3 Q4

Time

62


Searches correlated by results clicks

Q1 R1

Qn Rm

63


Searches with overlapping terms

Q1 Jeff LinkedIn

Q2 LinkedIn CEO

64

Outline





65

Sequencing the Startup DNA

Done by Monica Rogati at
LinkedIn

66


•Top majors: Entrepreneurship, Computer Engineering
•Bottom: Nursing, Administration

67


•Most founders age at ﬁrst startup between 20 and 40

68


•Top regions: San Francisco, NYC

69


•Top Business Schools:
Stanford, Harvard, MIT Sloan

70

Things Covered





71

SNA Team

Thanks to SNA Team at LinkedIn

http://sna-projects.com

We are hiring!

72

Social Network Analysis at LinkedIn

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (8)

Similar to Social Network Analysis at LinkedIn

Similar to Social Network Analysis at LinkedIn (20)

More from Mitul Tiwari

More from Mitul Tiwari (9)

Recently uploaded

Recently uploaded (20)

Social Network Analysis at LinkedIn