Organizational Overlap on Social Networks and its Applications
Mitul Tiwari
Joint work with Cho-Jui Hsieh, Deepak Agarwal,
Xinyi (Lisa) Huang, and Sam Shah
LinkedIn
Outline
• Motivation
• Organizational Overlap Model
  • Problem Definition
  • Data Analysis
  • Mathematical Formulation
  • Experimental Validation
• Applications
  • Link Prediction
  • Community Detection
Motivation
• Social networks are important for:
  • Sharing and discovery
  • Communication
  • Networking
• Online social networks are only partially observed
• Link prediction and entity recommendation are therefore important
Motivation
• Member profiles contain various types of organizations
  • Companies, schools, groups, ...
• Can we compute edge affinity based on this organizational information?
• Useful for many applications:
  • Recommending members to connect (link prediction)
  • Recommending other entities from the same community (community detection)
Outline
• Motivation
• Organizational Overlap Model
  • Problem Definition
  • Data Analysis
  • Mathematical Formulation
  • Experimental Validation
• Applications
  • Link Prediction
  • Community Detection
Organizational Overlap Problem
• Goal: compute the probability of a connection based on organizational time overlap
• Organizational time overlap between two members A and B who belonged to the same organization O: T(A, B, O)
• Probability that A and B are connected: P(A, B)
• P(A, B) = f(T(A, B, O), O), over all organizations O
  • A function of the time overlap in organization O
  • and of the properties of organization O
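To make these definitions concrete, here is a minimal sketch that computes T(A, B, O) from tenure intervals and combines per-organization probabilities with the AGM-style rule P(A, B) = 1 − Π_O (1 − P_O) that the speaker notes cite. Representing tenures as (start, end) pairs in years and the per-organization model `p_org` are hypothetical placeholders; the actual f in the slide may differ.

```python
from typing import Callable, Dict, Tuple

# A tenure is a (start, end) pair in years; this representation is an assumption.
Tenure = Tuple[float, float]


def time_overlap(a: Tenure, b: Tenure) -> float:
    """T(A, B, O): length of time both members were in the organization together."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def connection_probability(
    tenures_a: Dict[str, Tenure],
    tenures_b: Dict[str, Tenure],
    p_org: Callable[[str, float], float],
) -> float:
    """P(A, B) = 1 - prod over shared organizations O of (1 - p_org(O, T(A, B, O))).

    p_org is a hypothetical per-organization model mapping (O, t) to a probability.
    """
    p_not_connected = 1.0
    # Only organizations that appear on both profiles can contribute an overlap.
    for org in tenures_a.keys() & tenures_b.keys():
        t = time_overlap(tenures_a[org], tenures_b[org])
        p_not_connected *= 1.0 - p_org(org, t)
    return 1.0 - p_not_connected
```

For example, with overlaps of 2 and 3 years in two shared organizations and a toy model p_org(O, t) = 1 − 0.5^t, the combined probability is 1 − 0.25 · 0.125 = 0.96875.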
Organizational Overlap Model: Estimating λ
• λ: organization-dependent parameter
• Members of smaller organizations are more likely to know each other
• Empirical and MLE estimates give log(λ) ~ -0.8 log(|S|), where |S| is the organization size
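The slide reports the fitted trend but not the fitting procedure. One simple way to recover such a slope, assuming per-organization λ estimates are already available, is an ordinary least-squares fit on a log-log scale. The function names and toy data below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_log_log(sizes: Sequence[float], lambdas: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(lambda) = slope * log(|S|) + intercept."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(lam) for lam in lambdas]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict_lambda(size: float, slope: float, intercept: float) -> float:
    """Invert the log-log fit: lambda = exp(intercept) * |S| ** slope."""
    return math.exp(intercept) * size ** slope
```

With λ generated exactly as 2 · |S|^(−0.8), the fit returns a slope of −0.8 and an intercept of ln 2, up to floating-point error.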
Outline
• Motivation
• Organizational Overlap Model
  • Problem Definition
  • Data Analysis
  • Mathematical Formulation
  • Experimental Validation
• Applications
  • Link Prediction
  • Community Detection
Application: Link Prediction
• Warm start: existing edges
• Two features: organizational overlap time and organization size
• Common Neighbors (CN)
• Adamic-Adar (AA)
• Datasets: LinkedIn, Enron emails, Wiki Talk
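Common Neighbors and Adamic-Adar are standard link-prediction scores: CN(u, v) = |N(u) ∩ N(v)| and AA(u, v) = Σ over z in N(u) ∩ N(v) of 1 / log |N(z)|. For reference, a minimal sketch of both, assuming an undirected graph stored as a dict of neighbor sets (the representation is an assumption, not the one used in the experiments):

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as a dict of neighbor sets (assumed)


def common_neighbors(g: Graph, u: str, v: str) -> int:
    """CN(u, v) = |N(u) ∩ N(v)|."""
    return len(g[u] & g[v])


def adamic_adar(g: Graph, u: str, v: str) -> float:
    """AA(u, v) = sum over common neighbors z of 1 / log(|N(z)|)."""
    score = 0.0
    for z in g[u] & g[v]:
        degree = len(g[z])
        # A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0;
        # the check only guards against malformed (e.g. asymmetric) input.
        if degree > 1:
            score += 1.0 / math.log(degree)
    return score
```

Adamic-Adar down-weights shared neighbors with many connections, so a common neighbor of degree 2 contributes more than one of degree 3.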
Application: Link Prediction
• Cold start: no or sparse edges
• All features:
  • time overlap, company size, company propensity, node propensity, ...
Application: Community Detection
• Useful for candidate generation in entity recommendation systems, such as companies to follow
• Graph clustering algorithm (Graclus)
  • Members as nodes, with an edge between any pair of members whose organizational tenures overlap
  • Edge weights computed with the organizational overlap model
  • Graclus: minimizes the total weight of the edges cut between clusters
• Evaluation using
  • Virality of company follows within communities
  • Virality of article updates
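Graclus itself is not reproduced here. As a sketch of its input, assuming the capped overlap model P(t) = μ(1 − e^(−λt)) and members of a single organization, the code below builds the weighted member graph and evaluates the total weight of edges cut by a given partition, the kind of quantity a graph-cut clusterer tries to keep small. The parameter values and helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Tenure = Tuple[float, float]  # (start, end) in years; an assumed representation
Edges = Dict[Tuple[str, str], float]


def overlap_weight(t: float, mu: float, lam: float) -> float:
    """Edge weight from the assumed capped model P(t) = mu * (1 - exp(-lam * t))."""
    return mu * (1.0 - math.exp(-lam * t))


def build_weighted_edges(tenures: Dict[str, Tenure], mu: float, lam: float) -> Edges:
    """Add an edge between every pair of members whose tenures overlap (t > 0)."""
    edges: Edges = {}
    for a, b in combinations(sorted(tenures), 2):
        (sa, ea), (sb, eb) = tenures[a], tenures[b]
        t = max(0.0, min(ea, eb) - max(sa, sb))
        if t > 0.0:
            edges[(a, b)] = overlap_weight(t, mu, lam)
    return edges


def cut_weight(edges: Edges, cluster_of: Dict[str, int]) -> float:
    """Total weight of edges whose endpoints are assigned to different clusters."""
    return sum(w for (a, b), w in edges.items() if cluster_of[a] != cluster_of[b])
```

Across several organizations, the per-organization weights for a pair could be combined with the same 1 − Π(1 − P) rule before clustering; that step is omitted here for brevity.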
Community Detection Evaluation
• Using the spread of company follows
• Compared three methods:
  • Organizational overlap graph
  • Social connection graph
  • Random: random partition of the members of the same company
• Spread: average number of companies followed within d days of the first follow event
• Propagation rate: normalized spread
Hi, I am Mitul Tiwari. Today I am going to present our paper on “Organizational Overlap on Social Networks and its Applications”.
This is joint work with Cho-Jui, Deepak, Lisa, and Sam.
Here is the outline of the rest of my talk.
LinkedIn is the second largest social network for professionals with more than 225 million members.
Members can create profiles with their education and employment details.
Members can connect with each other and maintain their professional network on LinkedIn.
PYMK (People You May Know) is a large-scale recommendation system that helps you connect with others.
Basically, PYMK is a link prediction problem, where we analyze billions of edges to recommend possible connections to you.
A big big-data problem!
Companies can create pages and members can follow companies.
LinkedIn’s homepage is powered by recommendation engines: news, connections, jobs, groups, companies.
Also ads and relevant updates.
A rich recommender-system ecosystem at LinkedIn: connections, news, skills, jobs, companies, groups, search queries, talent, similar profiles, ...
Here is the outline of the rest of my talk.
For a company A, this graph shows connection density, that is, the ratio of the number of connected pairs with time overlap t in company A to the total number of member pairs with time overlap t in company A.
We observe that connection density increases with time overlap t.
We see similar behavior for many companies, groups, and schools.
This led to our insight: connection density increases with organizational time overlap.
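As an illustration of the quantity plotted, here is a minimal sketch that buckets the member pairs of one organization by time overlap and reports the fraction that are connected in each bucket. The input format, (overlap, is_connected) pairs, and the one-year default bucket width are assumptions, not the pipeline used for the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def connection_density(pairs: Iterable[Tuple[float, bool]], bucket: float = 1.0) -> Dict[float, float]:
    """For each time-overlap bucket: (# connected pairs) / (# pairs).

    pairs: (time_overlap_t, is_connected) for every member pair in one organization.
    """
    connected: Dict[float, int] = defaultdict(int)
    total: Dict[float, int] = defaultdict(int)
    for t, is_conn in pairs:
        key = (t // bucket) * bucket  # left edge of the bucket containing t
        total[key] += 1
        connected[key] += int(is_conn)
    return {k: connected[k] / total[k] for k in sorted(total)}
```

For example, overlaps (0.5, not connected), (0.7, connected), (1.2, connected), (1.9, connected), (2.5, not connected) give densities 0.5, 1.0 and 0.0 for the buckets starting at 0, 1 and 2 years.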
We sampled companies of different sizes
and calculated connection density with respect to company size.
We observed that connection density decreases as the size of the organization increases.
This makes sense, since people in a smaller organization are more likely to know each other.
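1. The Community-Affiliation Graph Model (AGM) proposes P(O1, O2) = 1 − (1 − P(O1))(1 − P(O2)) for the probability of a connection through two organizations O1 and O2.
Using that, we arrive at Assumption 1: overlaps over disjoint time intervals t1 and t2 combine the same way, 1 − P(t1 + t2) = (1 − P(t1))(1 − P(t2)).
2. Assumption 2: P(t) is a probability, so we can safely assume 0 ≤ P(t) ≤ 1. Moreover, P(t) = 0 iff t = 0, that is, iff there is no overlap.
1. Assumption 1 can be used to decompose a time interval t into m equal smaller intervals, giving Lemma 1: 1 − P(t) = (1 − P(t/m))^m.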
2. Lemma 2: from Assumption 2, P(δt) → 0 as δt → 0; combined with Assumption 1, this gives P(t − δt) → P(t) and P(t + δt) → P(t), i.e., P(t) is continuous.
3. From Lemma 1 and Lemma 2 we derive 1 − P(t) = Q(1)^t, where Q(t) = 1 − P(t).
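The derivation in these notes can be written out step by step. This is a sketch under the notes' two assumptions, additionally assuming 0 < Q(1) < 1 so that λ is finite and positive. The last line, which applies the upper bound μ observed for large companies as a multiplicative cap, is an assumed form rather than a formula stated here.

```latex
\begin{align*}
Q(t) &:= 1 - P(t) \\
Q(t_1 + t_2) &= Q(t_1)\,Q(t_2) && \text{(Assumption 1)} \\
Q(t) &= Q(t/m)^{m} && \text{(Lemma 1)} \\
Q(n/m) &= Q(1/m)^{n} = Q(1)^{n/m} && \text{(Assumption 1; Lemma 1 with } t = 1\text{; } n, m \in \mathbb{Z}^{+}) \\
Q(t) &= Q(1)^{t} \quad \text{for all real } t \ge 0 && \text{(Lemma 2: continuity)} \\
P(t) &= 1 - e^{-\lambda t}, \qquad \lambda := -\ln Q(1) > 0 \\
P(t) &= \mu \left(1 - e^{-\lambda t}\right) && \text{(assumed form with upper bound } \mu)
\end{align*}
```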
Empirical connection density values fit our model well.
In large companies, P(t) cannot reach 1 even for large t.
We observe an upper bound μ on the probability.
MLE: maximize the log-likelihood Σ_i [X_i log P(t_i) + (1 − X_i) log(1 − P(t_i))], where t_i is the time overlap of pair i, and X_i = 1 if the pair is connected and 0 otherwise.
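A minimal sketch of this MLE, assuming the capped form P(t) = μ(1 − e^(−λt)) and using a plain grid search in place of whatever optimizer was actually used. The function names and grids are hypothetical.

```python
import math
from typing import Sequence, Tuple


def overlap_prob(t: float, mu: float, lam: float) -> float:
    """Assumed capped model: P(t) = mu * (1 - exp(-lam * t))."""
    return mu * (1.0 - math.exp(-lam * t))


def log_likelihood(ts: Sequence[float], xs: Sequence[int], mu: float, lam: float,
                   eps: float = 1e-12) -> float:
    """Sum_i X_i log P(t_i) + (1 - X_i) log(1 - P(t_i)); eps keeps the logs finite."""
    total = 0.0
    for t, x in zip(ts, xs):
        p = min(max(overlap_prob(t, mu, lam), eps), 1.0 - eps)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total


def fit_mle_grid(ts: Sequence[float], xs: Sequence[int],
                 mus: Sequence[float], lams: Sequence[float]) -> Tuple[float, float]:
    """Return the (mu, lam) grid point with the highest log-likelihood."""
    grid = [(mu, lam) for mu in mus for lam in lams]
    return max(grid, key=lambda params: log_likelihood(ts, xs, *params))
```

On synthetic data generated from the model, the grid search recovers the generating parameters when they lie on the grid; a finer grid or a gradient-based optimizer would be the natural next step for real data.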
Here is the outline of the rest of my talk.
Warm-start setting, where we have existing edges.
Enron emails:
Wiki Talk: conversations and discussions between editors; edits on the same page imply a conversation.