The document discusses targeting communities to maximize information diffusion. It proposes measuring the impact of communities based on membership and centrality within communities. An impact focus method is introduced that targets communities based on total impact and entropy of influence. Evaluation on discussion forum data found impact focus outperformed random and in-degree based methods, especially for small numbers of targeted communities and users. The goal is to efficiently spread messages by stimulating influential communities.
2. Motivation
Digital Enterprise Research Institute www.deri.ie
• Many companies have started to utilise online communities as a means
of communicating and targeting their customers
• A common approach is to maximise information diffusion by targeting
influential actors
• In the context of many online communities (e.g. discussion fora) the
information is shared to the community as a whole and not to
individual actors
3. Objectives
• Our main hypothesis is that it is possible to efficiently spread
a message over the information flow network by targeting
highly influential communities
• We derive the information flow network from the reply-to
network between the actors
• The main problem is then formulated as a prediction of the
set of communities to target such that the message is spread
over the network as much as possible
4. Methods: Definition of Impact
[Figure: example reply-to network of users a to g spread across two communities, forum A and forum B]
• We propose (Belák et al., ‘12) to take two factors into account:
1. degree of community membership of the users
2. centrality of the users within each community
• We used in-degree as the centrality measure (# replies of a user)
• For the general case of n users and k communities, define:
• an n × k membership matrix M
• an n × k centrality matrix C
• The cross-community k × k impact matrix J can then be obtained as the product
of the transposed membership matrix and the centrality matrix: $J = M^{T} C$
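The product above can be sketched in a few lines of plain Python. This is only an illustration under assumed toy data: the three users, two communities, and all matrix values below are hypothetical, and the function name `impact_matrix` is not from the slides. Entry J[i][j] combines each user's membership in community i with that user's centrality in community j.

```python
def impact_matrix(M, C):
    """Return the k x k matrix J = M^T C for n x k lists-of-lists M and C."""
    n, k = len(M), len(M[0])
    if len(C) != n or any(len(row) != k for row in M + C):
        raise ValueError("M and C must both be n x k")
    # J[i][j] = sum over users u of membership(u, i) * centrality(u, j)
    return [[sum(M[u][i] * C[u][j] for u in range(n)) for j in range(k)]
            for i in range(k)]


# Hypothetical toy data: 3 users (rows), 2 communities (columns).
M = [[1.0, 0.0],   # user 0 belongs only to community 0
     [0.5, 0.5],   # user 1 is split evenly between both
     [0.0, 1.0]]   # user 2 belongs only to community 1
C = [[2, 0],       # in-degree of each user within each community
     [1, 3],
     [0, 4]]

J = impact_matrix(M, C)
print(J)  # [[2.5, 1.5], [0.5, 5.5]]
```

The off-diagonal entries are the cross-community part: here J[0][1] = 1.5 arises solely from user 1, who is half a member of community 0 and has in-degree 3 in community 1.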
5. Methods: Targeting Communities
• The level of dispersion (heterogeneity) of the total impact of community i can be
measured as the entropy of the i-th row/column of the impact matrix
• Is a community broadly influential or does it influence only few other
communities?
• We propose to target communities by means of the product of the total
impact of community i and its entropy: impact focus (IF)
• IF was compared with random targeting (R) and group in-degree (GI) (Everett
& Borgatti, ‘99)
• We simulated the diffusion by extending the Independent Cascade Model (ICM)
(Kempe et al., ‘03)
1. Take q target communities and sample s users from each of them
2. Run the original ICM from the union of sampled users
• Performance was measured by the fraction of all users that were
activated during the simulation
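The targeting and simulation steps on this slide can be sketched as follows. Several details are assumptions made for illustration, because the slides do not fix them: total impact is taken as the row sum of J, the entropy is the Shannon entropy of the row normalised to sum to 1, and every edge in the cascade has the same activation probability `p`. The function names and the toy graph are hypothetical.

```python
import math
import random


def impact_focus(J):
    """Score community i as (row sum of J) * (Shannon entropy of row i).

    Assumption: rows are used and normalised to a probability distribution.
    """
    scores = []
    for row in J:
        total = sum(row)
        probs = [x / total for x in row if x > 0] if total > 0 else []
        entropy = -sum(p * math.log2(p) for p in probs)
        scores.append(total * entropy)
    return scores


def simulate(graph, communities, targets, s, p, rng):
    """Sample up to s seed users per target community, then run the ICM.

    graph: dict mapping a user to the users information can flow to.
    communities: dict mapping a community to its member users.
    Returns the fraction of all users activated.
    """
    seeds = set()
    for c in targets:
        members = sorted(communities[c])
        seeds.update(rng.sample(members, min(s, len(members))))
    active, frontier = set(seeds), sorted(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active user gets one chance per inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    users = set(graph) | {v for vs in graph.values() for v in vs}
    for members in communities.values():
        users |= set(members)
    return len(active) / len(users)


# Hypothetical toy inputs.
J = [[2.5, 1.5], [0.5, 5.5]]
scores = impact_focus(J)
q = 1
targets = sorted(range(len(scores)), key=lambda i: -scores[i])[:q]

graph = {"a": ["b"], "b": ["c"], "c": []}
communities = {0: ["a", "b"], 1: ["c"]}
fraction = simulate(graph, communities, targets, s=1, p=0.5,
                    rng=random.Random(0))
print(targets, fraction)
```

In this toy example community 1 has the larger total impact (6 versus 4), but its impact is concentrated almost entirely on itself, so its entropy is low and community 0 receives the higher impact focus score.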
6. Evaluation Data-Set
• 51 weeks of data from the largest Irish discussion board system
• Segmented using a 1-week sliding window
• A 1-week window represents approx. 84% of cross-fora posting
activity
• 540 communities in total
• 5,298 avg. nodes per snapshot
• 26,484 avg. edges per snapshot
7. Results: Avg. Performance
• Impact focus outperformed the other two methods, particularly for small numbers of
targeted communities and of seed users sampled from them
• Diffusion process became saturated on avg. at approx. 60% of the
users activated
[Figure: mean activation fraction (a) against user sample size (s) for IF, GI, and R, one panel per number of targeted communities, q = 1 to 5]
8. Results: IF outperforms GI, R
[Figure: activation fraction (a) for IF, GI, and R, in two panels: user sample size s=1 and user sample size s=12]
9. Conclusion
• The evaluation demonstrated that the framework
• is able to identify highly influential communities
• can predict which communities to stimulate (e.g. by posting a
message) so that the stimulus spreads efficiently
• We aim to extend it with content analysis
• E.g. What are the most influential communities with respect to a
particular topic?
• We will also investigate empirically-observed topic cascades and modify
our models accordingly if needed
References
• V. Belák, S. Lam, and C. Hayes. Cross-community influence in discussion
fora. ICWSM. AAAI, 2012.
• M. Everett and S. Borgatti. The centrality of groups and classes. J. of
Mathematical Sociology, 23(3):181–201, 1999.
• D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of
influence through a social network. SIGKDD. ACM, 2003.