FROM USER NEEDS TO
COMMUNITY HEALTH:
MINING USER BEHAVIOUR TO
ANALYSE ONLINE
COMMUNITIES
DR. MATTHEW ROWE
SCHOOL OF COMPUTING AND COMMUNICATIONS
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
Invited Talk @ 1st Workshop on Quality, Motivation and Coordination,
International Conference on Social Informatics 2013. Kyoto, Japan
About Me

2002-2006: M.Eng Software Engineering (Undergrad)
2006-2010: Ph.D. Computer Science (Postgrad)
2010-2012: Postdoc Research Associate (Postdoc)
2012-now: Lecturer in Social Computing (Lecturing)
From User Needs to Community Health: Mining User Behaviour to Analyse Online Communities
Research Interests

Semantics
Social networks
Digital Identity

Data

Forecasting + Classification
Data Mining
Disambiguation

Automating Processes
Modelling Social Systems
Artificial Intelligence

Machines

Prediction

http://scholar.google.com/citations?user=rhyR4_kAAAAJ
Collaborators

Harith Alani. Senior Lecturer, Knowledge Media Institute,
The Open University, UK.
http://people.kmi.open.ac.uk/harith/
Miriam Fernandez. Research Associate, Knowledge
Media Institute, The Open University, UK.
http://kmi.open.ac.uk/people/member/miriamfernandez
Conor Hayes. Senior Research Fellow, Digital Enterprise
Research Institute, Galway, Ireland.
http://www.deri.ie/users/conor-hayes
Marcel Karnstedt. Senior Postdoctoral Researcher, Digital
Enterprise Research Institute, Galway, Ireland.
http://www.marcel.karnstedt.com/

Outline

- Part I: Online Communities and User Behaviour
  - Define: online communities, user behaviour
  - The potential for examining user behaviour
- Part II: Comparing User Behaviour and User Needs
  - Collecting users’ needs in online communities
  - Linking needs to behaviour
- Part III: Predicting Community Health from User Behaviour
  - Mining roles from user behaviour
  - Community health forecasting from collective behaviour
Part I: Online Communities and User Behaviour

Defining Online Communities

a) Distinct user containers in which users discuss a given topic
  - E.g. message board forums
  - E.g. question-answering systems
b) Latent grouping of users by some common attribute
  - E.g. semantic web community
  - E.g. social network clusters with high social homophily

This talk focuses on: a) User containers

- BT (the British telecommunications firm) uses online communities to enable consumers to provide support to other consumers
- The BBC News web site provides comments sections to encourage user engagement with the news
- Question-answering systems allow communities of ‘knowledgeable’ users to ask questions and provide answers

Why Provide Online Communities?

- Increase Customer Loyalty
- Understanding Product Issues
- Facilitating Idea Generation
- Raising Brand Awareness
- Spreading through Word of Mouth

Managing Online Communities

- Online communities incur significant investments:
  - Hosting and bandwidth:
    - Cost (time + money) grows linearly with popularity
  - Community management:
    - Settling disputes
    - Encouraging engagement within the communities
- Common questions arise:
  - ‘How do I know if my community is healthy?’
  - ‘What changes in the community lead to it becoming unhealthy?’

How do I know if my community is ‘healthy’?

- Approach 1: Needs Satisfaction
  - Identify users’ needs for the community
  - Analyse users to see if their needs have been met
- Approach 2: Numerical Health Measures
  - Determine suitable measures for community health (e.g. churn rate)
  - Analyse these measures over time to see if the community is remaining healthy, or not

Analysing User Behaviour

- Online communities are behavioural ecosystems
  - Prevalent user behaviour can impact the behaviour of other users (Preece, 2000)

‘the way’

‘tangible measures derived from actions performed by and upon a user’

Behaviour Features

[Diagram: User, Post, Forum]

‘tangible measures of actions performed by and upon a user’

- Initiation
  - The extent to which users begin discussions in a community
- Contribution
  - The extent to which the user is providing content
- Popularity
  - Proportion of the community that responds to the user
- Engagement
  - Proportion of the community that a user responds to
- Focus Dispersion
  - Variance of the user’s interests across topics
- Quality
  - Reception of the user’s content by other users


Part II: Comparing User Behaviour and User Needs

Maslow’s Hierarchy of Needs

How does this hierarchy resonate with online community users?

User Needs in Online Communities

- Users have different needs for participating in an online community:
  - To create content and share information
  - To communicate with other users
  - To ask questions
  - To collaborate with other users
  - To help other users resolve problems and issues
  - To discuss ideas
- We wanted to find out how important the above needs were to community users…

Dataset 1: IBM Connections

- Enterprise social software suite
  - Communities within enterprises
- Anonymised dataset (Jan 2010 -> April 2011)
  - #Communities of Practice (CoP): 100
  - #Team Communities (Team): 72
  - #Technical Support (Tech): 14
- Labels provided by (Muller et al. 2012)

Understanding Users’ Needs on IBM Connections

- Surveyed 186 users about their needs
  - Spanning the aforementioned typed communities
  - 150 responses
- Likert scale (1-5) for agreement with statements
- Examples included:
  - How often do you do the following?
    - Browse for information, Search for information, etc.
  - Rate how important the community features are to you?
    - Receiving recommendations, ability to filter information, etc.

Users’ Needs on IBM Connections

Ranked Community Features:

D3.1: Report on Social, Technical and Corporate Needs in Online Communities. M Rowe, H Alani, S Angeletou and G Burel. ROBUST Deliverable 3.1. (2012)

User Behaviour on IBM Connections

- Measured the behaviour of users across the three IBM Connections community types

Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types

Feature        CoP               Team              Tech
Focus Dis’     1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
Initiation     7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
Contribution   26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
Popularity     1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
Engagement     1.016 (1.556)     1.948 (2.324)     1.036 (1.575)

Excerpt from the accompanying paper shown on the slide (partly cropped):

[…] in the other communities. Popularity is higher in Team and Tech communities, but not significantly, than in CoP, suggesting that although users of the latter community provide more contributions, it is with content published by fewer users. For Engagement the mean is significantly highest - at < 0.001 - for Team indicating that users tend to participate with more users in these communities than the others.

We induce an empirical cumulative distribution function (ECDF) for each micro feature within each community and then qualitatively analyse how the curves of the functions differ across communities. […]

Behaviour analysis across different types of Enterprise Online Communities. M Rowe, M Fernandez, H Alani, I Ronen, C Hayes and M Karnstedt. In the proceedings of the Web Science Conference. Evanston, US. (2012)

User Behaviour on IBM Connections

[Figure: empirical CDF curves, CDF(x) against feature value, in panels for Focus Dispersion, Initiation, Contribution and Engagement, with one curve each for cop, team and tech communities]

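The ECDF comparison behind these plots can be sketched as follows. This is a minimal illustration only: the `ecdf` helper and the per-community Initiation values are hypothetical and are not taken from the IBM Connections dataset.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F where F(x) is the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F

# Hypothetical Initiation values per community type (illustrative only).
initiation = {
    "cop": [0, 1, 2, 8, 30],
    "team": [1, 4, 9, 15, 40],
    "tech": [0, 0, 1, 3, 5],
}

curves = {ctype: ecdf(values) for ctype, values in initiation.items()}

# Evaluate every curve at the same points to compare the community types.
for x in (1, 5, 10):
    print(x, {ctype: F(x) for ctype, F in curves.items()})
```

A curve that rises faster indicates a community type whose users mostly have low values for that feature, which is the kind of qualitative comparison the slide's plots support.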
Linking Users’ Needs to User Behaviour

- Questionnaire questions related to different behaviour aspects (initiation, contribution, etc.)
- Mapped questions to these aspects:
  - E.g. Initiation questions included:
    - How often do you ask a question?
    - How often do you create content?
    - How often do you announce work events and news?
- Resulted in average Likert-scale value response per behaviour aspect across community types
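
A minimal sketch of this mapping-and-averaging step, under stated assumptions: the three Initiation questions come from the slide, while the Engagement question, the response records and the name `mean_by_aspect` are hypothetical and only illustrate the mechanics.

```python
from collections import defaultdict
from statistics import mean

# Question -> behaviour aspect. The Initiation entries are from the slide;
# the Engagement entry is a hypothetical example of another mapping.
QUESTION_ASPECT = {
    "How often do you ask a question?": "Initiation",
    "How often do you create content?": "Initiation",
    "How often do you announce work events and news?": "Initiation",
    "How often do you reply to other users?": "Engagement",  # hypothetical
}

# Hypothetical responses: (community type, question, Likert score 1-5).
responses = [
    ("CoP", "How often do you ask a question?", 2),
    ("CoP", "How often do you create content?", 4),
    ("CoP", "How often do you reply to other users?", 5),
    ("Team", "How often do you ask a question?", 3),
    ("Team", "How often do you announce work events and news?", 5),
]

def mean_by_aspect(responses):
    """Average the Likert scores per (community type, behaviour aspect)."""
    grouped = defaultdict(list)
    for ctype, question, score in responses:
        grouped[(ctype, QUESTION_ASPECT[question])].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}

needs = mean_by_aspect(responses)
print(needs)
```

The resulting per-aspect means are what can then be set side by side with the observed behaviour features for each community type.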

Linking Users’ Needs to User Behaviour

User Needs from Questionnaire Responses:

Table 4. Mean and standard deviation (in parentheses) values of micro features obtained using the questionnaires for the different community types

Feature        CoP              Team             Tech
Focus Dis’     4.019 (0.093)    3.055 (0.426)    4.070 (0.070)
Initiation     2.483 (0.838)    2.587 (0.838)    2.243 (0.873)
Contribution   3.239 (0.926)    3.202 (1.016)    3.158 (0.945)
Popularity     2.875 (0.070)    3.084 (0.168)    2.104 (0.173)
Engagement     2.844 (0.539)    3.027 (0.588)    2.406 (0.522)

Observed User Behaviour:

Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types

Feature        CoP               Team              Tech
Focus Dis’     1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
Initiation     7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
Contribution   26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
Popularity     1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
Engagement     1.016 (1.556)     1.948 (2.324)     1.036 (1.575)

Excerpts from the accompanying paper shown on the slide (partly cropped):

[…] e.g. taking the mean of the responses for all Initiation questions for the 95 CoPs. The set of results can be seen in Table 4.

The mean of the third micro feature, Contribution, is highest for CoP (but not significantly higher than the others) indicating that more initiated content is interacted with than in the other communities.

As Table 4 demonstrates, the findings from the analysis highly correlate with what users expressed to be relevant for each community type. We previously found that high levels of Initiation and Contribution are discriminative factors of Team and CoP communities with respect to Tech communities. […]

Understanding Needs Satisfaction

- Agreement between users’ needs and how users behave
  - Reflected by the different needs values across the different community types
- Limitations of this approach:
  1. Expensive to collect survey responses
     - Took around 6 months between questionnaire publication and results compilation
     - Required contacting many users
  2. Implicit biases in reporting across community types
     - Team communities had the lowest % of responses

Part III: Predicting Community Health from User Behaviour

Community Health and User Behaviour

- Management of communities is helped by:
  - Understanding how behaviour and health are related
    - How user behaviour changes are associated with health
  - Predicting health changes
    - Enables early decision making on community policy
- Can we accurately detect changes in community health from the behaviour of its users?

Dataset 2: SAP Community Network

- Collection of SAP forums in which users discuss:
  - Software development, SAP Products, Usage of SAP tools
- Points system for awarding best answers
- Provided with a dataset covering 33 communities:
  - Spanning 2004 - 2011
  - 95,200 threads, 421,098 messages, 32,942 users

[Figure: Post Count (axis 0 to 1400) over time, 2004 to 2011]

User Behaviour Features on SAP

- Focus Dispersion
  - Measure: Forum entropy of the user
- Engagement
  - Measure: Out-degree proportioned by potential maximal out-degree
- Initiation
  - Measure: Proportion of threads that were initiated by the user
- Contribution
  - Measure: Proportion of thread replies created by the user
- Popularity
  - Measure: In-degree proportioned by potential maximal in-degree
- Quality
  - Measure: Average points per post awarded to the user
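
The six measures can be sketched on a toy reply log as below. Everything here is an assumption made for illustration: the record layout, the sample posts, the name `behaviour_features`, and the choice of "all other users" as the potential maximal in-degree and out-degree. It is not the talk's implementation.

```python
import math
from collections import Counter

# Hypothetical post records: (author, forum, reply_to_author, points).
# reply_to_author is None when the post starts a thread.
posts = [
    ("ann", "abap", None, 0),
    ("bob", "abap", "ann", 4),
    ("cat", "abap", "ann", 2),
    ("bob", "hana", None, 0),
    ("ann", "hana", "bob", 6),
]

def behaviour_features(posts, user):
    users = {p[0] for p in posts}
    mine = [p for p in posts if p[0] == user]
    starters = [p for p in posts if p[2] is None]
    replies = [p for p in posts if p[2] is not None]
    max_degree = len(users) - 1  # assumed potential maximum: every other user

    # Focus dispersion: entropy of the user's posts over forums.
    counts = Counter(p[1] for p in mine)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # In-degree: distinct users replying to `user`; out-degree: the reverse.
    in_degree = len({p[0] for p in replies if p[2] == user and p[0] != user})
    out_degree = len({p[2] for p in replies if p[0] == user and p[2] != user})

    return {
        "focus_dispersion": entropy,
        "popularity": in_degree / max_degree if max_degree else 0.0,
        "engagement": out_degree / max_degree if max_degree else 0.0,
        "initiation": sum(p[0] == user for p in starters) / len(starters),
        "contribution": sum(p[0] == user for p in replies) / len(replies),
        "quality": sum(p[3] for p in mine) / len(mine),
    }

features = behaviour_features(posts, "ann")
print(features)
```

The function assumes the user has at least one post and that the log contains at least one thread starter and one reply; a production version would guard those divisions.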

Inferring Roles from User Behaviour

1. Construct features for community users at a given time step
2. Derive bins using equal frequency binning
   - Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
3. Use skeleton rule base to construct rules using bin levels
   - Popularity = low, Initiation = high -> roleA
   - Popularity < 0.5, Initiation > 0.4 -> roleA
4. Apply rules to infer user roles and community composition
5. Repeat 1-4 for following time steps
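
Steps 2 to 4 can be sketched as follows for a single time step. The user feature values, the helper names and the fallback label `unclassified` are hypothetical; only the `roleA` rule (Popularity = low, Initiation = high) comes from the slide, and three bins (low, mid, high) are assumed.

```python
from collections import Counter

LEVELS = ("low", "mid", "high")

def equal_frequency_cutoffs(values, n_bins=3):
    """Cut points that place roughly the same number of values in each bin."""
    xs = sorted(values)
    return [xs[(i * len(xs)) // n_bins] for i in range(1, n_bins)]

def level(value, cutoffs):
    """Map a continuous value to low / mid / high using the cut points."""
    for cutoff, label in zip(cutoffs, LEVELS):
        if value < cutoff:
            return label
    return LEVELS[-1]

# Skeleton rule base: role -> required level per feature (roleA from the slide).
SKELETON_RULES = {"roleA": {"popularity": "low", "initiation": "high"}}

# Hypothetical feature values for one time step.
users = {
    "u1": {"popularity": 0.1, "initiation": 0.6},
    "u2": {"popularity": 0.2, "initiation": 0.5},
    "u3": {"popularity": 0.3, "initiation": 0.1},
    "u4": {"popularity": 0.4, "initiation": 0.2},
    "u5": {"popularity": 0.5, "initiation": 0.3},
    "u6": {"popularity": 0.6, "initiation": 0.4},
}

# Step 2: derive bin cut points per feature from this time step's values.
cutoffs = {
    f: equal_frequency_cutoffs([u[f] for u in users.values()])
    for f in ("popularity", "initiation")
}

def infer_role(features):
    """Steps 3-4: return the first role whose level conditions all hold."""
    levels = {f: level(v, cutoffs[f]) for f, v in features.items()}
    for role, conditions in SKELETON_RULES.items():
        if all(levels[f] == wanted for f, wanted in conditions.items()):
            return role
    return "unclassified"

roles = {name: infer_role(features) for name, features in users.items()}
composition = {r: n / len(roles) for r, n in Counter(roles.values()).items()}
print(roles, composition)
```

Because the cut points are recomputed from each time step's data, the same skeleton rule can translate to different numeric thresholds over time, which is the point of step 5.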

Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M
Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
Mining Roles (Skeleton rule base compilation)

1. Select the tuning segment
2. Discover correlated behaviour dimensions
   - Removed Engagement and Contribution, and kept Popularity (Pearson r > 0.75, p < 0.01)
3. Cluster users into behavioural groups
4. Derive role labels for clusters

Fig. 2. Boxplots of the feature distributions (Dispersion, Initiation, Quality, Popularity) in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning.

Table II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility.

Cluster   Dispersion   Initiation   Quality   Popularity
1         L            L            L         L
0         L            M            H         L
6         L            H            M         M
10        L            H            M         H
4         L            H            H         M
2,5       M            H            L         H
8,9       M            H            H         H
7         H            H            L         H
3         H            H            H         H

Role labels derived for the clusters:

- 1 - Focussed Novice
- 2,5 - Mixed Novice
- 7 - Distributed Novice
- 3 - Distributed Expert
- 8,9 - Mixed Expert
- 0 - Focussed Expert Participant
- 4 - Focussed Expert Initiator
- 6 - Knowledgeable Member
- 10 - Knowledgeable Sink

Excerpts from the accompanying paper shown on the slide (left edge cropped):

[…] as a parameter k. To judge the best model - i.e. cluster method and number of clusters - we measure the cohesion and separation of a given clustering as follows: For each clustering algorithm (Ψ) we iteratively increase the number of clusters to use where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ, this is defined for a given element (i) in a given cluster as:

s_i = \frac{b_i - a_i}{\max(a_i, b_i)}    (3)

Where a_i denotes the average distance to all other items in the same cluster and b_i is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of s_i ranges between −1 and 1 where the former indicates a poor clustering where distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient s(Ψ(k)) for the entire clustering we take the average silhouette coefficient of all items. […] that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance, however as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.

Deriving Role Labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the […]

[…] decision node, we measure the entropy of the dimensions and their levels across the clusters, we then choose the dimension with the largest entropy. This is defined formally as:

H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim)    (4)

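
The silhouette coefficient of equation (3) can be sketched in plain Python as below. This is an illustration under assumptions: Euclidean distance, a toy set of 2-D points, and a score of 0 for singleton clusters are choices made here, not details confirmed by the slide.

```python
import math

def silhouette(points, labels):
    """Average silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a_i: mean distance to the other items of the same cluster
        # (the item's zero distance to itself does not change the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the items of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D behaviour vectors forming two well separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # groups match the labels
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Scanning k and keeping the clustering with the highest average value mirrors the model selection the excerpt describes, with the clustering algorithm itself left out of this sketch.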
Community Health Indicators

- From the literature there is no single agreed measure of ‘community health’
  - Emergent dimensions: loyalty, participation, activity, social capital
- Indicator 1: Churn Rate (loyalty)
  - Proportion of users that remain
- Indicator 2: User Count (participation)
  - Number of active contributors
- Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
  - Ratio of replied-to thread starters to non-replied-to thread starters
- Indicator 4: Clustering Coefficient (social capital)
  - Average of users’ clustering coefficients
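
The four indicators can be sketched on toy data as follows. The window records and function names are hypothetical, and the loyalty measure is computed as the slide words it (the proportion of users that remain in the next window); its complement is what is more usually called churn.

```python
from itertools import combinations

# Hypothetical windows of posts: (thread_id, author, reply_to_author).
# reply_to_author is None for a thread starter.
window_t = [
    ("t1", "ann", None), ("t1", "bob", "ann"), ("t1", "cat", "ann"),
    ("t1", "cat", "bob"), ("t2", "dan", None),
]
window_next = [("t3", "ann", None), ("t3", "bob", "ann")]

def active_users(window):
    return {author for _, author, _ in window}

def retained_proportion(window, following):
    """Loyalty: share of this window's users still active in the next one."""
    before = active_users(window)
    return len(before & active_users(following)) / len(before)

def seeds_to_non_seeds(window):
    """Activity: replied-to thread starters over non-replied-to ones."""
    started = {t for t, _, r in window if r is None}
    replied = {t for t, _, r in window if r is not None}
    non_seeds = len(started - replied)
    return len(started & replied) / non_seeds if non_seeds else float("inf")

def average_clustering(window):
    """Social capital: mean local clustering coefficient of the reply graph."""
    nbrs = {}
    for _, author, target in window:
        nbrs.setdefault(author, set())
        if target is not None and target != author:
            nbrs.setdefault(target, set())
            nbrs[author].add(target)
            nbrs[target].add(author)
    coeffs = []
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for x, y in combinations(ns, 2) if y in nbrs[x])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

user_count = len(active_users(window_t))
retained = retained_proportion(window_t, window_next)
seed_ratio = seeds_to_non_seeds(window_t)
clustering = average_clustering(window_t)
print(user_count, retained, seed_ratio, clustering)
```

The reply graph is treated as undirected here, which is one of several reasonable choices for the clustering coefficient.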

Experiment 1: Health Indicator Regression

- Community management is helped by understanding the relation between behaviour and health
- Experimental Setup:
  - Health Indicator Linear Regression Models (per community)
    - Independent vars: 9 roles with composition proportions as values @ t
      - E.g. @ t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.
    - Dependent var: health indicator (e.g. churn rate) @ t
      - E.g. @ t = k: Churn Rate = 0.21
  - PCA of each community model using the model’s coefficients
    - Look for a common health composition pattern

Experiment 1: Health Indicator Regression Results

[Figure: PCA plots (PC1 against PC2) of the per-community regression models, one panel each for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient; points are labelled with forum ids]

- Idiosyncratic Health Composition Patterns
  - Divergence patterns between outlier communities
- No general pattern exists that describes the relation between roles and health

Experiment 2: Health Change Detection

- Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles?
- Experimental Setup
  - Binary classification of indicator change using logistic regression
  - At t=k+1: predict increase or decrease in health indicator from t=k
  - Time-ordered dataset:
    - Features @ t=k+1: 9 roles with composition proportions as values
    - Class @ t=k+1: positive (if increase from t=k), negative (if decrease)
    - Divide dataset into 80/20 split maintaining time-ordering
  - Evaluated using Area under the ROC Curve (AUC)
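
The data preparation and scoring around the classifier can be sketched as below. The logistic regression itself is deliberately left out; the indicator values, role proportions and classifier scores are hypothetical, and skipping time steps where the indicator does not change is an assumption, since the slide defines classes only for increases and decreases.

```python
def label_changes(series):
    """series: time-ordered list of (role_composition, indicator_value).

    Returns (composition at t=k+1, class) rows: 1 if the indicator rose
    from t=k, 0 if it fell. Unchanged steps are skipped (assumption).
    """
    rows = []
    for (_, previous), (composition, current) in zip(series, series[1:]):
        if current != previous:
            rows.append((composition, 1 if current > previous else 0))
    return rows

def time_ordered_split(rows, train_fraction=0.8):
    """Earlier rows train the model, later rows test it; no shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def auc(labels, scores):
    """Area under the ROC curve as the share of correctly ranked
    (positive, negative) pairs, counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical churn-rate series with a placeholder role composition.
indicator = [0.20, 0.25, 0.22, 0.30, 0.28, 0.35, 0.33, 0.31, 0.40, 0.38, 0.45]
series = [({"Mixed Expert": 0.05, "Distributed Novice": 0.51}, v)
          for v in indicator]

rows = label_changes(series)
train, test = time_ordered_split(rows)

# Hypothetical classifier scores for four held-out examples.
example_auc = auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
print(len(train), len(test), example_auc)
```

Keeping the split time-ordered avoids training on observations that come after the ones being predicted, which a shuffled split would allow.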

Experiment 2: Health Change Detection Results

- ROC Curves surpass baseline for:
  - Churn rate: 20/25 forums
  - User Count: 20/25 forums
  - Seeds-to-Non-Seeds: 19/25 forums
  - Clustering Coefficient: 17/25 forums

[Figure: ROC curves (TPR against FPR) in panels for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient]

What makes Communities Tick? Community Health Analysis using Role Compositions. M Rowe and H Alani. In the proceedings of the Fourth IEEE International Conference on Social Computing. Amsterdam, The Netherlands. (2012)

To Summarise

Findings

- User Behaviour is closely aligned with users’ needs
  - Although this is expensive to collect and analyse
- Accurate predictions of community health from behaviour
  - Inferring roles from collective behaviour
  - Forecasting from role compositions
- Community Managers can understand how their community will develop from user behaviour
  - Requires model tuning per-community

Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M
Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
Current/Future Work: Lifecycles

- Limitation of role-composition approach is the use of platform-wide windowing:
  - Lack of high-fidelity behaviour inspection per-user
- Lifecycle periods: user-specific stages of development

[Figure: a user’s lifetime from First Post to Last Post, divided into periods 1, 2, …, n with equal activity (#posts = #posts in each period)]
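
Dividing a lifetime into equal activity periods can be sketched as below. The function name and the handling of a post count that does not divide evenly (earlier periods receive one extra post) are assumptions made for this illustration.

```python
def lifecycle_periods(post_times, n_periods):
    """Split a user's posts, ordered by time, into n periods that each
    hold the same number of posts (earlier periods absorb any remainder)."""
    ordered = sorted(post_times)
    size, extra = divmod(len(ordered), n_periods)
    periods, start = [], 0
    for i in range(n_periods):
        end = start + size + (1 if i < extra else 0)
        periods.append(ordered[start:end])
        start = end
    return periods

# Hypothetical post timestamps (e.g. days since the user's first post).
periods = lifecycle_periods([5, 1, 9, 3, 7, 11], 3)
print(periods)
```

Periods defined this way cover different amounts of calendar time for different users, which is what makes the stages user-specific rather than tied to platform-wide windows.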

User Development

- Capture period-specific user properties (in period s):
  - In-degree distribution
  - Out-degree distribution
  - Term distribution
- Enabling: Churn prediction, stage-based recommendation

[Figure 2: Cross Entropy against Lifecycle Stages for Facebook, SAP and Server Fault, in panels (a) In-degree, (b) Out-degree, (c) Lexical. Caption (truncated on the slide): "Cross-entropies derived from comparing users’ in-degree, out-"]

Excerpt from the accompanying paper shown on the slide (partly cropped):

[…] by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. M Rowe. To appear in the proceedings of the International Conference on Data Mining. Dallas, US. (2013)
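
A cross-entropy comparison between two period-specific distributions can be sketched as below. The term lists, the smoothing constant `eps` for unseen items, and the choice of which period plays P and which plays Q are assumptions; the slide does not specify them.

```python
import math
from collections import Counter

def distribution(items):
    """Relative frequency of each item (a term, or a contacted user)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}

def cross_entropy(p, q, eps=1e-9):
    """H(P, Q) = -sum_x p(x) * log2 q(x), with eps standing in for q(x) = 0."""
    return -sum(px * math.log2(q.get(x, eps)) for x, px in p.items())

# Hypothetical terms used by one user in an earlier and a later period.
earlier = distribution(["sap", "sap", "sap", "error"])
later = distribution(["sap", "error", "sap", "error"])

same = cross_entropy(later, later)       # equals the entropy of `later`
shifted = cross_entropy(later, earlier)  # larger when the periods differ
print(same, shifted)
```

A value close to the period's own entropy suggests the user is reusing earlier language or contacts, while a larger value suggests a shift between periods.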

Questions?
@mrowebot
m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/

From User Needs to Community Health: Mining User Behaviour to Analyse Online Communities

Más contenido relacionado

Destacado

Tileywoodman TEEF ESN presentation
Tileywoodman TEEF ESN presentationTileywoodman TEEF ESN presentation
Tileywoodman TEEF ESN presentationSarahJWalker
 
Motivational Drivers Of Social Networking
Motivational Drivers Of Social NetworkingMotivational Drivers Of Social Networking
Motivational Drivers Of Social NetworkingSteve Massi
 
Why people use social networking sites
Why people use social networking sitesWhy people use social networking sites
Why people use social networking sitesPetter Bae Brandtzæg
 
What User-Centered Design is Good For
What User-Centered Design is Good ForWhat User-Centered Design is Good For
What User-Centered Design is Good ForDan Saffer
 
Designing Goal-based Experiences
Designing Goal-based ExperiencesDesigning Goal-based Experiences
Designing Goal-based ExperiencesJoe Lamantia
 
Human Factors in Innovation: Designing for Adoption
Human Factors in Innovation: Designing for AdoptionHuman Factors in Innovation: Designing for Adoption
Human Factors in Innovation: Designing for AdoptionJim Kalbach
 
Translating Customer Needs Into MVPs
Translating Customer Needs Into MVPsTranslating Customer Needs Into MVPs
Translating Customer Needs Into MVPsNew York University
 
User Centred Design - Designing Better Experiences - General Assembly - April...
User Centred Design - Designing Better Experiences - General Assembly - April...User Centred Design - Designing Better Experiences - General Assembly - April...
User Centred Design - Designing Better Experiences - General Assembly - April...Matt Gibson
 
Lecture 2 - Sources of technological change
Lecture 2 - Sources of technological changeLecture 2 - Sources of technological change
Lecture 2 - Sources of technological changeUNU.MERIT
 
User centred design (UCD) and the connected home
User centred design (UCD) and the connected homeUser centred design (UCD) and the connected home
User centred design (UCD) and the connected homeCyber-Duck
 
Onderzoeksmethode scriptie
Onderzoeksmethode scriptieOnderzoeksmethode scriptie
Onderzoeksmethode scriptieMariekeStrootman
 
Social Experience Design @ Interaction 13
Social Experience Design @ Interaction 13Social Experience Design @ Interaction 13
Social Experience Design @ Interaction 13Erin 'Folletto' Casali
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Destacado (14)

Tileywoodman TEEF ESN presentation
Tileywoodman TEEF ESN presentationTileywoodman TEEF ESN presentation
Tileywoodman TEEF ESN presentation
 
Motivational Drivers Of Social Networking
Motivational Drivers Of Social NetworkingMotivational Drivers Of Social Networking
Motivational Drivers Of Social Networking
 
Why people use social networking sites
Why people use social networking sitesWhy people use social networking sites
Why people use social networking sites
 
What User-Centered Design is Good For
What User-Centered Design is Good ForWhat User-Centered Design is Good For
What User-Centered Design is Good For
 
Designing Goal-based Experiences
Designing Goal-based ExperiencesDesigning Goal-based Experiences
Designing Goal-based Experiences
 
Human Factors in Innovation: Designing for Adoption
Human Factors in Innovation: Designing for AdoptionHuman Factors in Innovation: Designing for Adoption
Human Factors in Innovation: Designing for Adoption
 
Translating Customer Needs Into MVPs
Translating Customer Needs Into MVPsTranslating Customer Needs Into MVPs
Translating Customer Needs Into MVPs
 
User Centred Design - Designing Better Experiences - General Assembly - April...
User Centred Design - Designing Better Experiences - General Assembly - April...User Centred Design - Designing Better Experiences - General Assembly - April...
User Centred Design - Designing Better Experiences - General Assembly - April...
 
Lecture 2 - Sources of technological change
Lecture 2 - Sources of technological changeLecture 2 - Sources of technological change
Lecture 2 - Sources of technological change
 
User centred design (UCD) and the connected home
User centred design (UCD) and the connected homeUser centred design (UCD) and the connected home
User centred design (UCD) and the connected home
 
Onderzoeksmethode scriptie
Onderzoeksmethode scriptieOnderzoeksmethode scriptie
Onderzoeksmethode scriptie
 
Presentation social network
Presentation social networkPresentation social network
Presentation social network
 
Social Experience Design @ Interaction 13
Social Experience Design @ Interaction 13Social Experience Design @ Interaction 13
Social Experience Design @ Interaction 13
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

From User Needs to Community Health: Mining User Behaviour to Analyse Online Communities

  • 1. FROM USER NEEDS TO COMMUNITY HEALTH: MINING USER BEHAVIOUR TO ANALYSE ONLINE COMMUNITIES DR. MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | M.ROWE@LANCASTER.AC.UK Invited Talk @ 1st Workshop on Quality, Motivation and Coordination, International Conference on Social Informatics 2013. Kyoto, Japan
  • 2. About Me. 2002-2006: M.Eng Software Engineering (undergrad). 2006-2010: Ph.D. Computer Science (postgrad). 2010-2012: Postdoc Research Associate (postdoc). 2012-now: Lecturer in Social Computing (lecturing).
  • 3. Research Interests. Semantics; Social networks; Digital Identity; Data; Forecasting + Classification; Data Mining; Disambiguation; Automating Processes; Modelling Social Systems; Artificial Intelligence; Machines; Prediction. http://scholar.google.com/citations?user=rhyR4_kAAAAJ
  • 4. Collaborators. Harith Alani: Senior Lecturer, Knowledge Media Institute, The Open University, UK. http://people.kmi.open.ac.uk/harith/ Miriam Fernandez: Research Associate, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/people/member/miriamfernandez Conor Hayes: Senior Research Fellow, Digital Enterprise Research Institute, Galway, Ireland. http://www.deri.ie/users/conor-hayes Marcel Karnstedt: Senior Postdoctoral Researcher, Digital Enterprise Research Institute, Galway, Ireland. http://www.marcel.karnstedt.com/
  • 5. Outline. Part I: Online Communities and User Behaviour (defining online communities and user behaviour; the potential for examining user behaviour). Part II: Comparing User Behaviour and User Needs (collecting users’ needs in online communities; linking needs to behaviour). Part III: Predicting Community Health from User Behaviour (mining roles from user behaviour; community health forecasting from collective behaviour).
  • 6. Part I: Online Communities and User Behaviour
  • 7. Defining Online Communities. a) Distinct user containers in which users discuss a given topic (e.g. message board forums, question-answering systems). b) Latent grouping of users by some common attribute (e.g. semantic web community, social network clusters with high social homophily). This talk focuses on a) user containers.
  • 8. BT (British telecommunications firm) use online communities to enable consumers to provide support to other consumers. The BBC News web site provides comments sections to encourage user engagement with the news. Question-answering systems allow communities of ‘knowledgeable’ users to ask questions and provide answers.
  • 9. Why Provide Online Communities? Increase Customer Loyalty; Understanding Product Issues; Facilitating Idea Generation; Raising Brand Awareness; Spreading through Word of Mouth.
  • 10. Managing Online Communities. Online communities incur significant investments: hosting (cost and bandwidth: time + money grows linearly with popularity) and community management (settling disputes; encouraging engagement within the communities). Common questions arise: ‘How do I know if my community is healthy?’ and ‘What changes in the community lead to it becoming unhealthy?’
  • 11. How do I know if my community is ‘healthy’? Approach 1: Needs Satisfaction. Identify users’ needs for the community, then analyse users to see if their needs have been met. Approach 2: Numerical Health Measures. Determine suitable measures for community health (e.g. churn rate), then analyse these measures over time to see if the community is remaining healthy, or not.
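Approach 2 can be illustrated with a small sketch. The slide names churn rate but does not define it, so the definition used here (the share of users active in one time window who do not appear in the next) is an assumption, as are the function names and the toy user sets.

```python
def churn_rate(previous_active, current_active):
    """Share of users active in the previous window who are absent from the current one.

    This definition of churn is an assumption made for illustration.
    """
    previous_active = set(previous_active)
    if not previous_active:
        return 0.0
    churned = previous_active - set(current_active)
    return len(churned) / len(previous_active)


def churn_series(active_by_window):
    """Churn rate for each pair of consecutive time windows."""
    return [
        churn_rate(prev, cur)
        for prev, cur in zip(active_by_window, active_by_window[1:])
    ]


# Hypothetical sets of active users in three consecutive windows.
windows = [
    {"ann", "bob", "cat", "dan"},
    {"ann", "bob", "eve"},
    {"ann", "eve"},
]
series = churn_series(windows)
print(series)
```

Tracking such a series over time is one way to see whether a community is remaining healthy; which window length is appropriate depends on the community.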
  • 12. Analysing User Behaviour. Online communities are behavioural ecosystems: prevalent user behaviour can impact the behaviour of other users (Preece, 2000). ‘the way’; ‘tangible measures derived from actions performed by and upon a user’.
  • 13. Behaviour Features: ‘tangible measures of actions performed by and upon a user’ (diagram: User, Post, Forum). Initiation: the extent to which users begin discussions in a community. Contribution: the extent to which the user is providing content. Popularity: proportion of the community that responds to the user. Engagement: proportion of the community that a user responds to. Focus Dispersion: variance of the user’s interests across topics. Quality: reception of the user’s content by other users.
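The six features above could be computed from a post log along the following lines. This is a minimal sketch, not the talk's implementation: the `Post` record, its fields and the toy data are hypothetical. Initiation and contribution are taken as raw counts, quality as the mean rating, and focus dispersion is approximated by the Shannon entropy of the user's posts over forums, which is a swapped-in measure since the slide only says "variance of the user's interests across topics".

```python
from collections import Counter
from dataclasses import dataclass
from math import log2
from typing import Optional


@dataclass
class Post:
    author: str
    forum: str
    reply_to: Optional[str] = None  # author being replied to; None for a thread starter
    rating: int = 0                 # hypothetical score given by other users


def behaviour_features(user, posts):
    """Per-user behaviour features; the exact formulas are illustrative assumptions."""
    others = {p.author for p in posts} - {user}
    own = [p for p in posts if p.author == user]
    repliers = {p.author for p in posts if p.reply_to == user and p.author != user}
    replied_to = {p.reply_to for p in own if p.reply_to not in (None, user)}
    forum_counts = Counter(p.forum for p in own)
    total = sum(forum_counts.values())
    # Shannon entropy over forums, used here as a stand-in for focus dispersion.
    entropy = sum(-(c / total) * log2(c / total) for c in forum_counts.values())
    return {
        "initiation": sum(1 for p in own if p.reply_to is None),
        "contribution": len(own),
        "popularity": len(repliers) / len(others) if others else 0.0,
        "engagement": len(replied_to) / len(others) if others else 0.0,
        "focus_dispersion": entropy,
        "quality": sum(p.rating for p in own) / len(own) if own else 0.0,
    }


# Toy post log (made-up data).
posts = [
    Post("ann", "python", None, 2),
    Post("bob", "python", "ann", 1),
    Post("cat", "python", "ann", 0),
    Post("ann", "java", "bob", 0),
    Post("dan", "java", None, 1),
]
features = behaviour_features("ann", posts)
print(features)
```

Popularity and engagement are normalised by the number of other users in the log, so both fall between 0 and 1; counts such as initiation would normally be normalised as well before comparing users across communities of different sizes.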
  • 14. Part II: Comparing User Behaviour and User Needs
  • 15. Maslow’s Hierarchy of Needs. How does this hierarchy resonate with online community users?
  • 16. User Needs in Online Communities. Users have different needs for participating in an online community: to create content and share information; to communicate with other users; to ask questions; to collaborate with other users; to help other users resolve problems and issues; to discuss ideas. We wanted to find out how important the above needs were to community users.
  • 17. Dataset 1: Enterprise social software suite (communities within enterprises). Anonymised dataset (Jan 2010 to April 2011): Communities of Practice (CoP): 100; Team Communities (Team): 72; Technical Support (Tech): 14. Labels provided by (Muller et al. 2012).
  • 18. Understanding Users’ Needs on IBM Connections. Surveyed 186 users about their needs, spanning the aforementioned community types; 150 responses. Likert scale (1-5) for agreement with statements. Examples included: ‘How often do you do the following?’ (browse for information, search for information, etc.) and ‘Rate how important the community features are to you’ (receiving recommendations, ability to filter information, etc.).
  • 19. Users’ Needs on IBM Connections. Ranked community features. Source: D3.1: Report on Social, Technical and Corporate Needs in Online Communities. M Rowe, H Alani, S Angeletou and G Burel. ROBUST Deliverable 3.1. (2012)
  • 20. Users’ Needs on IBM Connections. Source: D3.1: Report on Social, Technical and Corporate Needs in Online Communities. M Rowe, H Alani, S Angeletou and G Burel. ROBUST Deliverable 3.1. (2012)
  • 21. User Behaviour on IBM Connections. Measured the behaviour of users across the three IBM Connections community types.
    Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types.
      Feature            CoP               Team              Tech
      Focus Dispersion   1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
      Initiation         7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
      Contribution       26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
      Popularity         1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
      Engagement         1.016 (1.556)     1.948 (2.324)     1.036 (1.575)
    Popularity is higher in Team and Tech communities than in CoP, but not significantly, suggesting that although users of the latter community provide more contributions, it is with content published by fewer users. For Engagement the mean is significantly highest (at < 0.001) for Team, indicating that users tend to participate with more users in these communities than the others.
    We induce an empirical cumulative distribution function (ECDF) for each micro feature within each community and then qualitatively analyse how the curves of the functions differ across communities. For instance, in the case of Figure 3 we see that for Focus Dispersion Tech communities have the high-
    Source: Behaviour analysis across different types of Enterprise Online Communities. M Rowe, M Fernandez, H Alani, I Ronen, C Hayes and M Karnstedt. In the proceedings of the Web Science Conference. Evanston, US. (2012)
  • 22. User Behaviour on IBM Connections. [Figure: empirical CDF plots, CDF(x) against feature value, with panels for Focus Dispersion, Initiation, Contribution and Engagement and one curve each for cop, team and tech.]
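The ECDF curves on this slide could be computed along the following lines. This is a minimal sketch using only the Python standard library; the function name `ecdf` and the Engagement values are illustrative assumptions, not data from the talk.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one observation")

    def F(x):
        # bisect_right counts how many observations are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical Engagement values for users in two community types.
engagement = {
    "cop": [0.2, 0.5, 1.0, 1.1, 3.0],
    "team": [0.8, 1.5, 2.0, 2.5, 6.0],
}
curves = {ctype: ecdf(vals) for ctype, vals in engagement.items()}
for ctype, F in curves.items():
    print(ctype, F(1.0))
```

A curve that rises more slowly (here `team`) indicates that more users sit at higher feature values, which is the kind of qualitative comparison the slide describes.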
  • 23. Linking Users’ Needs to User Behaviour. Questionnaire questions related to different behaviour aspects (initiation, contribution, etc.). Mapped questions to these aspects, e.g. Initiation questions included: ‘How often do you ask a question?’, ‘How often do you create content?’, ‘How often do you announce work events and news?’ Resulted in an average Likert-scale response value per behaviour aspect across community types.
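The mapping from questionnaire items to behaviour aspects, followed by averaging per community type, might look like the sketch below. The question identifiers, the `reply_to_others` item and the scores are hypothetical; only the three Initiation questions come from the slide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from questionnaire items to behaviour aspects.
QUESTION_ASPECT = {
    "ask_question": "initiation",
    "create_content": "initiation",
    "announce_events": "initiation",
    "reply_to_others": "contribution",  # assumed item, not from the slides
}


def aspect_means(responses):
    """Average Likert scores per (community type, behaviour aspect).

    `responses` is an iterable of (community_type, question_id, score)
    tuples with scores on a 1-5 scale.
    """
    buckets = defaultdict(list)
    for ctype, question, score in responses:
        aspect = QUESTION_ASPECT.get(question)
        if aspect is None:
            continue  # question not mapped to any aspect
        buckets[(ctype, aspect)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


responses = [
    ("team", "ask_question", 3),
    ("team", "create_content", 2),
    ("team", "reply_to_others", 4),
    ("tech", "ask_question", 2),
    ("tech", "announce_events", 1),
]
result = aspect_means(responses)
```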
  • 24. Linking Users’ Needs to User Behaviour.
    User needs from questionnaire responses (e.g. taking the mean of the responses for all Initiation questions for the 95 CoPs); the set of results can be seen in Table 4.
    Table 4. Mean and standard deviation (in parentheses) values of micro features obtained using the questionnaires for the different community types.
      Feature            CoP              Team             Tech
      Focus Dispersion   4.019 (0.093)    3.055 (0.426)    4.070 (0.070)
      Initiation         2.483 (0.838)    2.587 (0.838)    2.243 (0.873)
      Contribution       3.239 (0.926)    3.202 (1.016)    3.158 (0.945)
      Popularity         2.875 (0.070)    3.084 (0.168)    2.104 (0.173)
      Engagement         2.844 (0.539)    3.027 (0.588)    2.406 (0.522)
    Observed user behaviour:
    Table 2. Mean and standard deviation (in parentheses) of the distribution of micro features within the different community types.
      Feature            CoP               Team              Tech
      Focus Dispersion   1.682 (1.680)     1.391 (1.581)     1.382 (1.534)
      Initiation         7.788 (21.525)    13.235 (23.361)   3.088 (6.676)
      Contribution       26.084 (77.607)   21.130 (72.298)   11.753 (17.182)
      Popularity         1.660 (3.647)     2.302 (2.900)     2.286 (3.920)
      Engagement         1.016 (1.556)     1.948 (2.324)     1.036 (1.575)
    The mean of the third micro feature, Contribution, is highest for CoP (but not significantly higher than the others), indicating that more initiated content is interacted with than in the other communities.
    As Table 4 demonstrates, the findings from the analysis highly correlate with what users expressed to be relevant for each community type. We previously found that high levels of Initiation and Contribution are discriminative factors of Team and CoP communities with respect to Tech communities. Additionally, by looking at the behaviour distributions ...
  • 25. Understanding Needs Satisfaction. Agreement between users’ needs and how users behave, reflected by the different needs values across the different community types. Limitations of this approach: (1) Expensive to collect survey responses: took around 6 months between questionnaire publication and results compilation, and required contacting many users. (2) Implicit biases in reporting across community types: Team communities had the lowest % of responses.
  • 26. Part III: Predicting Community Health from User Behaviour
  • 27. Community Health and User Behaviour. Management of communities is helped by: understanding how behaviour and health are related (how user behaviour changes are associated with health); predicting health changes (enables early decision making on community policy). Can we accurately detect changes in community health from the behaviour of its users?
  • 28. Dataset 2: SAP Community Network. Collection of SAP forums in which users discuss software development, SAP products and usage of SAP tools. Points system for awarding best answers. Provided with a dataset covering 33 communities, spanning 2004 - 2011: 95,200 threads, 421,098 messages, 32,942 users. [Figure: post count over time, 2004 to 2011.]
  • 29. User Behaviour Features on SAP. Focus Dispersion: forum entropy of the user. Engagement: out-degree proportioned by potential maximal out-degree. Popularity: in-degree proportioned by potential maximal in-degree. Contribution: proportion of thread replies created by the user. Initiation: proportion of threads that were initiated by the user. Quality: average points per post awarded to the user.
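As a rough illustration of how these six measures could be derived from forum posts, here is a minimal sketch. The `Post` record layout, the example posts, and the choice of "number of other users" as the potential maximal in-degree and out-degree are assumptions made for the example, not details given in the talk.

```python
import math
from collections import Counter, namedtuple

# reply_to is the author being replied to, or None for a thread starter.
Post = namedtuple("Post", "author forum reply_to points")


def behaviour_features(posts, user):
    users = {p.author for p in posts}
    mine = [p for p in posts if p.author == user]
    # Assumed maximal degree: every other user in the dataset.
    max_degree = max(len(users) - 1, 1)

    # Focus dispersion: entropy (base 2) of the user's posts across forums.
    forum_counts = Counter(p.forum for p in mine)
    total = sum(forum_counts.values())
    dispersion = 0.0
    if total:
        dispersion = -sum(
            (c / total) * math.log2(c / total) for c in forum_counts.values()
        )

    # Out-neighbours (users replied to) and in-neighbours (users replying).
    replied_to = {p.reply_to for p in mine if p.reply_to and p.reply_to != user}
    repliers = {p.author for p in posts if p.reply_to == user and p.author != user}

    replies = [p for p in posts if p.reply_to is not None]
    starters = [p for p in posts if p.reply_to is None]

    def share(group):
        # Fraction of the posts in `group` that were written by `user`.
        return sum(p.author == user for p in group) / len(group) if group else 0.0

    return {
        "focus_dispersion": dispersion,
        "engagement": len(replied_to) / max_degree,
        "popularity": len(repliers) / max_degree,
        "contribution": share(replies),
        "initiation": share(starters),
        "quality": sum(p.points for p in mine) / len(mine) if mine else 0.0,
    }


posts = [
    Post("ann", "abap", None, 0),
    Post("bob", "abap", "ann", 10),
    Post("cat", "abap", "ann", 2),
    Post("bob", "hana", None, 0),
    Post("ann", "hana", "bob", 6),
]
features = behaviour_features(posts, "ann")
```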
  • 30. Inferring Roles from User Behaviour. (1) Construct features for community users at a given time step. (2) Derive bins using equal-frequency binning, e.g. Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4. (3) Use the skeleton rule base to construct rules using bin levels: ‘Popularity = low, Initiation = high -> roleA’ becomes ‘Popularity < 0.5, Initiation > 0.4 -> roleA’. (4) Apply rules to infer user roles and community composition. (5) Repeat 1-4 for the following time steps. Source: Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
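Steps 1-4 above can be sketched for a single time step as follows. This is an illustrative reading rather than the authors' implementation: the three-level binning, the rule contents (`roleA`, `roleB`), the `unlabelled` fallback and the user values are all assumptions, and every user is assumed to carry the same set of features.

```python
def equal_frequency_cutoffs(values, n_bins=3):
    """Cut points that split the sorted values into bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_level(value, cutoffs, labels=("low", "mid", "high")):
    """Map a continuous value onto the label of the bin it falls in."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[len(cutoffs)]


# Hypothetical skeleton rule base: feature levels -> role label.
SKELETON_RULES = [
    ({"popularity": "low", "initiation": "high"}, "roleA"),
    ({"popularity": "high", "initiation": "high"}, "roleB"),
]


def infer_role(levels, rules=SKELETON_RULES, default="unlabelled"):
    """Return the role of the first rule whose conditions all match."""
    for conditions, role in rules:
        if all(levels.get(f) == lvl for f, lvl in conditions.items()):
            return role
    return default


def role_composition(users):
    """users: {user_id: {feature: value}} for one time step."""
    features = {f for vals in users.values() for f in vals}
    cutoffs = {
        f: equal_frequency_cutoffs([vals[f] for vals in users.values()])
        for f in features
    }
    roles = {
        uid: infer_role({f: to_level(v, cutoffs[f]) for f, v in vals.items()})
        for uid, vals in users.items()
    }
    counts = {}
    for role in roles.values():
        counts[role] = counts.get(role, 0) + 1
    return roles, {role: c / len(roles) for role, c in counts.items()}


users = {
    "u1": {"popularity": 0.1, "initiation": 0.9},
    "u2": {"popularity": 0.2, "initiation": 0.8},
    "u3": {"popularity": 0.3, "initiation": 0.1},
    "u4": {"popularity": 0.4, "initiation": 0.2},
    "u5": {"popularity": 0.5, "initiation": 0.5},
    "u6": {"popularity": 0.6, "initiation": 0.4},
}
roles, composition = role_composition(users)
```

Because the cut points are recomputed from each time step's data, the same raw feature value can map to different levels at different steps. Heavily tied values can also leave the bins unequal in size.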
  • 31. Mining Roles (Skeleton rule base compilation).
    1. Select the tuning segment.
    2. Discover correlated behaviour dimensions: removed Engagement and Contribution, kept Popularity (Pearson r > 0.75, p < 0.01).
    3. Cluster users into behavioural groups.
    4. Derive role labels for clusters.
    Choosing the clustering model: To judge the best model (i.e. cluster method and number of clusters) we measure the cohesion and separation of a given clustering as follows. For each clustering algorithm (Ψ) we iteratively increase the number of clusters (k) to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:
      $s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$  (3)
    where $a_i$ denotes the average distance to all other items in the same cluster and $b_i$ is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of $s_i$ ranges between −1 and 1, where the former indicates a poor clustering where distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient $s(\Psi(k))$ for the entire clustering we take the average silhouette coefficient of all items. The best clustering model and number of clusters is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance; however, as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.
    Deriving role labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values, which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the ...
    ... decision node, we measure the entropy of the dimensions and their levels across the clusters; we then choose the dimension with the largest entropy. This is defined formally as:
      $H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim)$  (4)
    Fig. 2. Boxplots of the feature distributions (Dispersion, Initiation, Quality, Popularity) in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning.
    Table II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility.
      Cluster   Dispersion   Initiation   Quality   Popularity
      1         L            L            L         L
      0         L            M            H         L
      6         L            H            M         M
      10        L            H            M         H
      4         L            H            H         M
      2,5       M            H            L         H
      8,9       M            H            H         H
      7         H            H            L         H
      3         H            H            H         H
    Role labels: 1 - Focussed Novice; 2,5 - Mixed Novice; 7 - Distributed Novice; 3 - Distributed Expert; 8,9 - Mixed Expert; 0 - Focussed Expert Participant; 4 - Focussed Expert Initiator; 6 - Knowledgeable Member; 10 - Knowledgeable Sink.
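To make the model-selection criterion on this slide concrete, the sketch below computes the mean silhouette coefficient, s_i = (b_i - a_i) / max(a_i, b_i), for a given labelling. It assumes Euclidean distance and scores singleton clusters as 0, neither of which is stated in the talk. The clustering itself (e.g. K-means for each k) is assumed to be produced elsewhere, and the points are made up.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all items for one clustering."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average Euclidean distance from item i to the other members.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        if len(clusters[label]) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = mean_dist(i, clusters[label])  # cohesion
        b = min(  # separation: nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette(points, [0, 1, 0, 1])   # mixes distant points
```

Selecting k then amounts to computing this score for each candidate clustering and keeping the highest one.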
  • 32. Community Health Indicators. From the literature there is no single agreed measure of ‘community health’; emergent dimensions are loyalty, participation, activity and social capital. Indicator 1, Churn Rate (loyalty): proportion of users that remain. Indicator 2, User Count (participation): number of active contributors. Indicator 3, Seeds-to-Non-Seeds Posts Proportion (activity): replied-to thread starters to non-replied-to thread starters. Indicator 4, Clustering Coefficient (social capital): average of users’ clustering coefficients.
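The four indicators could be computed per time window roughly as below. This is a sketch under assumptions: the loyalty indicator follows the slide's wording (share of the previous window's users who are still active), the seeds indicator is taken as a ratio of replied-to to non-replied-to thread starters, the reply graph is treated as undirected, and all inputs are invented.

```python
import math
from itertools import combinations


def remaining_proportion(previous_users, current_users):
    """Share of the previous window's users who are active in the current one."""
    if not previous_users:
        return 0.0
    return len(previous_users & current_users) / len(previous_users)


def user_count(authors):
    """Number of distinct contributors among the window's post authors."""
    return len(set(authors))


def seeds_ratio(reply_counts):
    """reply_counts: {thread_id: number of replies to the thread starter}."""
    seeds = sum(1 for c in reply_counts.values() if c > 0)
    non_seeds = len(reply_counts) - seeds
    return seeds / non_seeds if non_seeds else math.inf


def average_clustering(adjacency):
    """adjacency: {user: set(neighbours)} for an undirected reply graph."""
    coefficients = []
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count links that exist between pairs of this node's neighbours.
        links = sum(
            1 for u, v in combinations(neighbours, 2) if v in adjacency.get(u, ())
        )
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


loyalty = remaining_proportion({"a", "b", "c", "d"}, {"a", "b", "e"})
participation = user_count(["a", "b", "a", "e"])
activity = seeds_ratio({"t1": 2, "t2": 0, "t3": 1})
social_capital = average_clustering(
    {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
)
```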
  • 33. Experiment 1: Health Indicator Regression. Community management is helped by understanding the relation between behaviour and health. Experimental setup: health indicator linear regression models (per community). Independent variables: 9 roles with composition proportions as values @ t (e.g. @ t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.). Dependent variable: health indicator (e.g. churn rate) @ t (e.g. @ t = k: Churn Rate = 0.21). PCA of each community model using the model’s coefficients: look for a common health composition pattern.
  • 34. Experiment 1: Health Indicator Regression Results. [Figure: PC1 against PC2 plots, one panel each for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient.] Idiosyncratic health composition patterns: divergence patterns between outlier communities. No general pattern exists that describes the relation between roles and health.
  • 35. Experiment 2: Health Change Detection. Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles? Experimental setup: binary classification of indicator change using logistic regression. At t=k+1: predict increase or decrease in health indicator from t=k. Time-ordered dataset: features @ t=k+1 are the 9 roles with composition proportions as values; class @ t=k+1 is positive (if increase from t=k) or negative (if decrease). Divide dataset into 80/20 split maintaining time-ordering. Evaluated using Area under the ROC Curve (AUC).
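The data preparation and evaluation around the classifier can be sketched as follows. The logistic regression model itself is assumed to be trained elsewhere and is represented here only by the scores it would output. The indicator series, the scores, and the choice to label an unchanged indicator as negative are illustrative assumptions.

```python
def label_changes(indicator):
    """indicator: health values ordered by time step.

    Returns one label per step t >= 1: 1 if the value rose from t-1, else 0.
    """
    return [1 if cur > prev else 0 for prev, cur in zip(indicator, indicator[1:])]


def time_ordered_split(rows, train_fraction=0.8):
    """Split without shuffling, so the test rows come after the training rows."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


churn = [0.20, 0.25, 0.22, 0.30, 0.31, 0.28]  # hypothetical indicator series
labels = label_changes(churn)
train, test = time_ordered_split(list(range(10)))
example_auc = auc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.7])  # hypothetical scores
```

An AUC of 0.5 corresponds to the random baseline that the ROC curves are compared against.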
  • 36. Experiment 2: Health Change Detection Results. ROC curves surpass baseline for: Churn Rate: 20/25 forums; User Count: 20/25 forums; Seeds-to-Non-Seeds: 19/25 forums; Clustering Coefficient: 17/25 forums. [Figure: ROC curves, TPR against FPR, one panel each for Churn Rate, User Count, Seeds / Non-seeds Prop and Clustering Coefficient.] Source: What makes Communities Tick? Community Health Analysis using Role Compositions. M Rowe and H Alani. In the proceedings of the Fourth IEEE International Conference on Social Computing. Amsterdam, The Netherlands. (2012)
  • 37. To Summarise
  • 38. Findings. User behaviour is closely aligned with users’ needs, although this is expensive to collect and analyse. Accurate predictions of community health from behaviour: inferring roles from collective behaviour; forecasting from role compositions. Community managers can understand how their community will develop from user behaviour; this requires model tuning per community. Source: Community Analysis through Semantic Rules and Role Composition Derivation. M Rowe, M Fernandez, S Angeletou and H Alani. In the Journal of Web Semantics (2012)
  • 39. Current/Future Work: Lifecycles. A limitation of the role-composition approach is the use of platform-wide windowing: lack of high-fidelity behaviour inspection per user. Lifecycle periods: user-specific stages of development. Divide the lifetime into equal-activity periods. [Diagram: a user’s lifetime from first post to last post, split into periods 1, 2, …, n with the same #posts in each.]
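Dividing a user's lifetime into equal-activity periods might be done as in the sketch below, which gives each period the same number of posts, up to a remainder of one. The function name, the handling of the remainder and the timestamps are assumptions for illustration.

```python
def lifecycle_periods(post_times, n_periods):
    """Split a user's posts, in time order, into n_periods chunks that hold
    as equal a number of posts as possible."""
    ordered = sorted(post_times)
    if n_periods < 1 or len(ordered) < n_periods:
        raise ValueError("need at least one post per period")
    base, extra = divmod(len(ordered), n_periods)
    periods, start = [], 0
    for s in range(n_periods):
        # The first `extra` periods absorb one leftover post each.
        size = base + (1 if s < extra else 0)
        periods.append(ordered[start:start + size])
        start += size
    return periods


# Hypothetical post timestamps (e.g. days since the user's first post).
post_times = [1, 2, 5, 8, 13, 21, 34, 55, 89, 144]
periods = lifecycle_periods(post_times, 5)
```

Unlike fixed calendar windows, these periods cover different lengths of time for different users, which is what makes the stages user-specific.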
  • 40. User Development. Capture period-specific user properties (in period s): in-degree distribution; out-degree distribution; term distribution. Enabling: churn prediction, stage-based recommendation.
    [Figure: Cross Entropy against Lifecycle Stages; plot titles Facebook, SAP, Server Fault; sub-captions (a) In-degree, (b) Out-degree, (c) Lexical.]
    ... by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where, despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.
    Source: Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. M Rowe. To appear in the proceedings of the International Conference on Data Mining. Dallas, US. (2013)
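One way to track how a user's period-specific distributions drift is to compute the cross-entropy of each lifecycle period against the one before it, as sketched below. The comparison target (the immediately preceding period), the additive smoothing and the example terms are assumptions; the talk does not specify these details.

```python
import math
from collections import Counter


def distribution(items, vocabulary, alpha=1.0):
    """Additively smoothed relative frequencies over a fixed vocabulary."""
    counts = Counter(items)
    total = sum(counts[v] for v in vocabulary) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x); q must be non-zero wherever p is."""
    return -sum(p[x] * math.log2(q[x]) for x in p if p[x] > 0)


def period_cross_entropies(periods):
    """Cross-entropy of each period's distribution against the previous one."""
    vocabulary = sorted({item for period in periods for item in period})
    dists = [distribution(period, vocabulary) for period in periods]
    return [cross_entropy(dists[s], dists[s - 1]) for s in range(1, len(dists))]


# Hypothetical terms used by one user in three consecutive lifecycle periods.
periods = [
    ["sap", "abap", "hana"],
    ["sap", "abap", "abap"],
    ["abap", "abap", "abap"],
]
entropies = period_cross_entropies(periods)
uniform = {"a": 0.5, "b": 0.5}
```

The same functions apply to in-degree and out-degree distributions if the items are the users contacting, or contacted by, the user in each period.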
  • 41. Questions? @mrowebot m.rowe@lancaster.ac.uk http://www.lancaster.ac.uk/staff/rowem/