Link to article: https://www.springerprofessional.de/en/combining-behaviors-and-demographics-to-segment-online-audiences/16204306
CITE: Jansen, Bernard J., Jung, S., Salminen, J., An, J. and Kwak, H. (2018), “Combining Behaviors and Demographics to Segment Online Audiences: Experiments with a YouTube Channel”, Proceedings of the 5th International Conference of Internet Science (INSCI 2018), Springer, St. Petersburg, Russia.
Link to Automatic Persona Generation: https://persona.qcri.org
Falcon Invoice Discounting: Unlock Your Business Potential
Combining Behaviors and Demographics to Segment Online Audiences:Experiments with a YouTube Channel
1. Bernard J. Jansen1, Soon-gyo Jung1, Joni Salminen1,2, Jisun An1,
Haewoon Kwak1
1Hamad Bin Khalifa University, Doha, Qatar
2 University of Turku, Turku, Finland
bjansen@hbku.edu.qa, sjung@hbku.edu.qa, jsalminen@hbku.edu.qa,
jan@hbku.edu.qa, hkwak@hbku.edu.qa
5th International conference ‘Internet Science’ (INSCI'2018)
St. Petersburg, Russia, October 24 to 26, 2018.
Combining Behaviors and
Demographics to Segment
Online Audiences:
Experiments with a YouTube Channel
2. Problem
Read also: Salminen, J., Milenković, M. and Jansen, B.J.
(2017), “Problems of Data Science in Organizations: An
Explorative Qualitative Analysis of Business Professionals’
Concerns”, Proceedings of International Conference on
Electronic Business (ICEB 2017), Dubai.
3. Problem
• Social media channels with audiences in the millions are
increasingly common.
• Efforts at segmenting audiences for populations of these
sizes can result in hundreds of audience segments, as
the compositions of the overall audiences tend to be
complex.
• Although understanding audience segments is
important for strategic planning, tactical decision
making, and content creation, it is unrealistic for
human decision makers to effectively utilize
hundreds of audience segments in these tasks.
4. Problem
• In this research, we present efforts at simplifying the
segmentation of audience populations to increase
their practical utility.
• Using millions of interactions with hundreds of
thousands of viewers with an organization’s online
content collection, we first conduct behavioral
profiling, and then demographic profiling.
…the outcomes are implemented in real online system
for automatic persona generation.
5. Data collection
• Our data source is the online news channel AJ+ , which was
designed from its founding to serve news in the medium of the
viewer, with no redirect to a website or other platform.
• For this research, we use the AJ+ YouTube Channel.
However, the technique presented here is generalizable to any
social media channel providing aggregated audience statistics.
• We collect data on 4,320 YouTube videos produced from June
13, 2014 to July 27, 2016.
• Collectively, these videos have had more than 30 million views
from people in 190 countries at the time of the data collection.
• The YouTube analytics platform provides, for each video, view
counts by user group (age, gender, country).
• We access the data via the YouTube application programming
interface (API), through which data can be collected
automatically if permission is granted by the channel owner.
6. Data structure
Type Description
Demographic
attributes
ageGroup: YouTube viewers are classified into multiple age
categories (13-17, 18-24, 25-34, 35-44, 45-54, 55-64, and 65
years and older); 7 possible age categories for a customer.
gender: YouTube viewers are classified as either male or
female, so there are 2 possible categories.
country: YouTube uses the two-letter ISO-3166-1 country
code index to classify where viewers are from, with 249
current officially assigned country codes at the time of this
study.
Behavioral
attribute
viewCount: YouTube provides the number of views per
video for a given [country] by [gender, ageGroup].
7. Matrix decomposition
For each content piece and
aggregated user group, we
retrieve a value of the chosen
interaction metric. Then we
decompose this matrix V into W
and H in regards to p, which is
the number of segments [1].
[1] An, J., Kwak, H., & Jansen, B. J. (2017). Personas for
Content Creators via Decomposed Aggregate Audience
Statistics. In Proceedings of Advances in Social Network
Analysis and Mining (ASONAM 2017). Sydney, Australia.
8. Overview of segmentation
• We decompose (i.e., separate into simpler components) the matrix V into two
matrices: W and H.
• Matrix H contains the audience behaviors and matrix W contains the user
demographics.
• Matrix H encodes an association between the behavioral segments and the
individual videos. Once we have the matrix H, we discover, via NMF, the
latent patterns describing the user segment interactions with the videos.
• The matrix W encodes an association between user demographic groups and
behavioral user segments with the set of corresponding user demographic
segments each with an associated weight indicating how strongly each
demographic segment is associated with the given behavioral segment.
• Each row in W represents how each demographic segment can be
characterized by different behavioral patterns. The columns in W show how
strongly each latent behavioral pattern is associated with different
demographic segments.
9. Example
User Behavioral Segment #1 (B1)
Seg. Country Age Gender Weight z-score
D1 US 25 male 415.73 2.330935
D2 US 35 male 313.00 1.215947
D3 US 45 male 221.56 0.223497
D4 CA 25 male 216.42 0.167709
D5 CA 35 male 176.54 -0.26513
D6 US 55 male 165.83 -0.38137
D7 GB 25 male 147.47 -0.58064
D8 CA 45 male 124.21 -0.8331
D9 US 65 male 114.93 -0.93382
D10 GB 35 male 113.99 -0.94402
10. Simplification heuristics
• For this research, we use 15 behavioral segments, each associated with 10
demographic segments. Therefore, we have a total of 150 potential
segments, which is the number we aim to reduce.
• Using elbow and z-score approaches, we reduce our entire data set of 150
audience segments to 63 segments (a 58% reduction).
11. Simplification heuristics
• For this research, we use 15 behavioral segments, each associated with 10
demographic segments. Therefore, we have a total of 150 potential
segments, which is the number we aim to reduce.
• Using elbow and z-score approaches, we reduce our entire data set of 150
audience segments to 63 segments (a 58% reduction).
• Country heuristic: If two or more demographic segments are from the same country, then
those audience segments are candidates for consolidation
• Gender heuristic: If demographic segments are of the same gender from the same country,
those segments are candidates for consolidation
• Age heuristic: (a) if the age bracket is bounded by corresponding age brackets from the same
country and gender, then the age brackets are aggregated, or (b) if the age bracket is adjacent to
a corresponding age bracket from the same country and gender, then the two age brackets are
aggregated.
12. Simplification heuristics
• For this research, we use 15 behavioral segments, each associated with 10
demographic segments. Therefore, we have a total of 150 potential
segments, which is the number we aim to reduce.
• Using elbow and z-score approaches, we reduce our entire data set of 150
audience segments to 63 segments (a 58% reduction).
• Country heuristic: If two or more demographic segments are from the same country, then
those audience segments are candidates for consolidation
• Gender heuristic: If demographic segments are of the same gender from the same country,
those segments are candidates for consolidation
• Age heuristic: (a) if the age bracket is bounded by corresponding age brackets from the same
country and gender, then the age brackets are aggregated, or (b) if the age bracket is adjacent to
a corresponding age bracket from the same country and gender, then the two age brackets are
aggregated.
• When applying the country, gender, and age consolidations to each of the 15
behavioral segments, our original 150 integrated audience segments are reduced
to 42 segments, a 72% reduction.
13. Advantages
1. Simplifies behaviorally and demographically complex
customer base: With the example dataset, we achieved a
72% simplification of the segments
2. Demographic groups can belong to many behavioral
groups (unlike in clustering)
3. Behavior is the basis for segmentation (as
recommended in state-of-the-art marketing theory)
4. The approach is generalizable to any customer data
with age, gender, location and interaction metric
(e.g., social media data, in-house data)
14. How do we actually utilize the
results of this research?
16. See the personas
Persona Listing based on selections. Clicking on
persona image displays persona profile.
17. Learn about the personas
Persona Profile, with image, name, attributes, topics,
most interested content, audience size, etc.
18. View changes in personas over time
Shows the similarities and changes in the persona set
over time.
19. View how personas interact with content
Analyze how well your content interests
different segments.
20. Identify market opportunities
Identify gaps in current and potential audience.
The share
of our
audience is
like this
The share of
total
audience in
the world is
like this
21. Configuration
Collection
Generation
A matrix of
content interaction patterns
Automatically generated
personas
Collection/Generation/API
information
See: http://persona.qcri.org
Read: Jung, S., Salminen, J., An, J., Kwak, H. and Jansen, B.J.
(2018), “Automatically Conceptualizing Social Media Analytics Data
via Personas”, presented at the International AAAI Conference on Web
and Social Media (ICWSM 2018), San Francisco, California, USA.