Automatic Persona Generation: Introduction & Current Challenges
1. Introduction & Current Challenges
Dr. Joni Salminen
September 30, 2021
IT University Copenhagen
Automatic Persona Generation
2. Meet the APG Team!
Professor Jim Jansen
The Leader (Principal Scientist)
• Inventor of APG
• Leads the project
• Customer relationships &
management
MSc. Soon-gyo Jung
The Genius (Software Engineer)
• Creator of APG
• Front-End / Back-End
• Implements like a genius, hence
the nickname
Dr. Joni Salminen
The Handyman (Scientist)
• Helps with user studies,
system development, etc.
• Strategic guy, likes to think about the
big picture
3. Giving faces to user data?
• Personas…
• Summarize relevant user information for the decision makers who need it
• Are an alternative (or complement) to numbers
• Provide a different way of doing user/customer analytics (more
approachable & memorable)
…are not just about visualization, but empathetic
representations of users!
Nielsen, L. (2019). Personas—User Focused Design (2nd ed.
2019 edition). Springer.
4. Literally, faces!
Personification = nameless, faceless
segments are turned into personas that
describe a behavioral and demographic
pattern in the data
Enrichment = enriching the persona
profiles with additional information such
as sentiment, loyalty, quotes, most
viewed content, and topics of interest
5. The process relies on data dimensionality reduction
(non-negative matrix factorization)
Jung, S., Salminen, J., Kwak, H., An, J., & Jansen, B. J. (2018). Automatic
Persona Generation (APG): A Rationale and Demonstration. CHIIR ’18:
Proceedings of the 2018 Conference on Human Information Interaction &
Retrieval, 321–324. https://doi.org/10.1145/3176349.3176893
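The dimensionality reduction step named on this slide can be illustrated with a minimal sketch of non-negative matrix factorization (NMF) using the standard multiplicative update rules. This is not APG's actual implementation: the toy matrix `V` (rows as demographic groups, columns as content items), the choice of `k=2`, and all function names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) into w (n x k) and h (k x m)
    with multiplicative updates that reduce the Frobenius reconstruction error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb)) ** 0.5

# Hypothetical view counts: rows are demographic groups, columns are content items.
V = [
    [90, 80, 5, 0],
    [85, 95, 0, 10],
    [5, 0, 70, 60],
    [0, 10, 65, 75],
]
W, H = nmf(V, k=2)

# Each row of H is a latent behavioral pattern over content; each row of W says
# how strongly a demographic group expresses each pattern.
dominant_pattern = [max(range(2), key=lambda j: row[j]) for row in W]
print(dominant_pattern, round(frobenius_error(V, W, H), 2))
```

In a persona setting, each latent pattern would be a candidate persona "skeleton", and the group with the largest weight for that pattern could supply its demographics.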
7. Three ways in which “Personified Big Data”
drives the automation of personas
1. Access to online analytics and social media platforms via
application programming interfaces (APIs) for end-user data
2. Standardized format of aggregated end-user data (engagement
metrics, demographic groups)
3. Data analysis algorithms, libraries and software tools that enable
automation of whole pipeline from data collection to persona
generation to serving via interactive persona systems (end-to-end).
Salminen, J., Guan, K., Jung, S.-G., & Jansen, B. J. (2021). A
Survey of 15 Years of Data-Driven Persona Development.
International Journal of Human–Computer Interaction, 0(0), 1–24.
https://doi.org/10.1080/10447318.2021.1908670
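As a rough illustration of the second driver listed above (a standardized format of aggregated end-user data), the sketch below pivots aggregated records into a demographic-group by content matrix that a segmentation algorithm could consume. The field names (`gender`, `age_group`, `country`, `content_id`, `views`) and the sample rows are hypothetical and do not reflect any specific platform's API schema.

```python
from collections import defaultdict

# Hypothetical aggregated rows, shaped like what an analytics API might return.
rows = [
    {"gender": "female", "age_group": "25-34", "country": "DK", "content_id": "video_a", "views": 120},
    {"gender": "female", "age_group": "25-34", "country": "DK", "content_id": "video_b", "views": 30},
    {"gender": "male", "age_group": "18-24", "country": "US", "content_id": "video_a", "views": 15},
    {"gender": "male", "age_group": "18-24", "country": "US", "content_id": "video_b", "views": 200},
    {"gender": "male", "age_group": "18-24", "country": "US", "content_id": "video_b", "views": 50},
]

def build_interaction_matrix(rows):
    """Sum views per (demographic group, content item) and return a dense matrix."""
    totals = defaultdict(int)
    for r in rows:
        group = (r["gender"], r["age_group"], r["country"])
        totals[(group, r["content_id"])] += r["views"]
    groups = sorted({g for g, _ in totals})
    items = sorted({c for _, c in totals})
    matrix = [[totals.get((g, c), 0) for c in items] for g in groups]
    return groups, items, matrix

groups, items, matrix = build_interaction_matrix(rows)
for g, row in zip(groups, matrix):
    print(g, dict(zip(items, row)))
```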
9. Why automate persona generation?
Personas are usually created with manual methods (i.e.,
interviews & ethnography), methods that are expensive
and slow to implement, and they can quickly become
outdated. Because of the limitations, personas risk being
inaccurate representations of the true user base.
Better personas, better decisions, better results.
In contrast, APG provides personas that are fast to
create and updated automatically. This means the cost of
persona creation is dramatically reduced, making them
available for organizations with limited means (e.g.,
startups, small businesses). Depending on the underlying
dataset, APG can cover a wide range of behaviors and
demographics.
Manual methods
Automation
An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018). Imaginary People
Representing Real Numbers: Generating Personas from Online Social Media Data.
ACM Transactions on the Web (TWEB), 12(4), 27. https://doi.org/10.1145/3265986
10. The brief history of data-driven personas (1999-2021)
1999: Cooper
Establishes the need for personas in software development, design, and HCI
2006: Mulder & Yaar
Defined “Quantitative Personas” and different method types (Grudin and Pruitt had also done so in 2002 and 2003)
2008: McGinn & Kotamraju
“Data-Driven Persona Development”
• Provides statistical validation
• Drawback: survey data
2015: Zhang et al.
“Clickstream Personas”
• Used click data (online analytics)
• Drawback: superficial personas
(no demographics)
2016: An et al.
“Automatic Persona Generation”
• Introduces social media data for persona
generation (both text and numbers)
• Introduces plans and vision for a system
• Drawbacks: many observed challenges
2017: Jung et al.
“Automatic Persona Generation”
• Introduces an interactive persona
system using an ML pipeline and
Web technologies
• Drawbacks: many observed
challenges
2021: Salminen et al.
“Persona Analytics”
Introduces eye- and mouse-
tracking of persona users as a
method for producing
knowledge for persona science
2021: Jansen et al.
“Data-Driven Personas: The Book”
• Summarizes five years of academic
research and system development
• Defines a roadmap for the future
12. Research Roadmap for
Automatic Persona Generation (APG)
Information architecture:
How to choose relevant
persona information content
and presentation for a given
user, use case, and
industry?
Quotes:
How to find demographically
matching, non-toxic comments
that describe the persona’s
attitudes and are relevant for
end users?
Temporal analysis:
How to analyze change
of personas over time?
APG is about finding better ways to process and choose
useful user information from vast amounts of online data.
“Personas are about giving faces to data.”
Applicability: How to create
personas for specific industries
(e.g., e-health, e-commerce,
politics, gaming…)?
Image: How to
automatically generate, tag,
and choose appropriate
persona profile pictures?
Evaluation: (1) How to ensure
personas are of high quality
(complete, clear, consistent and
credible)? (2) How to measure
usefulness of personas for
individuals and organizations?
Attributes & Topics of
Interest: How to automatically
infer user attributes, such as
interests, needs, wants, goals,
political orientation, and brand
affinity from social media?
Salminen, J., Jansen, B. J., An, J., Kwak, H., &
Jung, S. (2019). Automatic Persona Generation
for Online Content Creators: Conceptual
Rationale and a Research Agenda. In L. Nielsen
(Ed.), Personas—User Focused Design (2nd ed.,
pp. 135–160). Springer London.
https://doi.org/10.1007/978-1-4471-7427-1_8
13. APG’s links to Computer Science
Challenge | Potential solutions
Image | Generative Adversarial Networks (GANs)
Persona Attributes | Text Classification, Topic Modeling (LDA)
Quotes | Hate Speech Detection, Natural Language Processing (NLP)
Persona Change | Anomaly Detection, Concept Drift, Similarity Metrics, Tensor Factorization (TF)…
Information Architecture | User Studies, Crowd Experiments, Human-Computer Interaction (HCI), Adaptive / Intelligent Systems, User Modeling, Information Science (IS)
Persona Evaluation | Factor Analysis, Structural Equation Modeling (SEM), Experiments, User Experience (UX), Usability and User Interface (UI) Design
14. Issues about Pictures
• Need for manual
supervision / validation
• Demographically
imbalanced datasets
• Currently conditional
generation is not
supported
15. Salminen, J., Jung, S., Kamel, A. M. S., Santos, J. M., & Jansen, B. J. (2020). Using artificially generated pictures in
customer-facing systems: An evaluation study with data-driven personas. Behaviour & Information Technology, 0(0), 1–17.
https://doi.org/10.1080/0144929X.2020.1838610
16. Issues about Algorithm
• Is clustering or dimensionality reduction meaningful for user
segmentation in the first place?
• From a diversity standpoint, it seems not
• Diversity maximization or using diversity as a goal has been
largely ignored in user segmentation and persona creation
• How many personas should be created? (This depends on the
goal, so what is the goal?)
• Which algorithm performs best? And which metric is the
most appropriate (e.g., statistical distance vs. diversity)?
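To make the "diversity as a metric" question above concrete, here is a minimal sketch of one possible diversity score for a persona set: the average Shannon entropy of each demographic attribute across the personas. This is only one candidate measure chosen for illustration; the attribute names and the two example persona sets are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def demographic_diversity(personas, attributes=("gender", "age_group", "country")):
    """Average per-attribute entropy over a persona set (higher means more diverse)."""
    return sum(shannon_entropy([p[a] for p in personas]) for a in attributes) / len(attributes)

# Hypothetical persona sets produced by two different segmentation runs.
homogeneous = [
    {"gender": "male", "age_group": "25-34", "country": "US"},
    {"gender": "male", "age_group": "25-34", "country": "US"},
    {"gender": "male", "age_group": "25-34", "country": "US"},
    {"gender": "male", "age_group": "25-34", "country": "US"},
]
varied = [
    {"gender": "female", "age_group": "18-24", "country": "US"},
    {"gender": "male", "age_group": "25-34", "country": "IN"},
    {"gender": "female", "age_group": "35-44", "country": "US"},
    {"gender": "male", "age_group": "45-54", "country": "IN"},
]

print(demographic_diversity(homogeneous), demographic_diversity(varied))
```

A score like this could be reported next to a statistical-distance measure, so that an algorithm that fits the data well but yields near-identical personas becomes visible.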
17. Issues about Algorithm
• Concept drift / topic drift / model drift…
• All refer to CHANGE in the underlying user behavior (data)
• How often should personas be changed? How should the
change be measured / detected?
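One possible way to measure the change discussed above is to compare a segment's engagement distribution between two time periods using the Jensen-Shannon divergence and to flag a persona for regeneration when it exceeds a threshold. The sketch below assumes hypothetical monthly view counts per topic, and the threshold of 0.1 is an arbitrary illustrative value, not a recommendation from the source.

```python
import math

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two distributions given as dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

DRIFT_THRESHOLD = 0.1  # hypothetical cut-off for illustration only

# Hypothetical topic view counts for one user segment in three months.
january = normalize({"sports": 500, "news": 300, "music": 200})
february = normalize({"sports": 1000, "news": 600, "music": 400})
march = normalize({"sports": 100, "news": 200, "music": 200, "gaming": 500})

for name, period in [("february", february), ("march", march)]:
    score = js_divergence(january, period)
    print(name, round(score, 3), "regenerate" if score > DRIFT_THRESHOLD else "keep")
```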
18. Issues about Quotes
• Bødker’s “Frankenstein problem”: inconsistency of persona
information
• How to match the quotes with the personas’ demographics?
• Inconvenient mismatches: man vs. woman, Indian vs. Pakistani,
etc. (cultural sensitivities; Häkkilä et al.)
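The matching problem above can be sketched as a simple filter: keep only comments whose author demographics agree with the persona and whose toxicity score is below a cut-off. Everything here is an assumption for illustration: the `Comment` fields, the `toxicity` score (imagined as coming from an upstream hate speech classifier), and the `max_toxicity` default.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    gender: str
    country: str
    toxicity: float  # assumed score in [0, 1] from an upstream classifier

def select_quotes(comments, persona, max_toxicity=0.3, limit=3):
    """Return up to `limit` non-toxic comments that match the persona's demographics."""
    matching = [
        c for c in comments
        if c.gender == persona["gender"]
        and c.country == persona["country"]
        and c.toxicity <= max_toxicity
    ]
    # Prefer the least toxic comments first.
    return sorted(matching, key=lambda c: c.toxicity)[:limit]

# Hypothetical data.
persona = {"name": "Persona A", "gender": "female", "country": "IN"}
comments = [
    Comment("Loved this episode, more like it please.", "female", "IN", 0.02),
    Comment("This is garbage and so are you.", "female", "IN", 0.91),
    Comment("Great analysis of the match.", "male", "PK", 0.05),
    Comment("Could you cover local news more often?", "female", "IN", 0.10),
]

quotes = select_quotes(comments, persona)
for q in quotes:
    print(q.text)
```

Exact matching on demographics is the strictest option; it avoids the mismatches listed above at the cost of discarding many otherwise relevant comments.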
19. Data is available but what about
information?
• Attitudes, fears, doubts, hopes, needs, wants… can these be
inferred from numbers?
• Tweets contain a lot… The Rosetta Stone for data-driven
personas: user modeling / soft attribute inference from smartly
sampled tweets
• …even more important because persona users’ information
needs are unique: they need flexible tools to
query persona attitudes in real time (static data-driven personas
won’t do)
20. Towards persona science?
• Persona analytics = how decision-makers (i.e.,
persona users) in organizations use personas as
analytical tools to better understand their users or
customers.
• Persona analytics = how persona creators or
researchers investigate the behaviors of persona
users.
We define ‘persona analytics’ (PA) as the systematic
measurement of behaviors and interactions of persona
users engaged with interactive persona systems. When
personas are provided through a web browser, PA takes
place via mouse- (and eye-)tracking that records the
persona users’ mouse (or gaze) movements and clicks
(eye fixations) on the provided persona profiles and
their information elements.
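As a small illustration of the kind of measurement this definition implies, the sketch below aggregates dwell time per persona information element from a stream of mouse-tracking events. The event format (a timestamp plus the element under the cursor, with `None` meaning the cursor left the profile) is an assumption and not the logging format of the actual system.

```python
from collections import defaultdict

# Hypothetical mouse-tracking events: (timestamp_seconds, element_under_cursor).
events = [
    (0.0, "picture"),
    (2.5, "quotes"),
    (7.0, "topics"),
    (8.0, "quotes"),
    (11.0, None),  # cursor leaves the persona profile
]

def dwell_time_per_element(events):
    """Sum, per information element, the time until the next recorded event."""
    dwell = defaultdict(float)
    for (t0, element), (t1, _) in zip(events, events[1:]):
        if element is not None:
            dwell[element] += t1 - t0
    return dict(dwell)

dwell = dwell_time_per_element(events)
print(dwell)
```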
21. Empirical Persona User Research
(1) How do users interact with personas?
(2) What persona information do users pay attention to?
(3) What persona information causes users to change/reinforce their
attitudes?
(4) What persona information influences users’ decision making and
how?
(5) How and why do users choose a persona for their task?
→ Unified theory of personas?
Jung, S., Salminen, J., & Jansen, B. J. (2021). Persona Analytics: Implementing Mouse-tracking for an Interactive Persona System. Extended Abstracts of ACM Human Factors in Computing Systems - CHI EA ’21.
24. Next steps
• Metrics
• What measures and
metrics we want to
analyze?
• Hypotheses
• Intervention →
expected change in
persona users’
behavior
Persona-based metrics User-based metrics
Time spent per persona =
Number of visits per persona =
Persona revisit frequency =
Number of personas visited =
Persona coverage =
Persona visit distribution =
Rank correlation =
Table 1: Persona Analytics metrics.
Behavioral matters such as order effects, revisit
frequency, persona comparisons, satisficing behavior, and
choice can be investigated deploying the persona state-
transition matrix and Markov Chain techniques. Persona
information design can be informed by dwell time
analyses, and typical persona viewing patterns and
information viewing patterns can be deduced in interactive
persona user studies using a live system.
Jung, S., Salminen, J., & Jansen, B. J. (2021). Persona Analytics:
Implementing Mouse-tracking for an Interactive Persona System. Extended
Abstracts of ACM Human Factors in Computing Systems - CHI EA ’21.
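The metrics and the state-transition idea described on this slide can be sketched from a simple visit log. The formulas in Table 1 are not reproduced in this text, so the definitions used below (total seconds per persona, visit counts, coverage as the share of available personas a user opened, and a row-normalized first-order transition matrix) are assumptions for illustration, as are the log format and all names.

```python
from collections import defaultdict

# Hypothetical, time-ordered log: (user_id, persona_id, seconds_viewed).
log = [
    ("u1", "P1", 30), ("u1", "P2", 10), ("u1", "P1", 20),
    ("u2", "P2", 15), ("u2", "P3", 45),
]
ALL_PERSONAS = ["P1", "P2", "P3"]

def persona_metrics(log):
    """Persona-level totals: time spent and number of visits per persona."""
    time_spent, visits = defaultdict(int), defaultdict(int)
    for _, persona, seconds in log:
        time_spent[persona] += seconds
        visits[persona] += 1
    return dict(time_spent), dict(visits)

def user_coverage(log, all_personas):
    """User-level coverage: share of available personas each user opened (assumed definition)."""
    seen = defaultdict(set)
    for user, persona, _ in log:
        seen[user].add(persona)
    return {u: len(s) / len(all_personas) for u, s in seen.items()}

def transition_matrix(log, all_personas):
    """Row-normalized counts of moves from one persona to the next within each user's session."""
    counts = {a: {b: 0 for b in all_personas} for a in all_personas}
    last = {}
    for user, persona, _ in log:
        if user in last:
            counts[last[user]][persona] += 1
        last[user] = persona
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return matrix

time_spent, visits = persona_metrics(log)
coverage = user_coverage(log, ALL_PERSONAS)
transitions = transition_matrix(log, ALL_PERSONAS)
print(time_spent, visits, coverage)
print(transitions)
```

Each row of the transition matrix can be read as the estimated probability of which persona a user opens next, which is the input a Markov chain analysis of order effects or revisits would need.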
25. Data-driven personas have room for all
lines of research
• Algorithmically oriented people can solve algorithmic problems in
generation, validation, updating, etc.
• Qualitatively oriented research can carry out user studies (e.g.,
observation, interviews)
• Empirically oriented researchers can conduct experiments using real
systems and controlled conditions
• Theoretically oriented scholars can attempt to formulate theories of
persona use and persona-user interaction
…join the family ☺
26. Thank you!
Dr. Joni Salminen
jsalminen@hbku.edu.qa
The APG family (2019)
Get the book from Amazon! (or your library)