1. Human-centered AI: how can we support lay users to understand AI?
NEC Labs Europe, 24 Oct 2022
Katrien Verbert
Augment/HCI - KU Leuven
@katrien_v
2. Human-Computer Interaction group
Explainable AI – recommender systems – visualization – intelligent user interfaces
Learning analytics & human resources – Media consumption – Precision agriculture – Healthcare
Augment: Katrien Verbert
ARIA: Adalberto Simeone
Computer Graphics: Phil Dutré
LIIR: Sien Moens
E-media: Vero Vanden Abeele, Luc Geurts, Kathrin Gerling
3. Augment/HCI team
https://augment.cs.kuleuven.be/
Katrien Verbert, Professor
Robin De Croon, Postdoc researcher
Houda Lamqaddam, Postdoc researcher
Oscar Alvarado, Postdoc researcher
Tom Broos, PhD researcher
Diego Rojo García, PhD researcher
Maxwell Szymanski, PhD researcher
Jeroen Ooge, PhD researcher
Aditya Bhattacharya, PhD researcher
Ivania Donoso Guzmán, PhD researcher
4. Research objectives
¤ Explaining model outcomes to increase user trust and acceptance
¤ Enabling users to interact with the explanation process to improve the model
11. Explanations
Millecamp, M., Htun, N. N., Conati, C., & Verbert, K. (2019, March). To explain or not to explain:
the effects of personal characteristics when explaining music recommendations. In
Proceedings of the 2019 Conference on Intelligent User Interfaces (pp. 397-407). ACM.
12. Personal characteristics
Need for cognition
•Measurement of the tendency for an individual to engage in, and enjoy, effortful cognitive
activities
•Measured by the test of Cacioppo et al. [1984]
Visualisation literacy
•Measurement of the ability to interpret and make meaning from information presented in the form
of images and graphs
•Measured by the test of Boy et al. [2014]
Locus of control (LOC)
•Measurement of the extent to which people believe they have power over events in their lives
•Measured by the test of Rotter [1966]
Visual working memory
•Measurement of the ability to recall visual patterns [Tintarev and Masthoff, 2016]
•Measured by Corsi block-tapping test
Musical experience
•Measurement of the ability to engage with music in a flexible, effective and nuanced way
[Müllensiefen et al., 2014]
•Measured using the Goldsmiths Musical Sophistication Index (Gold-MSI)
Tech savviness
•Measured by confidence in trying out new technology
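The Corsi block-tapping test listed above is often scored as a block span: the longest sequence a participant reproduces in the correct order. The sketch below assumes that simple scoring rule and a hypothetical trial format of (presented, reproduced) block-index sequences; the slides do not say which Corsi scoring variant the study used.

```python
from typing import Sequence, Tuple

# A trial pairs the block sequence shown to the participant with the
# sequence they tapped back (block indices; hypothetical format).
Trial = Tuple[Sequence[int], Sequence[int]]


def corsi_span(trials: Sequence[Trial]) -> int:
    """Return the longest sequence length reproduced in the correct order.

    Assumes the simple "block span" scoring rule; returns 0 if no trial
    was reproduced correctly.
    """
    span = 0
    for presented, reproduced in trials:
        # A trial counts only if every block was tapped in the same order.
        if list(presented) == list(reproduced):
            span = max(span, len(presented))
    return span
```

For example, correct trials of length 2 and 3 followed by an order error on a length-4 sequence give a span of 3.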
13. User study
¤ Within-subjects design: 105 participants recruited with Amazon Mechanical Turk
¤ Baseline version (without explanations) compared with explanation interface
¤ Pre-study questionnaire for all personal characteristics
¤ Task: Based on a chosen scenario for creating a playlist, explore songs and
rate all songs in the final playlist
¤ Post-study questionnaire:
¤ Recommender effectiveness
¤ Trust
¤ Good understanding
¤ Use intentions
¤ Novelty
¤ Satisfaction
¤ Confidence
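In a within-subjects design each participant sees both the baseline and the explanation interface, so the two conditions are compared per participant. As one illustration, assuming a paired t-test on a single post-study measure (the slides do not say which statistical tests were used), the test statistic can be computed from the per-participant differences; the ratings in the test are made up.

```python
import math
import statistics
from typing import Sequence


def paired_t(baseline: Sequence[float], explained: Sequence[float]) -> float:
    """Paired-samples t statistic for a within-subjects comparison.

    baseline[i] and explained[i] must come from the same participant.
    """
    if len(baseline) != len(explained) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 participants")
    # Per-participant difference: explanation interface minus baseline.
    diffs = [e - b for b, e in zip(baseline, explained)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

A positive t means participants rated the explanation interface higher on average; the p-value would then come from a t distribution with n - 1 degrees of freedom.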
15. Design implications
¤ Explanations should be personalised for different groups of
end-users.
¤ Users should be able to choose whether or not they want to
see explanations.
¤ Explanation components should be flexible enough to present
varying levels of detail depending on a user’s preference.
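The design implications above (letting users opt out, and presenting varying levels of detail) can be sketched as an explanation renderer that takes the user's chosen detail level. This is an illustrative assumption, not the interface from the study: the three levels, the attribute names and the wording are hypothetical.

```python
from enum import Enum
from typing import Dict


class DetailLevel(Enum):
    NONE = 0      # user opted out of explanations
    BRIEF = 1     # one-line reason
    DETAILED = 2  # every contributing attribute with its share


def explain(song: str, contributions: Dict[str, float], level: DetailLevel) -> str:
    """Render an explanation for a recommended song at the chosen level.

    `contributions` maps a (hypothetical) audio attribute to a non-negative
    amount it contributed to the recommendation.
    """
    if level is DetailLevel.NONE or not contributions:
        return ""
    # Sort attributes from most to least influential.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    if level is DetailLevel.BRIEF:
        return f"'{song}' was recommended mainly because of its {ranked[0][0]}."
    total = sum(contributions.values())
    parts = [f"{attr} ({value / total:.0%})" for attr, value in ranked]
    return f"'{song}' was recommended because of: " + ", ".join(parts) + "."
```

Keeping the level as an explicit parameter makes it easy to store it as a per-user preference, which is one way to support personalised explanations.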
16. User control
Users tend to be more satisfied when they have control over
how recommender systems produce suggestions for them
Control recommendations: Douban FM
Control user profile: Spotify
Control algorithm parameters: TasteWeights
17. Controllability vs. cognitive load
Additional controls may increase cognitive load
(Andjelkovic et al. 2016)
Ivana Andjelkovic, Denis Parra, and John O’Donovan. 2016. Moodplay: Interactive mood-based
music discovery and recommendation. In Proc. of UMAP’16. ACM, 275–279.
18. Different levels of user control
Level | Recommender components | Controls
low | Recommendations (REC) | Rating, removing, and sorting
medium | User profile (PRO) | Select which user profile data will be considered by the recommender
high | Algorithm parameters (PAR) | Modify the weight of different parameters
Jin, Y., Tintarev, N., & Verbert, K. (2018, September). Effects of personal characteristics on music
recommender systems with different levels of controllability. In Proceedings of the 12th ACM Conference
on Recommender Systems (pp. 13-21). ACM.
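The highest control level in the table (PAR, modifying the weight of different parameters) can be illustrated with a weighted-average ranking in which the user sets the weights. This is a minimal sketch assuming per-track scores in [0, 1]; the parameter names are hypothetical and this is not claimed to be the scoring used in the study.

```python
from typing import Dict, List, Tuple


def rank_tracks(
    tracks: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank tracks by a weighted average of per-parameter scores.

    tracks:  track name -> {parameter: score in [0, 1]}
    weights: parameter -> non-negative weight set by the user (PAR control)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    scored = []
    for name, scores in tracks.items():
        # Parameters missing for a track contribute a score of 0.
        s = sum(w * scores.get(p, 0.0) for p, w in weights.items())
        scored.append((name, s / total_weight))
    # Highest weighted score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Moving a single weight slider can reorder the whole list, which is one reason this level of control is the most demanding to reason about.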
19. 8 control settings
Components: Recommendations (REC), User profile (PRO), Algorithm parameters (PAR)
No control
REC
PAR
PRO
REC*PRO
REC*PAR
PRO*PAR
REC*PRO*PAR
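The eight settings are every on/off combination of the three components (2 × 2 × 2 = 8). The sketch below enumerates them and, assuming balanced random allocation (the slides do not say how participants were assigned), deals 240 participants over the conditions, 30 per condition; the participant IDs are hypothetical.

```python
import random
from itertools import product
from typing import Dict, Iterable, List

# The three recommender components that can be made controllable.
COMPONENTS = ("REC", "PRO", "PAR")


def control_settings() -> List[str]:
    """Enumerate the 2x2x2 = 8 user-control conditions as labels."""
    labels = []
    # Each component is either absent (False) or present (True).
    for flags in product((False, True), repeat=len(COMPONENTS)):
        enabled = [c for c, on in zip(COMPONENTS, flags) if on]
        labels.append("*".join(enabled) if enabled else "No control")
    return labels


def assign(participants: Iterable[str], seed: int = 0) -> Dict[str, str]:
    """Shuffle participants, then deal them round-robin over the 8 conditions."""
    conditions = control_settings()
    ids = list(participants)
    random.Random(seed).shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}
```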
20. Evaluation method
¤ Between-subjects – 240 participants recruited with AMT
¤ Independent variable: settings of user control
¤ 2x2x2 factorial design
¤ Dependent variables:
¤ Acceptance (ratings)
¤ Cognitive load (NASA-TLX), Musical Sophistication, Visual Memory
¤ Evaluation framework of Knijnenburg et al. [2012]
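NASA-TLX rates workload on six subscales (mental, physical and temporal demand, performance, effort, frustration), each on a 0-100 scale. The raw TLX is the unweighted mean; the weighted variant multiplies each rating by how often that subscale was chosen in 15 pairwise comparisons and divides by 15. Which variant the study used is not stated on the slide, so the sketch supports both; the subscale keys are hypothetical names.

```python
from typing import Dict, Optional

SUBSCALES = (
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
)


def nasa_tlx(ratings: Dict[str, float], tallies: Optional[Dict[str, int]] = None) -> float:
    """Overall NASA-TLX workload score on a 0-100 scale.

    ratings: subscale -> rating in [0, 100]
    tallies: optional subscale -> times it was chosen in the 15 pairwise
             comparisons (weighted TLX); if omitted, the raw TLX
             (unweighted mean of the six ratings) is returned.
    """
    missing = set(SUBSCALES) - set(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    if tallies is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    if sum(tallies.get(s, 0) for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[s] * tallies.get(s, 0) for s in SUBSCALES) / 15
```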
21. Results
¤ Main effects: from REC to PRO to PAR → higher cognitive
load
¤ Two-way interactions: combining controls does not necessarily result in higher
cognitive load. Adding another control component to PAR increases
acceptance, and PRO*PAR has lower cognitive load than PRO or PAR alone
¤ High musical sophistication leads to higher perceived quality, and
thereby to higher acceptance
Jin, Y., Tintarev, N., & Verbert, K. (2018, September). Effects of personal characteristics on music
recommender systems with different levels of controllability. In Proceedings of the 12th ACM
Conference on Recommender Systems (pp. 13-21). ACM.
25. Explaining exercise recommendations
Goals and research questions
¤ Automatic adaptation: how can the exercises recommended by Wiski be adapted automatically to the level of students?
¤ Explanations & trust: how do (placebo) explanations affect initial trust in Wiski for recommending exercises?
¤ Young target audience: middle and high school students
Ooge, J., Kato, S., & Verbert, K. (2022). Explaining Recommendations in E-Learning: Effects on
Adolescents' Initial Trust. In Proceedings of the 27th International Conference on Intelligent User Interfaces (IUI '22).
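One common way to adapt exercise recommendations to a learner's level is an Elo-style rating, in which students and exercises each carry a rating that is nudged after every attempt. The slide does not say how Wiski adapts exercises, so the sketch below is an assumption for illustration: the logistic scale and the step size `k` are hypothetical choices.

```python
import math
from typing import Tuple


def p_correct(student: float, exercise: float) -> float:
    """Expected probability that the student solves the exercise."""
    # Logistic curve on the rating difference (Elo-style, natural scale).
    return 1.0 / (1.0 + math.exp(-(student - exercise)))


def elo_update(student: float, exercise: float, correct: bool,
               k: float = 0.4) -> Tuple[float, float]:
    """Return updated (student, exercise) ratings after one attempt."""
    expected = p_correct(student, exercise)
    error = (1.0 if correct else 0.0) - expected
    # A surprising success raises the student's rating and lowers the
    # exercise's difficulty rating; a failure does the opposite.
    return student + k * error, exercise - k * error
```

Recommending exercises whose expected success probability falls in a target band is then one way to keep them at the student's level.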
26. Results: Real explanations…
… did increase multidimensional initial trust
… did not increase one-dimensional initial trust
… led to accepting more recommended exercises
compared to both placebo and no explanations
27. Results: Placebo explanations…
… did not increase initial trust compared to no
explanations
… may undermine perceived integrity
… are a useful baseline:
• how critical are students towards explanations?
• how much transparency do students need?
28. Results: No explanations
Can be acceptable in low-stakes situations (e.g., drilling exercises): indications of difficulty level might suffice
Personal level indication: Easy, Medium and Hard tags
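A personal difficulty indication like the Easy, Medium and Hard tags above could be derived from a student's estimated chance of solving an exercise. This sketch assumes an Elo-style logistic estimate from a student rating and an exercise rating; the thresholds (`easy_above`, `hard_below`) are hypothetical and not taken from Wiski.

```python
import math


def difficulty_tag(student: float, exercise: float,
                   easy_above: float = 0.75, hard_below: float = 0.5) -> str:
    """Label an exercise relative to one student's estimated level."""
    # Estimated probability of success (logistic on the rating difference).
    p = 1.0 / (1.0 + math.exp(-(student - exercise)))
    if p >= easy_above:
        return "Easy"
    if p < hard_below:
        return "Hard"
    return "Medium"
```

Because the tag depends on both ratings, the same exercise can be "Easy" for one student and "Hard" for another, which is what makes the indication personal.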
30. LADA: a learning analytics dashboard for study advisors
[Figure: LADA dashboard, annotated "uncertainty"]
Gutiérrez Hernández F., Seipp K., Ochoa X., Chiluiza K., De Laet T., Verbert K. (2018). LADA: A
learning analytics dashboard for academic advising. Computers in Human Behavior, pp 1-13. doi:
10.1016/j.chb.2018.12.004
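One way to surface uncertainty in a chance-of-success estimate is to report it with an interval that widens when fewer comparable students are available. The sketch assumes the estimate is the pass rate among similar past students and uses a Wilson score interval; both are illustrative assumptions, not a description of how LADA computes its predictions.

```python
import math
from typing import Tuple


def chance_of_success(passed: int, similar: int,
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Pass rate among similar past students with a Wilson score interval.

    Returns (estimate, lower, upper); z = 1.96 gives roughly 95% coverage.
    Fewer similar students -> wider interval -> more visible uncertainty.
    """
    if similar <= 0 or not 0 <= passed <= similar:
        raise ValueError("need 0 <= passed <= similar and similar > 0")
    p = passed / similar
    denom = 1 + z**2 / similar
    centre = (p + z**2 / (2 * similar)) / denom
    half = z * math.sqrt(p * (1 - p) / similar + z**2 / (4 * similar**2)) / denom
    return p, centre - half, centre + half
```

With 8 of 10 similar students passing, the estimate is 0.8 but the interval spans roughly 0.49 to 0.94; with 80 of 100 it is much narrower.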
33. AHMoSe
Rojo, D., Htun, N. N., Parra, D., De Croon, R., & Verbert, K. (2021). AHMoSe: A knowledge-based visual
support system for selecting regression machine learning models. Computers and Electronics in
Agriculture, 187, 106183.
35. Case Study – Grape Quality Prediction
¤ Grape Quality Prediction Scenario [Tag14]
¤ Data
¤ Years 2010, 2011 (train), 2012 (test)
¤ 48 cells (Central Greece)
¤ Knowledge-based rules
Source: [Tag14]
[Tag14] Tagarakis, A., et al. "A fuzzy inference system to model grape
quality in vineyards." Precision Agriculture 15.5 (2014): 555-578.
36. Simulation Study
¤ AHMoSe vs full AutoML approach to support model
selection.
Scenario | RMSE (AutoML) | RMSE (AHMoSe) | Difference %
A: Complete knowledge | 0.430 | 0.403 | ▼ 6.3%
B: Incomplete knowledge | 0.458 | 0.385 | ▼ 16.0%
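The comparison reports root-mean-square error (RMSE) and the relative reduction in RMSE achieved by AHMoSe over AutoML. The sketch shows both calculations; the example series in the test are made up.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` lowers the baseline error."""
    return 100.0 * (baseline - candidate) / baseline
```

For Scenario A, `relative_reduction(0.430, 0.403)` gives about 6.3%. Applied to the rounded Scenario B values, the same formula gives about 15.9% rather than 16.0%, presumably because the reported percentage was computed from unrounded RMSEs.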
37. Qualitative Evaluation
¤ 10 open-ended questions
¤ 5 viticulture experts and 4 ML experts.
¤ Thematic Analysis: potential use cases, trust, usability,
and understandability.
38. Qualitative Evaluation - Trust
¤ Showing the dis/agreement of model outputs with
experts’ knowledge can promote trust.
“The thing that makes us trust the models is the fact that most of the
time, there is a good agreement between the values predicted by the
model and the ones obtained for the knowledge of the experts.”
– Viticulture Expert
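The expert's quote points to a simple signal: how often a model's predictions agree with knowledge-based estimates. The sketch below measures agreement as the share of cells where a model is within a tolerance of the expert estimate and picks the most agreeing candidate. The tolerance, model names and values are hypothetical, and this is not claimed to be AHMoSe's selection procedure.

```python
from typing import Dict, Sequence


def agreement(model_pred: Sequence[float], expert_pred: Sequence[float],
              tolerance: float) -> float:
    """Share of cells where a model is within `tolerance` of the expert estimate."""
    if len(model_pred) != len(expert_pred) or not model_pred:
        raise ValueError("inputs must be non-empty and of equal length")
    close = sum(abs(m - e) <= tolerance for m, e in zip(model_pred, expert_pred))
    return close / len(model_pred)


def most_agreeing_model(models: Dict[str, Sequence[float]],
                        expert_pred: Sequence[float], tolerance: float) -> str:
    """Name of the candidate model that agrees most with expert knowledge."""
    return max(models, key=lambda name: agreement(models[name], expert_pred, tolerance))
```

Showing the per-cell disagreements, not only the summary score, lets experts see where a model departs from their knowledge.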
40. Designing for interacting with
predictions for finding jobs
Key Issues: Missing data, prediction trust issues, job
seeker motivation, lack of control.
41. Methods
¤ A customer journey approach (5 mediators).
¤ Hands-on time with the original dashboard (22 mediators).
¤ Observations of mediation sessions (3 mediators, 6 job seekers).
¤ Questionnaire regarding perception of the dashboard and
prediction model (15 mediators).
Charleer S., Gutiérrez Hernández F., Verbert K. (2019). Supporting job mediator and job seeker
through an actionable dashboard. In: Proceedings of the 24th International Conference on Intelligent User
Interfaces (ACM IUI 2019), Los Angeles, USA.
43. Take away messages
¤ Key difference between actionable and non-actionable
parameters
¤ Need for customization and contextualization.
¤ The human expert plays a crucial role in interpreting
and relaying the predicted or recommended output.
Charleer S., Gutiérrez Hernández F., Verbert K. (2019). Supporting job mediator and job
seeker through an actionable dashboard. In: Proceedings of the 24th International Conference on
Intelligent User Interfaces (ACM IUI 2019), Los Angeles, USA. (Core: A)
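The distinction between actionable and non-actionable parameters can be made concrete with a what-if function that only lets the job seeker change actionable features. The sketch assumes a logistic prediction model; the feature names, weights and actionable set are hypothetical and not taken from the dashboard described above.

```python
import math
from typing import Dict, Set


def predict(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Logistic model giving a (hypothetical) probability of finding a job."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(features: Dict[str, float], weights: Dict[str, float], bias: float,
            actionable: Set[str], changes: Dict[str, float]) -> float:
    """Recompute the prediction after the job seeker changes actionable features.

    Raises ValueError if a change touches a non-actionable feature, so the
    dashboard never suggests acting on something the job seeker cannot change.
    """
    blocked = set(changes) - actionable
    if blocked:
        raise ValueError(f"not actionable: {sorted(blocked)}")
    return predict({**features, **changes}, weights, bias)
```

Non-actionable features still shape the prediction; they are simply excluded from the levers offered during a mediation session.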
48. Design and Evaluation
Gutiérrez F., Cardoso B., Verbert K. (2017). PHARA: a personal health augmented reality assistant to
support decision-making at grocery stores. In: Proceedings of the International Workshop on Health
Recommender Systems co-located with ACM RecSys 2017 (Paper No. 4) (10-13).
51.
Gutiérrez Hernández, F. S., Htun, N. N., Vanden Abeele, V., De Croon, R., & Verbert, K. (2021).
Explaining call recommendations in nursing homes: a user-centered design approach for interacting
with knowledge-based health decision support systems. In Proceedings of the 27th Annual
Conference on Intelligent User Interfaces. ACM.
52. Evaluation
¤ 12 nurses used the app for three months
¤ Data collection
¤ Interaction logs
¤ ResQue questions
¤ Semi-structured interviews
54. Results
¤ The iterative design process identified several important features, such as the pending
list, the overview and the feedback shortcut to encourage feedback.
¤ Explanations seem to help support the healthcare
professionals better.
¤ Results indicate a better understanding of the call notifications when nurses are able to
see the reasons for the calls.
¤ More trust in the recommendations and increased perceptions of transparency
and control
¤ Interaction patterns indicate that users engaged well with the interface, although
some users did not use all features to interact with the system.
¤ Need for further simplification and personalization.
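Seeing the reasons for a call is natural in a knowledge-based system: each rule that fires can contribute its own human-readable reason. The sketch below assumes a simple rule list over resident data; the rules, field names and thresholds are hypothetical and not those of the nursing-home system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    reason: str                                  # text shown to the nurse
    applies: Callable[[Dict[str, float]], bool]  # condition on resident data


def recommend_call(resident: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Return the reason of every rule that fires; an empty list means no call."""
    return [rule.reason for rule in rules if rule.applies(resident)]


# Hypothetical rules and data fields, for illustration only.
RULES = [
    Rule("No fluid intake recorded for more than 4 hours",
         lambda r: r.get("hours_since_drink", 0) > 4),
    Rule("Unusually restless night",
         lambda r: r.get("night_movements", 0) >= 10),
]
```

Because the explanation is the list of fired rules, it stays faithful to the system's logic by construction.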
56. Explaining health recommendations
Explanation designs: word cloud, feature importance, feature importance + %
Szymanski, M., Vanden Abeele, V., & Verbert, K. (2022). Explaining health recommendations to lay users:
The dos and don’ts. Apex-IUI 2022.
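The "feature importance + %" design adds each feature's share of the total importance. A minimal way to compute such shares from signed contributions is to rank by absolute value and normalise; the feature names are hypothetical, and whether the study normalised this way is not stated on the slide.

```python
from typing import Dict, List, Tuple


def importance_percentages(contributions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank features by absolute contribution, each as a share (%) of the total."""
    total = sum(abs(v) for v in contributions.values())
    if total == 0:
        return [(name, 0.0) for name in contributions]
    # Largest absolute contribution first, regardless of sign.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, 100.0 * abs(v) / total) for name, v in ranked]
```

Using absolute values means a feature that lowers the recommendation score still counts toward its share; the sign can be shown separately, for example with colour.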
60. Take-away messages
¤ Involvement of end-users has been key to come up with
interfaces tailored to the needs of non-expert users
¤ Actionable vs non-actionable parameters
¤ Domain expertise of users and need for cognition are
important personal characteristics
¤ Need for personalisation and simplification
61. Collaborations
Peter Brusilovsky, Nava Tintarev, Cristina Conati, Denis Parra, Bart Knijnenburg, Jürgen Ziegler