My presentation at ACM Conference on Intelligent User Interfaces (IUI 2019) "The Effect of Explanations & Algorithmic Accuracy on Visual Recommender Systems of Artistic Images"
2. The Effect of Explanations and Algorithmic Accuracy on Visual Recommender Systems of Artistic Images
(best paper candidate)
Vicente Domínguez, Pablo Messina, Ivania Donoso-Guzmán, Denis Parra
Pontificia Universidad Católica de Chile (PUC Chile)
3. The Online Artwork Market
• Online artwork market: growing since 2008, despite global crises!
• In 2011, art brought in $11.57 billion in total global annual revenue, over $2 billion more than in 2010 [forbes].
• Recommender systems could play an important role, just like in other e-commerce industries.
[forbes] https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/ (2012)
March 17th 2019 Dominguez et al ~ ACM IUI 2019 3
6. Related Work: Art Recommendation
• Art recommendation projects date back at least to 2007, such as the CHIP project, which recommended paintings from the Rijksmuseum (Aroyo et al., 2007).
• Some approaches have used:
– Ratings (Aroyo et al., 2007),
– Tags, keywords or other textual features (Semeraro et al., 2012),
– Traditional visual features (van den Broek et al., 2006).
• Deep Learning (DL) has revolutionized computer vision, but few works use DL methods for recommending art (He et al., 2016; Messina et al., 2018).
7. Our Approach to Image Recommendation
• Since 2017 we have been working on recommending art
images, using mostly content-based methods.
• Four papers published:
– ACM DLRS 2017: Dominguez, V., Messina, P., Parra, D., Mery, D., Trattner, C., & Soto, A. (2017, August). Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (pp. 55-59). ACM.
– UMUAI 2018: Messina, P., Dominguez, V., Parra, D., Trattner, C., & Soto, A. (2018). Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Modeling and User-Adapted Interaction, 1-40.
– Two additional workshop papers in ACM RecSys 2018.
9. Open Questions from Content-Based RecSys
• We learned that visual features from DNNs perform better than
attractiveness visual features, but they are harder to explain.
[Figure: Predictive Accuracy vs. Explainability, comparing attractiveness visual features (average brightness, saturation, sharpness, entropy, RGB-contrast, colorfulness, naturalness) with Deep Learning visual features.]
10. Explainability in Recommender Systems
• Explaining recommendations is a well-established and
active area of research, but there is little research on
explaining image recommendations.
Herlocker, J. L., Konstan, J. A., & Riedl, J. (2000). Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (pp. 241-250).
Tintarev, N., & Masthoff, J. (2015). Explaining recommendations: Design and evaluation. In Recommender Systems Handbook (pp. 353-382). Springer, Boston, MA.
12. DARPA XAI inspiration
[Figure: Explainability Potential vs. Predictive Accuracy. DARPA XAI (Gunning, 2017): Decision Tree and DNN. Our research: AVF and DNN VF.]
Is there an effect on user perception if we provide a UI with explanations?
13. Our General Research Question
• Explainable and transparent UIs are well perceived by users in previous RecSys research, but does that matter to UX if the predictions are not accurate enough?
[Figure: Explainability Potential vs. Predictive Accuracy for AVF and DNN VF. Annotations: "UI w/ explanation based on examples" and "UI w/ explanation based on examples or visual features".]
14. Research Questions
• RQ1. Given three different types of interfaces, one baseline interface without explanations and two with explanations but different levels of transparency, which one is perceived as most useful?
• RQ2. Furthermore, based on the visual content-based recommender algorithm chosen (DNN or AVF), are there observable differences in how the three interfaces are perceived?
• RQ3. How do independent variables such as algorithm, explainable interface and domain knowledge interact in order to explain the user experience with the recommender system in terms of perception of relevance, diversity, explainability and user satisfaction?
17. Materials and Methodology
1. We collect image data from UGallery, an online store of
physical artworks (paintings and photographs).
2. We implement a system which records user preferences and
generates recommendations. Some of these
recommendations will have explanations.
3. We conduct a user study on Amazon Mechanical Turk to
assess the impact of algorithmic accuracy and explainable UI
on several user dimensions.
18. Data: UGallery
• Online artwork store based in California, USA.
• Mostly sells one-of-a-kind physical artworks.
19. CB RecSys algorithm: Visual Features
• (DNN) Deep Neural Networks
• (AVF) Attractiveness-based visual features
20. Visual Features: DNN vs. AVF
• Deep Neural Networks (DNN):
we used an AlexNet DNN
(pre-trained with ImageNet
ILSVRC 2012 dataset)
(Krizhevsky et al, 2012)
• We map each artwork image
to its corresponding latent
vector of 4,096 dimensions
obtained at the fc6 layer of
the AlexNet network.
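Once every artwork has a latent vector, a memory-based content-based recommender can rank unseen artworks by how close they are to the ones a user liked. The sketch below is one possible way to do that, not the scoring used in the study: it assumes cosine similarity and scores each candidate by its best match against the liked set. The function names (`cosine`, `recommend`) and the tiny 3-dimensional vectors are hypothetical stand-ins for the 4,096-dimensional fc6 vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(liked_ids, embeddings, k=2):
    """Rank artworks the user has not liked yet by their highest similarity
    to any liked artwork (assumed scoring rule). `liked_ids` must be non-empty."""
    scores = {}
    for item_id, vec in embeddings.items():
        if item_id in liked_ids:
            continue
        scores[item_id] = max(cosine(vec, embeddings[liked]) for liked in liked_ids)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 3-d vectors standing in for real fc6 embeddings.
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.5, 0.5, 0.0],
    "d": [0.0, 0.0, 1.0],
}
print(recommend({"a"}, embeddings, k=2))  # ['b', 'c']
```

Taking the maximum favors candidates that closely resemble at least one liked artwork; averaging over the liked set would be an equally plausible choice.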
21. Visual Features: AVF
• Average brightness,
• Saturation,
• Sharpness,
• Entropy,
• RGB-contrast,
• Colorfulness,
• Naturalness
Jose San Pedro and Stefan Siersdorfer. 2009. Ranking and Classifying Attractiveness of Photos in Folksonomies. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09).
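To make the attractiveness features more concrete, here is a rough sketch of three of them (average brightness, saturation, entropy) computed on a list of RGB pixels. The formulas are common textbook definitions (BT.601 luma, the HSV saturation channel, Shannon entropy of the gray-level histogram) and are an assumption; they are not necessarily the exact definitions in San Pedro and Siersdorfer (2009). The function name `avf_features` and the four-pixel image are hypothetical.

```python
import colorsys
import math


def avf_features(pixels):
    """Compute three simple attractiveness-style features.

    `pixels` is a non-empty list of (r, g, b) tuples with channels in 0..255.
    """
    n = len(pixels)
    # Average brightness: mean luma with ITU-R BT.601 weights, scaled to 0..1.
    lumas = [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in pixels]
    brightness = sum(lumas) / n
    # Saturation: mean of the S channel after converting each pixel to HSV.
    saturation = sum(
        colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[1] for r, g, b in pixels
    ) / n
    # Entropy: Shannon entropy (in bits) of the gray-level histogram.
    hist = {}
    for y in lumas:
        level = int(round(y * 255))
        hist[level] = hist.get(level, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist.values())
    return {"brightness": brightness, "saturation": saturation, "entropy": entropy}


# Hypothetical four-pixel "image": white, black, pure red, pure blue.
image = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 0, 255)]
features = avf_features(image)
print(features)
```

Each feature is a single human-readable number per image, which is what makes this representation easier to expose in a transparent explanation than a 4,096-dimensional latent vector.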
23. Generating Explanations
Friedrich, G., & Zanker, M. (2011). A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3), 90-98.
37. Evaluation & Results
• Post-algorithm survey aspects (agreement, 1-100)
– Explainable: I understood why the art images were recommended to me.
– Relevance: The art images recommended matched my interests.
– Diverse: The art images recommended were diverse.
– Interface Satisfaction: Overall, I am satisfied with the recommender interface.
– Use Again: I would use this recommender system again for finding art images in the future.
– Trust: I trusted the recommendations made.
• Finally, NASA TLX (cognitive effort)
39. Evaluation & Results
Study on Amazon Mechanical Turk:
● 121 valid users completed the study correctly.
● The task took them around 10 minutes to complete.
● ~56% female, 44% male.
● 80% had attended one or more art classes at high school level or above.
● 80% visited museums or art galleries at least once a year.
40. Results
Interface 1: UI without explanation
Interface 2: UI with example-based explanation
Interface 3: UI with transparent explanation (AVF)
41. Results
7 dimensions evaluated, for DNN and AVF (scale 1-100):
Perception of:
- Explainability
- Relevance
- Diversity
- Satisfaction w/UI
- Intention of use
- Trust in RecSys
- Avg. Rating
42. Results
• Result 1: DNN is better than or equal to AVF except on diversity.
- AVF is perceived as significantly more diverse than DNN, except when AVF uses transparent and explainable recommendations.
44. Results
• Result 2: Explainable interfaces increase perception of explainability.
- Expected result: people perceive the system as more explainable with the explainable interfaces than with the non-explainable one.
- But this perception is significantly higher using DNN visual features than AVF.
45. Evaluation & Results
• Result 2: Perception of relevance changes just by adding explanations -> the user interface really matters!
- The algorithm is the same (DNN), but when explanations are added people perceive the recommendations as more relevant.
- The result is significant only with DNN.
46. Evaluation & Results
• Result 3: No difference in Trust between DNN and AVF in I1
(without explanations)
- The difference in Trust between DNN
and AVF becomes significant only
when using explainable interfaces.
47. Evaluation & Results
• Result 4: Without explanations, there is no difference between DNN and AVF in intention of use or satisfaction with the interface.
- The difference in Satisfaction and Intention of use between DNN and AVF becomes significant only when using explainable interfaces.
48. General Model: Knijnenburg Framework
Knijnenburg, B. P., & Willemsen, M. C. (2015). Evaluating recommender systems with user experiments. In Recommender Systems Handbook (pp. 309-352). Springer, Boston, MA.
50. Visual Art RecSys Explanation Model
- Compared to the baseline interface
I1, the interface I2 (β=0.551) and
interface I3 (β=0.556) have a positive
effect on Understandability.
- The model explains 23.3% of the
variance of Understandability.
51. Visual Art RecSys Explanation Model
- Compared to the baseline AVF, DNN features have a strong positive effect on understandability (β=0.710), as well as on ratings (β=0.710).
52. Visual Art RecSys Explanation Model
- Perceived Effort (Insecurity, rush,
mental demand) has a significant
negative effect (β=-0.283) on
understandability.
53. Visual Art RecSys Explanation Model
- Satisfaction is strongly affected by Trust in the system (+), as well as by Understandability (+), Time (+) and Experience with the domain (+).
- Trust is affected by Understandability (+) and by Effort (-).
54. Conclusion
• DNN features performed better than AVF features, probably because AlexNet is able to capture more complex patterns. Results confirm previous work by Messina et al. (2018).
• Providing a UI with explanations has a big effect on several of the dimensions evaluated. This study provides further evidence that user interfaces, and not only algorithms, have an important effect on UX with RecSys.
• Results indicate that explainability has a positive effect on UX with a RecSys, but additional transparency is not always well perceived.
55. Future Work
• (limitation) If additional transparency is provided in an explanation, make sure the explanation is actually being understood.
• Develop a full end-to-end Deep Learning art recommendation model, rather than using pre-trained embeddings and memory-based content-based recommendation.
• Test recent advances in XAI and visualization of CNNs within the recommendation framework.
56. Acknowledgment
• The authors were funded by PUC Chile, Conicyt, Fondecyt grant 11150783, as well as by the Millennium Institute for Foundational Research on Data (IMFD).
58. References
● Vicente Dominguez, Pablo Messina, Denis Parra, Domingo Mery, Christoph Trattner, and Alvaro Soto. 2017. Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the Workshop on Deep Learning for Recommender Systems, co-located at RecSys 2017.
● Ruining He, Chen Fang, Zhaowen Wang, and Julian McAuley. 2016. Vista: A Visually, Socially, and Temporally-aware Model for Artistic Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16).
● Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
● Pablo Messina, Vicente Dominguez, Denis Parra, Christoph Trattner, and Alvaro Soto. 2018. Content-Based Artwork Recommendation: Integrating Painting Metadata with Neural and Manually-Engineered Visual Features. User Modeling and User-Adapted Interaction (2018).
● Jose San Pedro and Stefan Siersdorfer. 2009. Ranking and Classifying Attractiveness of Photos in Folksonomies. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09).