Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). In either case, the concept has commonly been interpreted as some form of equality, i.e., the degree to which the system meets the information needs of all its users equally. In this paper, we argue that fairness in recommender systems does not necessarily imply equality; instead, it should consider a distribution of resources based on merits and needs. We present a probabilistic framework based on generalized cross entropy to evaluate the fairness of recommender systems under this perspective. The proposed framework is flexible and explanatory: it allows the incorporation of domain knowledge (through an ideal fair distribution) that can help understand which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework in terms of both user and item fairness.
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Yashar Deldjoo(1), Vito Walter Anelli(1), Hamed Zamani(2), Alejandro Bellogín(3), Tommaso Di Noia(1)
1. Polytechnic University of Bari, Italy
2. University of Massachusetts, Amherst, USA
3. Universidad Autónoma de Madrid, Spain
Background - Roots of the topic
● Our concern: algorithmic fairness in decision-making systems
Background - Fairness in AI
● AI is involved in many life-affecting decision points, for example:
○ criminal risk prediction
○ credit risk assessment
○ housing allocation
○ loan qualification prediction
○ hiring decisions
Image source: https://www.zdnet.com/article/inside-the-black-box-understanding-ai-decision-making/
Background - Fairness in RS
● In the RS community, fairness is viewed as a multi-sided concept.
● It shares similarities with:
○ Reciprocal recommendation: views the RS as a system fulfilling the dual goals of a transaction: (1) user-centered utility and (2) vendor-centered utility
○ Multi-stakeholder setting: a generalization of reciprocal recommendation; the system is designed to meet the benefit of users, items, and other parties involved.
What we agree on
Unfair recommendations could have far-reaching consequences, impacting people's lives and putting minority groups at a major disadvantage.
Common Fairness Interpretation
A common fairness interpretation in previous literature:
Fairness = Equality across members of protected groups
Equality
Ekstrand et al. studied whether RS produce equal utility for users of different demographic groups:
● found demographic differences in measured effectiveness across two datasets from different domains.
Yao et al. studied various types of unfairness in CF models:
● proposed to penalize algorithms producing disparate distributions of prediction error.
Ekstrand et al. "All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness." ACM FAT* Conference, 2018.
Yao, Sirui, and Bert Huang. "Beyond parity: Fairness objectives for collaborative filtering." Advances in Neural Information Processing Systems, 2017.
Research Questions/Goals
● Define a probabilistic framework for evaluating RS fairness based on attributes of any nature (e.g., sensitive or insensitive) for both items and users
● Measure fairness in RS, considering fairness as equality or non-equality among groups
How to measure Fairness?
● group accuracy using rating differences (Zhu et al., 2018)
● group accuracy using nDCG differences (Ekstrand et al., 2018)
● exposure via protected-group precision ratio (Burke et al., 2018)
● item recommendation probability using KL divergence (Yang & Stoyanovich, 2017)
Generalized Cross Entropy (GCE)
p: performance distribution
p_f: fair distribution
a: user or item attribute
α: parameter emphasizing the difference between the distributions
● Hellinger distance for α = 1/2
● Pearson's χ² discrepancy measure for α = 2
● Neyman's χ² measure for α = −1
● Kullback-Leibler distance in the limit as α → 1
● Burg CE distance as α → 0
Botev et al. 2011. The generalized cross entropy method, with applications to probability density estimation. Methodology and Computing in Applied Probability
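The GCE definition itself was shown as an image in the original slide. A reconstruction following the generalized cross-entropy family of Botev et al. (the paper's exact normalization and sign convention may differ):

```latex
\mathrm{GCE}(p_f, p) \;=\; \frac{1}{\alpha(1-\alpha)}
\left( \sum_{a_j} p_f(a_j)^{\alpha}\, p(a_j)^{1-\alpha} \;-\; 1 \right),
\qquad \alpha \notin \{0, 1\}
```

Since both distributions sum to 1, GCE(p_f, p) = 0 whenever p = p_f; i.e., GCE = 0 identifies a perfectly fair system with respect to the chosen p_f.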
Performance Distribution p
● Estimated based on the output of the recommender system on a test set.
● We define a recommendation gain for each user (rg_u) and each item (rg_i).
User gain
Item gain
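The user-gain and item-gain definitions above appeared as figures in the original slides. As an illustrative sketch (not the paper's exact rg_u/rg_i definitions; `performance_distribution`, `user_gains`, and `user_group` are hypothetical names), the performance distribution p over groups can be estimated by pooling a per-user gain within each group and normalizing:

```python
from collections import defaultdict

def performance_distribution(user_gains, user_group, smoothing=1e-9):
    """Estimate p(a_j): the share of total recommendation gain per group.

    user_gains: dict mapping user -> recommendation gain rg_u (e.g., hits@k)
    user_group: dict mapping user -> group label a_j
    Hypothetical helper; the paper defines the gains more generally.
    """
    group_gain = defaultdict(float)
    for user, gain in user_gains.items():
        group_gain[user_group[user]] += gain
    total = sum(group_gain.values()) + smoothing  # avoid division by zero
    return {a: gain / total for a, gain in group_gain.items()}
```

For the toy example's free/premium split, p would then be the share of the total recommendation gain captured by each group.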
Performance Distribution p
The proposed fairness evaluation framework is designed to capture fairness for both users and items -> multi-stakeholder setting
User gain
Item gain
Fair Distribution p_f
● Introduced by the system designer
● Problem-specific: determined by the problem and the target scenario
Example with 2 groups:
● p_f0 = [1/2, 1/2]: equal recommendation quality for g1 and g2
● p_f1 = [2/3, 1/3]: better recommendation quality for g1 than for g2
Toy Example
Orange: free users (group a1)
Green: premium users (group a2, paid membership)
● Rec 0: recommends more relevant items for premium users (3 vs. 6)
● Rec 1: recommends 1 relevant item per user
● Rec 2: all recommended items @3 are relevant
[Table: relevance (✓) of recommended items i1–i10 for users 1–3 (group a1) and users 4–6 (group a2)]
Toy Example
GCE(p_f, p, α = −1)

        GCE(p_f0)   GCE(p_f1)   GCE(p_f2)   Pr@3   Re@3
Rec 0   0.0800      0.3025      0.0025      1/2    0.530
Rec 1   0           0.0625      0.0625      1/3    0.375
Rec 2   0.0078      0.1182      0.0244      1      0.958

p_f0 = [1/2, 1/2], p_f1 = [2/3, 1/3], p_f2 = [1/3, 2/3]
● Rec 0: not completely fair if fairness means equality between free and premium users (GCE = 0.08 ≠ 0)
● Rec 0: more fair if fairness means giving better recommendations to premium (paid) users (GCE = 0.0025)
● Rec 1 and Rec 2: even though Pr@3 and Re@3 improve, they cannot produce fair results irrespective of p_f
● Rec 2: GCE never reaches the optimal value
Better recommendation quality ≠ more fair, due to inherent biases in the data
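As a sketch of the computation behind tables like this one, assuming the Botev et al. power-divergence parameterization (the paper's exact gain weighting and sign convention may differ, so this snippet is illustrative rather than a reproduction of the numbers above):

```python
def gce(p_fair, p, alpha=-1.0):
    """Generalized cross-entropy between a fair distribution p_fair and an
    observed performance distribution p (alpha must not be 0 or 1).
    Returns 0 when the two distributions coincide."""
    s = sum((pf ** alpha) * (pv ** (1.0 - alpha)) for pf, pv in zip(p_fair, p))
    return (s - 1.0) / (alpha * (1.0 - alpha))
```

`gce(p, p)` is 0 for any distribution p, matching the reading above that GCE = 0 indicates a perfectly fair system with respect to the chosen p_f.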
Advantages of the Proposed Framework
● The proposed evaluation framework:
○ is flexible to model fairness based on the interests of the system designer, by defining the fair recommendation distribution p_f
○ models fairness not necessarily as equality between members of groups:
■ in some applications, fairness = equality (e.g., gender, race)
■ in other application scenarios, it may not (e.g., free vs. premium users)
● It incorporates a gain factor in its design, which can be flexibly defined to contemplate different accuracy-related metrics to measure fairness upon (Precision, MAP, nDCG)
Advantages of the Proposed Framework
● Unlike most previous work that solely focused on either user fairness or item
fairness, the proposed framework integrates both user-related and item-related gain
factors.
Experimental evaluation
● Datasets:
○ Xing Job Recommendation Dataset (Xing-REC 17)
○ Amazon Toys & Games
● Baseline algorithms:
○ Item-kNN
○ User-kNN
○ BPR-MF
○ BPR-SLIM
○ SVD++
● Time-aware splitting with a fixed timestamp
● Characteristics of the Amazon dataset:
○ 53K preference scores
○ 1K users
○ 24K items
○ 99.8% sparsity
● Four user groups: VIA, SIA, SA, VA
● Five fair distributions considered:
p_f0 = [0.25, 0.25, 0.25, 0.25]
p_f1 = [0.7, 0.1, 0.1, 0.1]
p_f2 = [0.1, 0.7, 0.1, 0.1]
p_f3 = [0.1, 0.1, 0.7, 0.1]
p_f4 = [0.1, 0.1, 0.1, 0.7]
Baseline Metric: Mean Absolute Difference (MAD)
Ziwei Zhu, Xia Hu, and James Caverlee. "Fairness-Aware Tensor-Based Recommendation." CIKM 2018.
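MAD, as used by Zhu et al., contrasts the average performance of two groups. A minimal sketch under the common instantiation (absolute difference of group means over per-user scores; the function and argument names are illustrative):

```python
def mean_absolute_difference(scores_g1, scores_g2):
    """Absolute difference between two groups' mean scores; 0 means parity."""
    m1 = sum(scores_g1) / len(scores_g1)
    m2 = sum(scores_g2) / len(scores_g2)
    return abs(m1 - m2)
```

Unlike GCE, MAD encodes fairness as equality between two groups and has no notion of an ideal non-uniform distribution p_f.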
Summary
● We presented a probabilistic framework for evaluating RS fairness based on sensitive or insensitive attributes for both items and users
● The proposed framework is flexible enough to measure fairness in RS by considering fairness as equality or non-equality among groups
Future work
● exploit GCE to build recommender systems that directly optimize this objective criterion
● study fairness of recommendation under content-based (CB) or collaborative filtering (CF) models:
○ using item side information
○ on different domains