Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). In either case, the concept has commonly been interpreted as some form of equality, i.e., the degree to which the system meets the information needs of all its users equally. In this paper, we argue that fairness in recommender systems does not necessarily imply equality; instead, it should consider a distribution of resources based on merits and needs. We present a probabilistic framework based on generalized cross entropy to evaluate the fairness of recommender systems under this perspective. The proposed framework is flexible and explanatory: it allows the incorporation of domain knowledge (through an ideal fair distribution) that can help understand which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework in terms of both user and item fairness.
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Yashar Deldjoo(1), Vito Walter Anelli(1), Hamed Zamani(2), Alejandro Bellogín(3), Tommaso Di Noia(1)
1. Polytechnic University of Bari, Italy
2. University of Massachusetts, Amherst, USA
3. Universidad Autónoma de Madrid, Spain
Background - Roots of the topic
● Our concern: algorithmic fairness in decision-making systems
Background - Fairness in AI
● AI is involved in many life-affecting decision points, for example:
○ criminal risk prediction
○ credit risk assessment
○ housing allocation
○ loan qualification prediction
○ hiring decisions
Image source: https://www.zdnet.com/article/inside-the-black-box-understanding-ai-decision-making/
Background - Fairness in RS
● In the RS community, fairness is viewed as a multi-sided concept.
● It shares similarities with:
○ Reciprocal recommendation: views the RS as a system fulfilling the dual goals of a transaction: (1) user-centered utility and (2) vendor-centered utility
○ Multi-stakeholder setting: a generalization of reciprocal recommendation; the system is designed to meet the benefit of users, items, and other parties involved.
What we agree on
Unfair recommendations could have far-reaching consequences, impacting people's lives and putting minority groups at a major disadvantage.
Common Fairness Interpretation
A common fairness interpretation in previous literature:
Fairness = Equality across members of protected groups
Equality
Ekstrand et al. studied whether RS produce equal utility for users of different demographic groups:
● found demographic differences in measured effectiveness across two datasets from different domains.
Yao et al. studied various types of unfairness in CF models:
● proposed to penalize algorithms producing disparate distributions of prediction error.
Ekstrand et al. "All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness." ACM FAT* Conference, 2018.
Yao, Sirui, and Bert Huang. "Beyond parity: Fairness objectives for collaborative filtering." Advances in Neural Information Processing Systems, 2017.
Research Questions/Goals
● Define a probabilistic framework for evaluating RS fairness based on attributes of any nature (e.g., sensitive or insensitive) for both items and users
● Measure fairness in RS, considering fairness as equality or non-equality among groups
How to measure Fairness?
● group accuracy using rating differences (Zhu et al., 2018)
● group accuracy using nDCG differences (Ekstrand et al., 2018)
● exposure via protected-group precision ratio (Burke et al., 2018)
● item recommendation probability using KL divergence (Yang & Stoyanovich, 2017)
Generalized Cross Entropy (GCE)
p: performance distribution
p_f: fair distribution
a: user or item attribute
α: parameter emphasizing the difference between the distributions
● Hellinger distance for α = 1/2
● Pearson's χ² discrepancy measure for α = 2
● Neyman's χ² measure for α = −1
● Kullback-Leibler distance in the limit as α → 1
● Burg CE distance as α → 0
Botev et al. 2011. The generalized cross entropy method, with applications to probability density estimation. Methodology and Computing in Applied Probability
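The GCE definition itself was shown as an image in the original slide. A reconstruction following the generalized cross-entropy family of Botev et al. (the paper's exact normalization and sign convention may differ):

```latex
\mathrm{GCE}(p_f, p) \;=\; \frac{1}{\alpha(1-\alpha)}
\left( \sum_{a_j} p_f(a_j)^{\alpha}\, p(a_j)^{1-\alpha} \;-\; 1 \right),
\qquad \alpha \notin \{0, 1\}
```

Since both distributions sum to 1, GCE(p_f, p) = 0 whenever p = p_f; i.e., GCE = 0 identifies a perfectly fair system with respect to the chosen p_f.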
Performance Distribution p
● Estimated based on the output of the recommender system on a test set.
● We define a recommendation gain for each user (rg_u) and each item (rg_i).
User gain
Item gain
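The user-gain and item-gain definitions above appeared as figures in the original slides. As an illustrative sketch (not the paper's exact rg_u/rg_i definitions; `performance_distribution`, `user_gains`, and `user_group` are hypothetical names), the performance distribution p over groups can be estimated by pooling a per-user gain within each group and normalizing:

```python
from collections import defaultdict

def performance_distribution(user_gains, user_group, smoothing=1e-9):
    """Estimate p(a_j): the share of total recommendation gain per group.

    user_gains: dict mapping user -> recommendation gain rg_u (e.g., hits@k)
    user_group: dict mapping user -> group label a_j
    Hypothetical helper; the paper defines the gains more generally.
    """
    group_gain = defaultdict(float)
    for user, gain in user_gains.items():
        group_gain[user_group[user]] += gain
    total = sum(group_gain.values()) + smoothing  # avoid division by zero
    return {a: gain / total for a, gain in group_gain.items()}
```

For the toy example's free/premium split, p would then be the share of the total recommendation gain captured by each group.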
Performance Distribution p
The proposed fairness evaluation framework is designed to capture fairness for both users and items -> multi-stakeholder setting
User gain
Item gain
Fair Distribution p_f
● Introduced by the system designer
● Problem-specific: determined by the problem and the target scenario
Example with 2 groups:
● p_f0 = [1/2, 1/2]: equal recommendation quality for g1 and g2
● p_f1 = [2/3, 1/3]: better recommendation quality for g1 than for g2
Toy Example
Orange: free users (group a1)
Green: premium users (group a2, paid membership)
● Rec 0: recommends more relevant items for premium users (3 vs. 6)
● Rec 1: recommends 1 relevant item per user
● Rec 2: all recommended items @3 are relevant
[Table: relevance (✓) of recommended items i1–i10 for users 1–3 (group a1) and users 4–6 (group a2)]
Toy Example
GCE(p_f, p, α = −1)

        GCE(p_f0)   GCE(p_f1)   GCE(p_f2)   Pr@3   Re@3
Rec 0   0.0800      0.3025      0.0025      1/2    0.530
Rec 1   0           0.0625      0.0625      1/3    0.375
Rec 2   0.0078      0.1182      0.0244      1      0.958

p_f0 = [1/2, 1/2], p_f1 = [2/3, 1/3], p_f2 = [1/3, 2/3]
● Rec 0: not completely fair if fairness means equality between free and premium users (GCE = 0.08 ≠ 0)
● Rec 0: more fair if fairness means giving better recommendations to premium (paid) users (GCE = 0.0025)
● Rec 1 and Rec 2: even though Pr@3 and Re@3 improve, they cannot produce fair results irrespective of p_f
● Rec 2: GCE never reaches the optimal value
Better recommendation quality ≠ more fair, due to inherent biases in the data
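As a sketch of the computation behind tables like this one, assuming the Botev et al. power-divergence parameterization (the paper's exact gain weighting and sign convention may differ, so this snippet is illustrative rather than a reproduction of the numbers above):

```python
def gce(p_fair, p, alpha=-1.0):
    """Generalized cross-entropy between a fair distribution p_fair and an
    observed performance distribution p (alpha must not be 0 or 1).
    Returns 0 when the two distributions coincide."""
    s = sum((pf ** alpha) * (pv ** (1.0 - alpha)) for pf, pv in zip(p_fair, p))
    return (s - 1.0) / (alpha * (1.0 - alpha))
```

`gce(p, p)` is 0 for any distribution p, matching the reading above that GCE = 0 indicates a perfectly fair system with respect to the chosen p_f.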
Advantages of the Proposed Framework
● The proposed evaluation framework:
○ is flexible to model fairness based on the interests of the system designer, by defining the fair recommendation distribution p_f
○ models fairness not necessarily as equality between members of groups:
■ in some applications, fairness = equality (e.g., gender, race)
■ in other application scenarios, it may not (e.g., free vs. premium users)
● It incorporates a gain factor in its design, which can be flexibly defined to contemplate different accuracy-related metrics to measure fairness upon (Precision, MAP, nDCG)
Advantages of the Proposed Framework
● Unlike most previous work that solely focused on either user fairness or item
fairness, the proposed framework integrates both user-related and item-related gain
factors.
Experimental evaluation
● Datasets:
○ Xing Job Recommendation Dataset (Xing-REC 17)
○ Amazon Toys & Games
● Baseline algorithms:
○ Item-kNN
○ User-kNN
○ BPR-MF
○ BPR-SLIM
○ SVD++
● Time-aware splitting with a fixed timestamp
● Characteristics of the Amazon dataset:
○ 53K preference scores
○ 1K users
○ 24K items
○ 99.8% sparsity
● Four user groups: VIA, SIA, SA, VA
● Five fair distributions considered:
p_f0 = [0.25, 0.25, 0.25, 0.25]
p_f1 = [0.7, 0.1, 0.1, 0.1]
p_f2 = [0.1, 0.7, 0.1, 0.1]
p_f3 = [0.1, 0.1, 0.7, 0.1]
p_f4 = [0.1, 0.1, 0.1, 0.7]
Baseline Metric: Mean Absolute Difference (MAD)
Ziwei Zhu, Xia Hu, and James Caverlee. "Fairness-Aware Tensor-Based Recommendation." CIKM 2018.
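MAD, as used by Zhu et al., contrasts the average performance of two groups. A minimal sketch under the common instantiation (absolute difference of group means over per-user scores; the function and argument names are illustrative):

```python
def mean_absolute_difference(scores_g1, scores_g2):
    """Absolute difference between two groups' mean scores; 0 means parity."""
    m1 = sum(scores_g1) / len(scores_g1)
    m2 = sum(scores_g2) / len(scores_g2)
    return abs(m1 - m2)
```

Unlike GCE, MAD encodes fairness as equality between two groups and has no notion of an ideal non-uniform distribution p_f.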
Summary
● We presented a probabilistic framework for evaluating RS fairness based on sensitive or insensitive attributes for both items and users
● The proposed framework is flexible enough to measure fairness in RS by considering fairness as equality or non-equality among groups
Future work
● exploit GCE to build recommender systems that directly optimize this objective criterion
● study fairness of recommendation under content-based (CB) or collaborative filtering (CF) models:
○ using item side information
○ on different domains