Uncoupled Regression from
Comparison Data
Liyuan Xu
Gatsby Unit@UCL, Former AIP member
(Twitter: @ly9988)
Disclaimer
This talk is mainly based on our NeurIPS 2019 paper
Introduction
Regression Problem
(Coupled) Data: (x1, y1), (x2, y2), … ∼ P_{XY}
Learn f(X) ≃ 𝔼[Y|X]
Correspondence in data is assumed
Uncoupled Regression Problem
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X]
Regression without data correspondence
Uncoupled Regression
Uncoupled regression is impossible by itself.
→ What is a practically feasible assumption?
Application of Uncoupled Regression
• Merging two datasets [Carpentier+, 2016]
• X: income, Y: housing price
• The government publishes X, the bank publishes Y
How to merge two datasets collected independently?
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Storing coupled records (Xi, Yi) is risky: a security incident could leak them
Application of Uncoupled Regression
• Privacy-Preserving Machine Learning [Xu et al. 2019]
• Consider the case where Y contains sensitive information
• Storing Xi and Yi separately, without correspondence, keeps the data anonymized
Data Fusion / Matching
Uncoupled Data w. Context:
(x1, z1), (x2, z2), … ∼ P_{XZ}
(y1, z′1), (y2, z′2), … ∼ P_{YZ}
Learn f(X) ≃ 𝔼[Y|X]
Use contextual data Z to merge the two distributions
→ Data Fusion / Matching
Isometric Uncoupled Regression [Carpentier+, 2016]
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X], assuming 𝔼[Y|X] is monotonic
Monotonicity makes uncoupled regression feasible
Isometric Uncoupled Regression [Carpentier+, 2016]
• Advantage
• Consistency is proved [Rigollet et al. 2018]
→ The optimal model can be learned as data increases
• Limitation
• The monotonicity assumption may be too strong
• Is income X really monotonic in housing price Y?
• Only applicable to the case X ∈ ℝ
• Need to know the noise distribution
• Solves the problem Y = f*(X) + ε with known P(ε)
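To make the monotonicity idea concrete, here is a minimal Python sketch of quantile matching in a noise-free 1-D case: sorting both samples recouples them by rank, after which ordinary regression applies. This only illustrates why order information suffices; it is not the actual (noise-aware) estimator of [Carpentier+, 2016].

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)                # covariate sample ~ P_X
y = 2.0 * rng.uniform(0.0, 1.0, 200) + 1.0    # independent target sample ~ P_Y, with f*(x) = 2x + 1

# Monotonic E[Y|X] => the i-th smallest x should be paired with the
# i-th smallest y; recouple by rank, then run ordinary regression.
slope, intercept = np.polyfit(np.sort(x), np.sort(y), deg=1)
print(slope, intercept)                       # ~= 2.0, 1.0
```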
High-level Concept
Message in [Carpentier+, 2016]:
Uncoupled Data + Order Info. → Regression
Order info is provided by the monotonicity assumption
Our Idea:
Uncoupled Data + Order Info. → Regression
Order info is learned from pairwise comparison data
Problem Setting
• Pairwise Comparison Data
• Originally considered in the ranking context
• Sample two data points (X, Y), (X′, Y′) ∼ P_{X,Y}
• Obtain pairwise comparison data (X⁺, X⁻) as
    (X⁺, X⁻) = (X, X′)  if Y > Y′
    (X⁺, X⁻) = (X′, X)  if Y ≤ Y′
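A small sketch of how such comparison data can be simulated from coupled samples, following the definition above (the helper name is illustrative, not from the paper):

```python
import numpy as np

def make_comparisons(x, y, m, rng):
    """Draw m comparisons: of two random points, the covariate with the
    larger target becomes x_plus (ties go to x_minus), as defined above."""
    i = rng.integers(len(x), size=m)
    j = rng.integers(len(x), size=m)
    win = y[i] > y[j]                      # Y > Y'  =>  X is the winner
    return np.where(win, x[i], x[j]), np.where(win, x[j], x[i])

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)
x_plus, x_minus = make_comparisons(x, y, m=5000, rng=rng)
```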
Uncoupled Regression from Pairwise Comparison
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Pairwise Comparison Data:
(x⁺_1, x⁻_1), (x⁺_2, x⁻_2), … ∼ P_{X+,X−}
Learn f(X) ≃ 𝔼[Y|X]
Uncoupled Regression from Pairwise Comparison
We propose two approaches:
Risk Approximation & Target Transformation
• Advantage
• Puts no assumption on 𝔼[Y|X]
• No need to know the noise distribution
• Limitation
• Not consistent
• But the deviation from the optimal model is bounded
• Empirically it works
Risk Approximation Approach
Formal Problem Setting
• Data given:
• Unlabeled data: D_X = {x1, x2, …, xn} ∼ P_X
• Target set: D_Y = {y1, y2, …, yn} ∼ P_Y
• Pairwise comparison data: D_{X+,X−} = {(x⁺_1, x⁻_1), …, (x⁺_m, x⁻_m)} ∼ P_{X+,X−}
• Goal: Find f* that satisfies
f* = argmin_f R(f),  R(f) = 𝔼[(f(X) − Y)²]
Risk Approximation
Loss Decomposition:
R(f) = 𝔼_{X,Y}[(f(X) − Y)²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[Y f(X)] + const.
The first term is estimated from the unlabeled data D_X; the second term, 𝔼_{X,Y}[Y f(X)], is approximated by a linear combination of 𝔼_{X+}[f(X⁺)] and 𝔼_{X−}[f(X⁻)].
Risk Approximation
Lemma 1 [Xu et al. 2019]
For any function f,
𝔼_{X+}[f(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) f(X)]
𝔼_{X−}[f(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) f(X)],
where F_Y is the CDF of Y.
If we can learn w1, w2 such that
Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y)),
then
𝔼_{X,Y}[Y f(X)] ≃ w1 𝔼_{X+}[f(X⁺)] + w2 𝔼_{X−}[f(X⁻)]
Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The CDF F_Y is estimated from the target set D_Y.
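A natural choice (an assumption here; the slide does not name the estimator) is the empirical CDF of D_Y:

```python
import numpy as np

def ecdf(d_y):
    """Empirical CDF of the target set D_Y: F_hat(y) is the fraction of
    observed targets <= y. One plausible choice for Step 1."""
    d_y = np.sort(np.asarray(d_y, dtype=float))
    return lambda y: np.searchsorted(d_y, y, side="right") / len(d_y)

f_y_hat = ecdf([1.0, 2.0, 2.5, 4.0])
print(f_y_hat(2.0))   # 0.5
```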
Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The weights ŵ1, ŵ2 are learned by
ŵ1, ŵ2 = argmin_{w1,w2} Σ_{i=1}^{|D_Y|} (yi − 2w1 F̂_Y(yi) − 2w2 (1 − F̂_Y(yi)))²
Recall, we want Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y)).
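Since the objective is least squares in the two features 2F̂_Y(y) and 2(1 − F̂_Y(y)), the weights have a closed form. A minimal sketch (function names are illustrative):

```python
import numpy as np

def fit_weights(d_y, f_y_hat):
    """Least-squares fit of (w1, w2) so that
    y ~= 2*w1*F_hat(y) + 2*w2*(1 - F_hat(y)) over the target set D_Y."""
    d_y = np.asarray(d_y, dtype=float)
    F = f_y_hat(d_y)                                   # F_hat(y_i) for each target
    phi = np.column_stack([2.0 * F, 2.0 * (1.0 - F)])  # design matrix
    (w1, w2), *_ = np.linalg.lstsq(phi, d_y, rcond=None)
    return w1, w2

# For Y ~ Unif[a, b] this recovers roughly (b/2, a/2), as in Theorem 2 below.
```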
Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The model f̂ is learned by
f̂ = argmin_f (1/|D_X|) Σ_{i=1}^{|D_X|} f(xi)² − (2/|D_{X+,X−}|) Σ_{j=1}^{|D_{X+,X−}|} (ŵ1 f(x⁺_j) + ŵ2 f(x⁻_j)),
where the first term estimates 𝔼_X[f²(X)] and the second estimates 2𝔼_{X,Y}[Y f(X)].
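For a linear model f(x) = θᵀx the empirical objective is a quadratic θᵀAθ − bᵀθ with a closed-form minimizer. A sketch under that linear-model assumption (in practice one would add a bias term and regularization):

```python
import numpy as np

def fit_linear_ra(X, Xp, Xm, w1, w2):
    """Minimize (1/n) sum_i f(x_i)^2 - (2/m) sum_j (w1 f(x_j+) + w2 f(x_j-))
    for f(x) = x @ theta.  X: (n, d) unlabeled data; Xp, Xm: (m, d) pairs."""
    A = X.T @ X / len(X)                        # estimates E[f^2(X)]
    b = 2.0 * (w1 * Xp + w2 * Xm).mean(axis=0)  # estimates 2 E[Y f(X)]
    return 0.5 * np.linalg.solve(A, b)          # argmin of theta^T A theta - b^T theta
```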
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M · Err(ŵ1, ŵ2).
Here, Err(w1, w2) is the approximation error
Err(w1, w2) = 𝔼_Y[(Y − 2w1 F_Y(Y) − 2w2 (1 − F_Y(Y)))²].
→ If the loss is approximated well, the bias in the model is small.
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M · Err(ŵ1, ŵ2).
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0.
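A quick check of the uniform case: for Y ∼ Unif[a, b] we have F_Y(y) = (y − a)/(b − a), so with (w1, w2) = (b/2, a/2),

```latex
2 w_1 F_Y(Y) + 2 w_2 \bigl(1 - F_Y(Y)\bigr)
  = b \cdot \frac{Y - a}{b - a} + a \cdot \frac{b - Y}{b - a}
  = \frac{(b - a)\, Y}{b - a} = Y ,
```

hence Err(b/2, a/2) = 𝔼[(Y − Y)²] = 0.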
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M · Err(ŵ1, ŵ2).
In general, Err > 0:
① Theoretically, it's inevitable…
② Empirically it works!
Theoretical Property
There exist two distributions that cannot be distinguished by P_X, P_Y, P_{X+,X−}.
Theoretical Property
[Figure: two joint distributions P_XY and P̃_XY over a discrete (X, Y) grid with different cell probabilities]
Same P_X, P_Y, P_{X+,X−}, but 𝔼_P[Y|X] ≠ 𝔼_P̃[Y|X]
Empirical Result
• Learn a linear model on UCI datasets
• Uncoupled regression
• Use all features for D_X, all targets for D_Y
• Note, no correspondence is given
• Generate 5000 pairs of D_{X+,X−}
• Supervised regression
• Use the entire coupled data (X, Y)
Empirical Result
• MSE of linear models on UCI datasets
→ Can yield almost the same MSE as supervised learning!
Conclusion So Far
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X+,X−}
• Introduced an approach based on risk approximation
• Theoretical and empirical results are given
Modeling CDF
from Pairwise Comparison Data
Theoretical Property (Recap)
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M · Err(ŵ1, ŵ2).
In particular, if Y ∼ Unif[a, b] then Err(b/2, a/2) = 0
→ We can learn the optimal model when Y is uniform.
Predicting Percentile
• Optimize Direct Marketing
• X: customer features, Y: probability of purchase
• Send discount tickets to the top 1% of potential customers
• The CDF F_Y(Y) is more the target of interest than Y itself
• Predicting Y might not be the best idea…
• Due to class imbalance, all Y can be very small
Predicting Percentile
• Sometimes the percentile is the target of interest
• Learn f(X) that minimizes R(f) = 𝔼[(F_Y(Y) − f(X))²]
• F_Y(Y) follows Unif[0, 1]
→ Since F_Y(Y) is uniform, the Err term vanishes and we can learn the optimal f from pairwise comparison
Motivating Example for Predicting Percentile
• Online Chess Rating
• X: user attributes, Y: abstract measure of "skill"
• Skill is compared by games
• Pairwise comparison data is given in nature
• We want to know the percentile in the skill ranking
Simple Solution
• Problem (Recap)
• Given pairwise comparison data (X⁺, X⁻)
• Predict the conditional expectation of the CDF, 𝔼[F_Y(Y)|X]
• Simple Solution
• Learn a ranking model r(X) from (X⁺, X⁻)
• Transform r(X) into 𝔼[F_Y(Y)|X]
Pairwise-Ranking based Approach
• Pairwise Learning to Rank
• Learn a ranker r(X) which minimizes the rank loss
• e.g. SVMRank, RankBoost
• Given test data Xtest and the rank model,
𝔼[F_Y(Y)|Xtest] ≃ (rank of Xtest in the entire data) / (number of data points)
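A sketch of the rank-to-percentile conversion (the scoring function r is assumed to come from any pairwise ranker, e.g. SVMRank; names are illustrative):

```python
import numpy as np

def percentile_from_ranker(r, x_test, x_pool):
    """E[F_Y(Y)|x] is approximated by the rank of x among a reference
    pool, divided by the pool size, as on the slide above."""
    s_pool, s_test = r(x_pool), r(x_test)
    # fraction of pool points ranked below each test point
    return (s_pool[None, :] < s_test[:, None]).mean(axis=1)
```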
Weakness in the Pairwise-Ranking based Approach
• The original goal is to minimize R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
• The rank model r(X) minimizes the rank loss R_r(r)
• A small R_r(r) does not necessarily mean a small R(f)
→ We aim at directly minimizing R(f)
Direct Minimization
Lemma 1 [Xu et al. 2019]
For any function h,
𝔼_{X+}[h(X⁺)] = 2𝔼_{X,Y}[F_Y(Y) h(X)]
𝔼_{X−}[h(X⁻)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) h(X)]
From this lemma, we have
R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[F_Y(Y) f(X)] + const.
     = 𝔼_X[f²(X)] − 𝔼_{X+}[f(X⁺)] + const.
Empirical Approximation
• The original loss (without the constant):
R(f) = 𝔼_X[f²(X)] − 𝔼_{X+}[f(X⁺)]
• The empirical loss:
R̂(f) = (1/|D_X|) Σ_{D_X} f²(xi) − (1/|D_{X+,X−}|) Σ_{D_{X+,X−}} f(x⁺_i)
• The estimation error is bounded:
R(f) ≤ R̂(f) + O_p(1/|D_X|^{1/2} + 1/|D_{X+,X−}|^{1/2})
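With a linear model F(x) = θᵀx this empirical loss is again a quadratic with a closed-form minimizer; a minimal sketch under that assumption:

```python
import numpy as np

def fit_percentile_model(X, Xp):
    """Minimize (1/n) sum_i F(x_i)^2 - (1/m) sum_j F(x_j+) for F(x) = x @ theta.
    X: (n, d) unlabeled data; Xp: (m, d) winners of the comparisons."""
    A = X.T @ X / len(X)                # quadratic part, estimates E[F^2(X)]
    b = Xp.mean(axis=0)                 # linear part, estimates E[F(X+)]
    return 0.5 * np.linalg.solve(A, b)  # argmin of theta^T A theta - b^T theta
```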
Summary
• We can learn 𝔼[F_Y(Y)|X] only from D_X and D_{X+,X−}
• The empirical loss to minimize is
R̂(f) = (1/|D_X|) Σ_{D_X} f²(xi) − (1/|D_{X+,X−}|) Σ_{D_{X+,X−}} f(x⁺_i)
Can we use this for the original regression problem?
Target Transformation Approach
Target Transformation
• From the previous discussion,
• We can learn the optimal model for F_Y(Y)
• We can learn the CDF function F_Y
• Target Transformation Approach [Xu et al. 2019]
1. Learn a function F̂ minimizing R_F(F) = 𝔼_{X,Y}[(F_Y(Y) − F(X))²]
2. Output the regression model as f̂(X) = F_Y^{−1}(F̂(X))
Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The CDF F_Y is estimated from the target set D_Y, as before.
Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The model F̂ is learned by
F̂ = argmin_F (1/|D_X|) Σ_{i=1}^{|D_X|} F(xi)² − (1/|D_{X+,X−}|) Σ_{j=1}^{|D_{X+,X−}|} F(x⁺_j),
where the first term estimates 𝔼_X[F²(X)] and the second estimates 2𝔼_{X,Y}[F_Y(Y) F(X)].
Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The regression model f̂ is obtained by
f̂(X) = F_Y^{−1}(F̂(X))
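The inverse CDF F_Y^{−1} can be taken as the empirical quantile function of D_Y; a sketch of the final prediction step under that choice:

```python
import numpy as np

def predict_tt(F_hat, x_test, d_y):
    """Target Transformation prediction: map predicted percentiles
    F_hat(x) back to the target scale via empirical quantiles of D_Y."""
    q = np.clip(F_hat(x_test), 0.0, 1.0)     # predicted percentiles in [0, 1]
    return np.quantile(np.asarray(d_y), q)   # empirical F_Y^{-1}
```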
Experiment on UCI
• RA: Risk Approximation
• TT: Target Transformation
• SVMRank: TT approach where F̂ is learned with SVMRank
Conclusion
• Uncoupled Regression from Pairwise Comparison
• Solves the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X+,X−}
• Approach based on risk approximation
• Theoretical and empirical results are given
• Approach based on target transformation
• (Theoretical) and empirical results are given
Thank you!
• Follow me on Twitter! (@ly9988)