Wasserstein GAN
JIN HO LEE
2018-11-30
JIN HO LEE Wasserstein GAN 2018-11-30 1 / 26
Contents
• 1. Introduction
• 2. Different Distances
• 3. Wasserstein GAN
• 4. Empirical Results
▷ 4.1 Experimental Procedure
▷ 4.2 Meaningful loss metric
▷ 4.3 Improved stability
• 5. Related Work
1. Introduction
• Main Goal : training GANs by minimizing the Wasserstein distance W(Pr, Pg)
• In Section 2, we show how the Earth Mover (EM) distance behaves in
comparison to the Total Variation (TV) distance, the Kullback-Leibler (KL)
divergence, and the Jensen-Shannon (JS) divergence.
• In Section 3, we define the Wasserstein GAN and an efficient
approximation of the EM distance.
• We empirically show that WGANs cure the main training problems of GANs.
2. Different Distances
• A σ-algebra Σ on a set X is a collection of subsets of X satisfying the
following conditions:
(a) ∅ ∈ Σ
(b) if B ∈ Σ then B^c ∈ Σ
(c) if B1, B2, · · · is a countable collection of sets in Σ, then
∪_{n=1}^∞ B_n ∈ Σ
• Borel algebra : the smallest σ-algebra containing the open sets
• A probability space consists of a sample space Ω, a set of events F, and
a probability measure P, where the set of events F is a σ-algebra
• A function µ is a probability measure on a measurable space (X, Σ) if
(a) µ(X) = 1, µ(∅) = 0, and µ(A) ∈ [0, 1] for every A ∈ Σ
(b) countable additivity : for every countable collection {E_i} of
pairwise disjoint sets in Σ,
µ(∪_i E_i) = Σ_i µ(E_i).
2. Different Distances
• The Total Variation (TV) distance
δ(Pr, Pg) = sup_{A∈Σ} |Pr(A) − Pg(A)|.
• The Kullback-Leibler (KL) divergence
KL(Pr||Pg) = ∫ log(Pr(x)/Pg(x)) Pr(x) dµ(x).
• The Jensen-Shannon (JS) divergence
JS(Pr, Pg) = (1/2) KL(Pr||Pm) + (1/2) KL(Pg||Pm),
where Pm = (Pr + Pg)/2 is the mixture (the 1/2 factors make the disjoint-support
value log 2, as used in Example 1 below).
• The Earth-Mover (EM) distance, or Wasserstein-1,
W(Pr, Pg) = inf_{γ∈Π(Pr,Pg)} E_{(x,y)∼γ}[||x − y||],
where Π(Pr, Pg) denotes the set of all joint distributions γ(x, y) whose
marginals are respectively Pr and Pg; that is, each γ is a coupling of Pr and Pg.
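To make the four definitions concrete, here is a small Python sketch (my own helper functions, not from the slides) computing each quantity for a pair of Bernoulli distributions; the W1 value anticipates the closed form |p2 − p1| derived later in this section.

```python
import math

def tv(p, q):
    # sup_A |P(A) - Q(A)| over the events of {0, 1}; reduces to |p - q|
    return abs(p - q)

def kl(p, q):
    # KL(Ber(p) || Ber(q)), with the convention 0 * log(0/x) = 0
    def term(a, b):
        return 0.0 if a == 0 else math.inf if b == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

def js(p, q):
    m = (p + q) / 2  # mixture Pm
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(p, q):
    # optimal transport cost between Ber(p) and Ber(q); see the Theorem below
    return abs(p - q)
```

Because both supports are {0, 1}, TV and W1 coincide here; the slides' Example 1 shows they differ drastically on continuous supports.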
2. Different Distances Couplings
Couplings
• χ : compact metric space
• Σ : the set of all Borel subsets of χ
• Prob(χ) : the set of probability measures on χ
Definition
Let µ and ν be probability measures on the same measurable space (S, Σ).
A coupling of µ and ν is a probability measure γ on the product space
(S × S, Σ × Σ) such that the marginals of γ coincide with µ and ν, i.e.,
γ(A × S) = µ(A) and γ(S × A) = ν(A) for all A ∈ Σ.
2. Different Distances Couplings
Example
For 0 ≤ p1 ≤ p2 ≤ 1 and qi = 1 − pi (i = 1, 2), we consider two joint
distributions f and g of (X̃, Ỹ) on {0, 1} × {0, 1}: the product coupling
f(0, 0) = q1q2, f(0, 1) = q1p2, f(1, 0) = p1q2, f(1, 1) = p1p2,
and the monotone coupling
g(0, 0) = q2, g(0, 1) = p2 − p1, g(1, 0) = 0, g(1, 1) = p1
(these values are determined by the marginal constraints together with the
expectations computed on the next slides). Since X̃ ∼ Ber(p1) and
Ỹ ∼ Ber(p2) under both, f and g are couplings of Ber(p1) and Ber(p2).
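A quick numerical sanity check that f and g really have Ber(p1) and Ber(p2) marginals; this is a sketch, and the values p1 = 0.3, p2 = 0.7 are my own arbitrary choice.

```python
p1, p2 = 0.3, 0.7  # arbitrary test values with p1 <= p2
q1, q2 = 1 - p1, 1 - p2

# the product coupling f and the monotone coupling g from the example above
f = {(0, 0): q1 * q2, (0, 1): q1 * p2, (1, 0): p1 * q2, (1, 1): p1 * p2}
g = {(0, 0): q2, (0, 1): p2 - p1, (1, 0): 0.0, (1, 1): p1}

def marginals(gamma):
    px1 = gamma[(1, 0)] + gamma[(1, 1)]  # P(X~ = 1), should be p1
    py1 = gamma[(0, 1)] + gamma[(1, 1)]  # P(Y~ = 1), should be p2
    return px1, py1
```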
2. Different Distances Example of Wasserstein Distance
Example
For the joint distributions f and g above, suppose (counterfactually) that
Π[Ber(p1), Ber(p2)] = {f, g}.
Then we would have
W(Ber(p1), Ber(p2)) = min{q1p2 + p1q2, p2 − p1}.
Proof.
Since Π[Ber(p1), Ber(p2)] = {f, g}, we only need to consider two cases.
case 1. γ = f ∈ Π[Ber(p1), Ber(p2)].
E_{(x,y)∼f}[||x − y||]
= f(0, 0)||0 − 0|| + f(0, 1)||0 − 1|| + f(1, 0)||1 − 0|| + f(1, 1)||1 − 1||
= q1p2 + p1q2
2. Different Distances Example of Wasserstein Distance
case 2. γ = g ∈ Π[Ber(p1), Ber(p2)].
E_{(x,y)∼g}[||x − y||]
= g(0, 0)||0 − 0|| + g(0, 1)||0 − 1|| + g(1, 0)||1 − 0|| + g(1, 1)||1 − 1||
= p2 − p1
Combining cases 1 and 2, we have
W(Ber(p1), Ber(p2)) = inf_{γ∈Π[Ber(p1),Ber(p2)]} E_{(x,y)∼γ}[||x − y||]
= inf_{γ∈{f,g}} E_{(x,y)∼γ}[||x − y||]
= min{q1p2 + p1q2, p2 − p1}.
2. Different Distances An example of couplings
Lemma
For p1, p2 ∈ [0, 1], the set of all couplings Π[Ber(p1), Ber(p2)] of Ber(p1)
and Ber(p2) is {p_a | a ∈ [max(0, q1 − p2), min(q1, q2)]}, where
p_a(0, 0) = a
p_a(0, 1) = q1 − a
p_a(1, 0) = q2 − a
p_a(1, 1) = p2 − q1 + a
(the range of a is exactly the set of values for which all four entries are
nonnegative).
Proof.
Let γ ∈ Π[Ber(p1), Ber(p2)]. The marginal constraints give the following table:
γ | Y = 0 | Y = 1 | Σ_y γ(x, y)
X = 0 | · | · | q1
X = 1 | · | · | p1
Σ_x γ(x, y) | q2 | p2 |
2. Different Distances An example of couplings
For each admissible a, setting γ(0, 0) = a completely determines the table:
γ | Y = 0 | Y = 1 | Σ_y γ(x, y)
X = 0 | a | q1 − a | q1
X = 1 | q2 − a | p2 − q1 + a | p1
Σ_x γ(x, y) | q2 | p2 |
It means that for each such a we obtain exactly one coupling γ of Ber(p1)
and Ber(p2) with γ(0, 0) = a, and every coupling arises this way. This
completes the proof.
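The Lemma can be replayed numerically. The sketch below (p1 = 0.3 and p2 = 0.7 are arbitrary values of my choosing) builds p_a for several admissible a and checks the marginal and nonnegativity constraints:

```python
p1, p2 = 0.3, 0.7  # arbitrary test values
q1, q2 = 1 - p1, 1 - p2

def p_a(a):
    # the one-parameter family of candidate couplings from the Lemma
    return {(0, 0): a, (0, 1): q1 - a, (1, 0): q2 - a, (1, 1): p2 - q1 + a}

# admissible range: all four table entries must be nonnegative
lo, hi = max(0.0, q1 - p2), min(q1, q2)

def is_coupling(gamma):
    x_marginal_ok = abs(gamma[(0, 0)] + gamma[(0, 1)] - q1) < 1e-12
    y_marginal_ok = abs(gamma[(0, 0)] + gamma[(1, 0)] - q2) < 1e-12
    nonneg = all(v >= -1e-12 for v in gamma.values())
    return x_marginal_ok and y_marginal_ok and nonneg
```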
2. Different Distances A computational result of Wasserstein Distance
Theorem
For p1 ≤ p2, we have
W(Ber(p1), Ber(p2)) = p2 − p1.
Proof.
From the previous Lemma, every coupling in Π[Ber(p1), Ber(p2)] is of the
form p_a with p_a(0, 0) = a. Then we obtain
E_{(x,y)∼p_a}[||x − y||]
= p_a(0, 0)||0 − 0|| + p_a(0, 1)||0 − 1|| + p_a(1, 0)||1 − 0|| + p_a(1, 1)||1 − 1||
= (q1 − a) + (q2 − a) = 2 − p1 − p2 − 2a.
Since a = p_a(0, 0) cannot exceed either marginal probability, we have
a ≤ min{q1, q2}. From the assumption p1 ≤ p2, we have q1 ≥ q2, and hence
min{q1, q2} = q2.
2. Different Distances A computational result of Wasserstein Distance
The expectation E_{(x,y)∼p_a}[||x − y||] = 2 − p1 − p2 − 2a is decreasing in
a, and a ≤ q2, so
2 − p1 − p2 − 2a ≥ 2 − p1 − p2 − 2q2 = 2 − p1 − p2 − 2(1 − p2) = p2 − p1,
with equality at a = q2. Hence W(Ber(p1), Ber(p2)) = p2 − p1.
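The minimization in the proof can be brute-forced: scan the admissible a and take the smallest transport cost. A sketch with arbitrary values p1 = 0.3, p2 = 0.7:

```python
p1, p2 = 0.3, 0.7  # arbitrary values with p1 <= p2
q1, q2 = 1 - p1, 1 - p2

def cost(a):
    # E[|x - y|] under p_a: the off-diagonal mass (q1 - a) + (q2 - a)
    return 2 - p1 - p2 - 2 * a

lo, hi = max(0.0, q1 - p2), min(q1, q2)  # admissible range for a
n = 1000
w_est = min(cost(lo + k * (hi - lo) / n) for k in range(n + 1))
```

The grid minimum lands at a = q2, matching the closed form p2 − p1 from the Theorem.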
2. Different Distances Example 1
Example (1)
• We assume that
▷ Z ∼ U[0, 1] : the uniform distribution on the unit interval,
▷ P0 : the distribution of (0, Z) ∈ R^2, uniform on a straight vertical
line passing through the origin,
▷ gθ(z) = (θ, z) with θ a single real parameter, and Pθ the distribution
of gθ(Z).
Then we obtain the following.
• W(P0, Pθ) = |θ|
• JS(P0, Pθ) = log 2 if θ ≠ 0, and 0 if θ = 0
• KL(Pθ||P0) = KL(P0||Pθ) = +∞ if θ ≠ 0, and 0 if θ = 0
2. Different Distances Example 1
• δ(P0, Pθ) = 1 if θ ≠ 0, and 0 if θ = 0
• When θ_t → 0, the sequence (P_{θ_t})_{t∈N} converges to P0 under the EM
distance, but does not converge at all under the JS, KL, reverse KL, or TV
divergences.
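The same contrast already appears for point masses on the real line. The sketch below (my own helper functions, not from the slides) computes TV, KL, JS, and W1 for finitely supported distributions and applies them to δ_0 versus δ_θ: TV, JS, and KL are blind to how close θ is to 0, while W1 = |θ| shrinks smoothly.

```python
import math

def tv(P, Q):
    # sup_A |P(A) - Q(A)| = (1/2) * L1 distance for probability measures
    xs = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in xs)

def kl(P, Q):
    out = 0.0
    for x, p in P.items():
        if p == 0.0:
            continue
        q = Q.get(x, 0.0)
        if q == 0.0:
            return math.inf  # P not absolutely continuous w.r.t. Q
        out += p * math.log(p / q)
    return out

def js(P, Q):
    xs = set(P) | set(Q)
    M = {x: 0.5 * (P.get(x, 0.0) + Q.get(x, 0.0)) for x in xs}
    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)

def w1(P, Q):
    # on the real line, W1 is the area between the two CDFs
    xs = sorted(set(P) | set(Q))
    out, diff = 0.0, 0.0
    for a, b in zip(xs, xs[1:]):
        diff += P.get(a, 0.0) - Q.get(a, 0.0)
        out += abs(diff) * (b - a)
    return out

theta = 0.01
P0, Pt = {0.0: 1.0}, {theta: 1.0}
```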
2. Different Distances Theorem 1
Theorem (1)
Let Pr be a fixed distribution over χ. Let Z be a random variable (e.g.
Gaussian) over another space Z. Let g : Z × R^d → χ be a function, denoted
gθ(z) with z the first coordinate and θ the second. Let Pθ denote the
distribution of gθ(Z). Then,
1. If g is continuous in θ, so is W(Pr, Pθ).
2. If g is locally Lipschitz and satisfies regularity assumption 1, then
W(Pr, Pθ) is continuous everywhere, and differentiable almost everywhere.
3. Statements 1-2 are false for the Jensen-Shannon divergence JS(Pr, Pθ)
and all the KLs.
2. Different Distances Theorem 1
The following corollary tells us that learning by minimizing the EM
distance makes sense (at least in theory) with neural networks.
Corollary
Let gθ be any feedforward neural network parameterized by θ, and p(z) a
prior over z such that Ez∼p(z)[||z||] < ∞ (e.g. Gaussian, uniform, etc.).
Then assumption 1 is satisfied and therefore W(Pr, Pθ) is continuous
everywhere and differentiable almost everywhere.
2. Different Distances Theorem 2
Theorem (2)
Let P be a distribution on a compact space X and (Pn)n∈N be a sequence of
distributions on X. Then, considering all limits as n → ∞,
1. The following statements are equivalent:
• δ(Pn, P) → 0, with δ the total variation distance.
• JS(Pn, P) → 0, with JS the Jensen-Shannon divergence.
2. The following statements are equivalent:
• W(Pn, P) → 0.
• Pn →_D P, where →_D represents convergence in distribution for random
variables.
3. KL(Pn||P) → 0 or KL(P||Pn) → 0 implies the statements in (2).
3. Wasserstein GAN
• Computing W(Pr, Pg) directly from the definition of the Wasserstein
distance is intractable. However, the Kantorovich-Rubinstein duality tells
us that
W(Pr, Pg) = sup_{||f||_L≤1} E_{x∼Pr}[f(x)] − E_{x∼Pg}[f(x)],
where ||f||_L ≤ 1 means that f satisfies the 1-Lipschitz condition.
• Note that if we replace ||f||_L ≤ 1 by ||f||_L ≤ K for some K, we have
K · W(Pr, Pg) = sup_{||f||_L≤K} E_{x∼Pr}[f(x)] − E_{x∼Pg}[f(x)].
• If we have a parametrized family of functions {f_w}_{w∈W} that are all
K-Lipschitz for some K, then we have
max_{w∈W} E_{x∼Pr}[f_w(x)] − E_{x∼Pθ}[f_w(x)]
≤ sup_{||f||_L≤K} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)] = K · W(Pr, Pθ).
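The duality can be sanity-checked on the Bernoulli example from earlier slides: on the two-point space {0, 1}, a 1-Lipschitz f is determined, up to an irrelevant additive constant, by the slope f(1) − f(0) ∈ [−1, 1], and maximizing over that slope recovers the primal value p2 − p1. A brute-force sketch (p1 = 0.3, p2 = 0.7 are my own arbitrary choices):

```python
p1, p2 = 0.3, 0.7  # arbitrary values with p1 <= p2

def dual_gap(slope):
    # E_{x~Ber(p1)}[f(x)] - E_{x~Ber(p2)}[f(x)] with f(0) = 0, f(1) = slope
    return p1 * slope - p2 * slope

# search over 1-Lipschitz f on {0, 1}: |f(1) - f(0)| <= 1
n = 2000
w_dual = max(dual_gap(-1 + 2 * k / n) for k in range(n + 1))
```

The maximizer is the slope −1, and the dual optimum agrees with the primal computation W(Ber(p1), Ber(p2)) = p2 − p1.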
3. Wasserstein GAN Theorem 3
Theorem (3)
Let Pr be any distribution. Let Pθ be the distribution of gθ(Z) with Z a
random variable with density p and gθ a function satisfying assumption 1.
Then, there is a solution f : χ → R to the problem
max_{||f||_L≤1} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)]
and we have
∇θ W(Pr, Pθ) = −E_{z∼p(z)}[∇θ f(gθ(z))]
when both terms are well-defined.
• Objective functions:
L_D^{WGAN} = E_{x∼Pr}[f_w(x)] − E_{z∼p(z)}[f_w(gθ(z))]
L_G^{WGAN} = −E_{z∼p(z)}[f_w(gθ(z))]
where the critic weights are clipped, w ← clip(w, −0.01, 0.01), after each
update of L_D.
3. Wasserstein GAN Algorithm 1
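This slide shows Algorithm 1 of the paper (the WGAN training loop). As a stand-in for the figure, here is a minimal sketch of that loop on a deliberately tiny problem; all names and hyperparameters below are my own choices, with a linear critic and a scalar generator replacing the neural networks of the real algorithm: real data from N(µ, 1), generator gθ(z) = z + θ, critic f_w(x) = wx with weight clipping, and n_critic = 5 critic steps per generator step.

```python
import random

random.seed(0)

mu = 2.0                 # real data ~ N(mu, 1)
theta, w = 0.0, 0.0      # generator shift and critic weight
clip_c, lr_w, lr_g, m = 0.01, 0.1, 0.5, 64

def batch(shift):
    return [random.gauss(shift, 1.0) for _ in range(m)]

for step in range(2000):
    for _ in range(5):  # n_critic = 5 critic updates, as in the paper
        x_real, x_fake = batch(mu), batch(theta)
        # gradient of E[f_w(x)] - E[f_w(g_theta(z))] with respect to w
        grad_w = sum(x_real) / m - sum(x_fake) / m
        w = max(-clip_c, min(clip_c, w + lr_w * grad_w))  # ascend, then clip
    # generator descends L_G = -E[f_w(g_theta(z))]; d L_G / d theta = -w
    theta += lr_g * w
```

With these (arbitrary) settings, θ drifts toward µ because the clipped critic weight keeps tracking the sign of µ − θ; the real algorithm replaces the linear critic and scalar generator with neural networks trained by RMSProp.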
3. Wasserstein GAN Figure 2
In this paper, the authors call the discriminator a critic. In Figure 2, a
GAN discriminator and a WGAN critic are trained until optimality. The
discriminator learns very quickly to distinguish fake from real, then
saturates; the critic, by contrast, cannot saturate and converges to a
linear function, so it provides usable gradients everywhere.
4. Empirical Results
• We claim two main benefits:
▷ a meaningful loss metric that correlates with the generator’s
convergence and sample quality
▷ improved stability of the optimization process
4. Empirical Results 4.1 Experimental Procedure
• Training curves and the visualization of samples at different stages of
training show a clear correlation between the Wasserstein estimate and the
generated image quality.
4. Empirical Results 4.1 Experimental Procedure
Some knowledge to read Appendix
• Let χ ⊂ R^d be a compact set, that is, closed and bounded by the
Heine-Borel Theorem, and Prob(χ) the set of probability measures over χ.
• We define
C_b(χ) = {f : χ → R | f is continuous and bounded}.
• For f ∈ C_b(χ), we can define the norm ||f||_∞ = max_{x∈χ} |f(x)|, since
f is bounded.
• Then we have a normed vector space (C_b(χ), || · ||_∞).
• The dual space
C_b(χ)* = {ϕ : C_b(χ) → R | ϕ is linear and continuous}
has norm ||ϕ|| = sup_{f∈C_b(χ), ||f||_∞≤1} |ϕ(f)|.
Some knowledge to read Appendix
• Let µ be a signed measure over χ, and define the Total Variation norm
||µ||_TV = sup_{A⊂χ} |µ(A)|,
where A ranges over the Borel subsets of χ. For two probability
distributions Pr and Pθ, the function
δ(Pr, Pθ) = ||Pr − Pθ||_TV
is a distance on Prob(χ) (called the Total Variation distance).
• We can consider
Φ : (Prob(χ), δ) → (C_b(χ)*, || · ||),
where Φ(P)(f) = E_{x∼P}[f(x)] is a linear functional over C_b(χ).
• By the Riesz Representation Theorem, Φ is an isometric immersion, that
is, δ(P, Q) = ||Φ(P) − Φ(Q)||, and Φ is a 1-1 correspondence onto its
image.
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Wasserstein gan

2. Different Distances
• The Total Variation (TV) distance
  δ(Pr, Pg) = sup_{A∈Σ} |Pr(A) − Pg(A)|.
• The Kullback-Leibler (KL) divergence
  KL(Pr||Pg) = ∫ log(Pr(x)/Pg(x)) Pr(x) dµ(x).
• The Jensen-Shannon (JS) divergence
  JS(Pr, Pg) = (1/2) KL(Pr||Pm) + (1/2) KL(Pg||Pm),
  where Pm = (Pr + Pg)/2 is the mixture. (The 1/2 factors make JS bounded by log 2, matching Example 1 below.)
• The Earth-Mover (EM) distance or Wasserstein-1
  W(Pr, Pg) = inf_{γ∈Π(Pr,Pg)} E_{(x,y)∼γ}[||x − y||],
  where Π(Pr, Pg) denotes the set of all joint distributions γ(x, y) whose marginals are respectively Pr and Pg, i.e., γ is a coupling of Pr and Pg.
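As a quick sanity check (not from the slides), the four quantities can be computed for an illustrative pair of discrete distributions on an evenly spaced support; in one dimension the EM distance equals the integral of |F_p − F_q| between the CDFs:

```python
import numpy as np

# Illustrative distributions on the evenly spaced support {0, 1, 2, 3}.
p = np.array([0.4, 0.3, 0.2, 0.1])   # plays the role of P_r
q = np.array([0.1, 0.2, 0.3, 0.4])   # plays the role of P_g
m = (p + q) / 2                      # mixture (P_r + P_g)/2 for JS

tv = 0.5 * np.abs(p - q).sum()                # sup_A |P_r(A) - P_g(A)| = ||p - q||_1 / 2
kl = float(np.sum(p * np.log(p / q)))         # KL(P_r || P_g); needs q > 0 wherever p > 0
js = 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))
w1 = np.abs(np.cumsum(p - q))[:-1].sum()      # 1-D EM distance: sum of |F_p - F_q| over unit gaps

print(tv, kl, js, w1)   # tv = 0.4, w1 ≈ 1.0
```

Note that TV, KL, and JS only compare probabilities pointwise, while W1 also sees how far apart the mass sits on the support.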
2. Different Distances
Couplings
• χ : compact metric space
• Σ : the set of all Borel subsets of χ
• Prob(χ) : the set of probability measures on χ
Definition. Let µ and ν be probability measures on the same measurable space (S, Σ). A coupling of µ and ν is a probability measure γ on the product space (S × S, Σ × Σ) such that the marginals of γ coincide with µ and ν, i.e.,
  γ(A × S) = µ(A) and γ(S × A) = ν(A) for all A ∈ Σ.
2. Different Distances
Couplings
Example. For 0 ≤ p1 ≤ p2 ≤ 1 and qi = 1 − pi (i = 1, 2), consider the joint distributions f and g below. (The tables are images in the original slide; the entries are recovered here from the marginal constraints and the expectations computed on the next two slides: f is the independent coupling, g the monotone one.)
  f(0,0) = q1q2,  f(0,1) = q1p2,  f(1,0) = p1q2,  f(1,1) = p1p2
  g(0,0) = q2,   g(0,1) = p2 − p1,  g(1,0) = 0,   g(1,1) = p1
Since X̃ ∼ Ber(p1) and Ỹ ∼ Ber(p2) under both, f and g are couplings of Ber(p1) and Ber(p2).
2. Different Distances
Example of Wasserstein Distance
Example. For the previous joint distributions f and g, assume (it is not actually true) that Π[Ber(p1), Ber(p2)] = {f, g}. Then
  W(Ber(p1), Ber(p2)) = min{q1p2 + p1q2, p2 − p1}.
Proof. Since Π[Ber(p1), Ber(p2)] = {f, g}, we only consider two cases.
Case 1: γ = f.
  E_{(x,y)∼f}[||x − y||] = f(0,0)·||0−0|| + f(0,1)·||0−1|| + f(1,0)·||1−0|| + f(1,1)·||1−1|| = q1p2 + p1q2.
Case 2: γ = g.
  E_{(x,y)∼g}[||x − y||] = g(0,0)·||0−0|| + g(0,1)·||0−1|| + g(1,0)·||1−0|| + g(1,1)·||1−1|| = p2 − p1.
By Cases 1 and 2, we have
  W(Ber(p1), Ber(p2)) = inf_{γ∈Π[Ber(p1),Ber(p2)]} E_{(x,y)∼γ}[||x − y||] = inf_{γ∈{f,g}} E_{(x,y)∼γ}[||x − y||] = min{q1p2 + p1q2, p2 − p1}.
2. Different Distances
An example of couplings
Lemma. For p1, p2 ∈ [0, 1], the set of all couplings Π[Ber(p1), Ber(p2)] of Ber(p1) and Ber(p2) is {pa | a admissible}, where
  pa(0, 0) = a
  pa(0, 1) = q1 − a
  pa(1, 0) = q2 − a
  pa(1, 1) = p2 − q1 + a
and a is admissible when all four entries are nonnegative.
Proof. Let γ ∈ Π[Ber(p1), Ber(p2)]. The marginal constraints give the following table (note the row sums are q1 and p1, the column sums q2 and p2):

  γ             Y = 0     Y = 1     Σ_y γ(x, y)
  X = 0         ·         ·         q1
  X = 1         ·         ·         p1
  Σ_x γ(x, y)   q2        p2
2. Different Distances
An example of couplings
If γ(0, 0) = a for some admissible a, the table is completely determined:

  γ             Y = 0     Y = 1            Σ_y γ(x, y)
  X = 0         a         q1 − a           q1
  X = 1         q2 − a    p2 − (q1 − a)    p1
  Σ_x γ(x, y)   q2        p2

Thus every admissible a yields a coupling γ of Ber(p1) and Ber(p2) with γ(0, 0) = a, and every coupling arises this way. This completes the proof.
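The Lemma's bookkeeping can be verified exactly with rational arithmetic. A small illustrative check (the particular values of p1, p2, a are arbitrary, and coupling_pa is a helper name introduced here):

```python
from fractions import Fraction as F
import itertools

def coupling_pa(p1, p2, a):
    """The one-parameter family p_a from the Lemma, in exact arithmetic."""
    q1, q2 = 1 - p1, 1 - p2
    return {(0, 0): a, (0, 1): q1 - a, (1, 0): q2 - a, (1, 1): p2 - q1 + a}

p1, p2, a = F(1, 5), F(7, 10), F(1, 4)   # arbitrary admissible choice
pa = coupling_pa(p1, p2, a)
q1, q2 = 1 - p1, 1 - p2

# Marginals: X ~ Ber(p1) and Y ~ Ber(p2), as the table requires.
assert pa[0, 0] + pa[0, 1] == q1 and pa[1, 0] + pa[1, 1] == p1
assert pa[0, 0] + pa[1, 0] == q2 and pa[0, 1] + pa[1, 1] == p2

# Transport cost: only the off-diagonal cells pay |x - y| = 1,
# so E|x - y| = (q1 - a) + (q2 - a) = 2 - p1 - p2 - 2a.
cost = sum(pa[x, y] * abs(x - y) for x, y in itertools.product((0, 1), repeat=2))
assert cost == 2 - p1 - p2 - 2 * a
print("marginals and cost check out")
```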
2. Different Distances
A computational result of Wasserstein Distance
Theorem. For p1 ≤ p2, we have W(Ber(p1), Ber(p2)) = p2 − p1.
Proof. From the previous Lemma, Π[Ber(p1), Ber(p2)] = {pa | a admissible} with pa(0, 0) = a. Then
  E_{(x,y)∼pa}[||x − y||] = pa(0,1) + pa(1,0) = (q1 − a) + (q2 − a) = 2 − p1 − p2 − 2a.
Since a cannot exceed either of the marginal probabilities in its row and column, a ≤ min{q1, q2}. From the assumption p1 ≤ p2 we have q1 ≥ q2, so min{q1, q2} = q2.
The expectation 2 − p1 − p2 − 2a is linear and decreasing in a, and a ≤ q2, so
  2 − p1 − p2 − 2a ≥ 2 − p1 − p2 − 2(1 − p2) = p2 − p1,
with equality at a = q2. Hence the infimum is attained and W(Ber(p1), Ber(p2)) = p2 − p1.
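The theorem can be confirmed numerically by scanning the coupling family pa over its admissible range (a small sketch; w1_bernoulli and the particular p1, p2 are illustrative names and values, not from the slides):

```python
import numpy as np

def w1_bernoulli(p1, p2):
    """Brute-force W(Ber(p1), Ber(p2)) by minimizing E|x - y| over the family p_a."""
    q1, q2 = 1 - p1, 1 - p2
    # a must keep all four entries of p_a nonnegative.
    lo, hi = max(0.0, q1 - p2), min(q1, q2)
    best = np.inf
    for a in np.linspace(lo, hi, 1001):
        cost = (q1 - a) + (q2 - a)   # E|x - y| under p_a = 2 - p1 - p2 - 2a
        best = min(best, cost)
    return best

# Theorem predicts p2 - p1 = 0.5 for p1 = 0.2, p2 = 0.7.
print(w1_bernoulli(0.2, 0.7))
```

The minimum is always hit at the endpoint a = min{q1, q2}, which is exactly the attainment step in the proof.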
2. Different Distances
Example 1
Example (1)
• We assume that
  ▷ Z ∼ U[0, 1] : uniform distribution on the unit interval,
  ▷ P0 : the distribution of (0, Z) ∈ R², uniform on a straight vertical line passing through the origin,
  ▷ gθ(z) = (θ, z) with θ a single real parameter, and Pθ the distribution of gθ(Z).
Then we obtain the following.
• W(P0, Pθ) = |θ|
• JS(P0, Pθ) = log 2 if θ ≠ 0, and 0 if θ = 0
• KL(Pθ||P0) = KL(P0||Pθ) = +∞ if θ ≠ 0, and 0 if θ = 0
• δ(P0, Pθ) = 1 if θ ≠ 0, and 0 if θ = 0
• When θt → 0, the sequence (Pθt)t∈N converges to P0 under the EM distance, but does not converge at all under the JS, KL, reverse KL, or TV divergences.
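The contrast can be tabulated directly from the closed forms above (an illustrative sketch; distances is a helper name introduced here, not from the slides):

```python
import math

LOG2 = math.log(2.0)

def distances(theta):
    """Distances between P0 = law of (0, Z) and P_theta = law of (theta, Z), Z ~ U[0, 1].

    Both laws share the same Z-marginal, so the optimal transport plan moves
    each point (0, z) horizontally to (theta, z), giving W = |theta|. For any
    theta != 0 the supports are disjoint vertical lines, which pins JS at
    log 2 and TV at 1 no matter how close the lines are.
    """
    w = abs(theta)
    js = 0.0 if theta == 0 else LOG2
    tv = 0.0 if theta == 0 else 1.0
    return w, js, tv

# W shrinks smoothly as theta -> 0 (a usable gradient); JS and TV stay flat.
for theta in (1.0, 0.1, 0.001, 0.0):
    print(theta, distances(theta))
```

This is precisely why EM is the better training signal: its value, unlike JS or TV, tells the generator how far it still is from the target.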
2. Different Distances
Theorem 1
Theorem (1)
Let Pr be a fixed distribution over χ. Let Z be a random variable (e.g. Gaussian) over another space Z. Let g : Z × R^d → χ be a function, denoted gθ(z) with z the first coordinate and θ the second, and let Pθ denote the distribution of gθ(Z). Then,
1. If g is continuous in θ, so is W(Pr, Pθ).
2. If g is locally Lipschitz and satisfies regularity assumption 1, then W(Pr, Pθ) is continuous everywhere, and differentiable almost everywhere.
3. Statements 1-2 are false for the Jensen-Shannon divergence JS(Pr, Pθ) and all the KLs.
2. Different Distances
Theorem 1
The following corollary tells us that learning by minimizing the EM distance makes sense (at least in theory) with neural networks.
Corollary
Let gθ be any feedforward neural network parameterized by θ, and p(z) a prior over z such that E_{z∼p(z)}[||z||] < ∞ (e.g. Gaussian, uniform, etc.). Then assumption 1 is satisfied, and therefore W(Pr, Pθ) is continuous everywhere and differentiable almost everywhere.
2. Different Distances
Theorem 2
Theorem (2)
Let P be a distribution on a compact space χ and (Pn)n∈N a sequence of distributions on χ. Then, considering all limits as n → ∞,
1. The following statements are equivalent:
  • δ(Pn, P) → 0, with δ the total variation distance;
  • JS(Pn, P) → 0, with JS the Jensen-Shannon divergence.
2. The following statements are equivalent:
  • W(Pn, P) → 0;
  • Pn →_D P, where →_D represents convergence in distribution for random variables.
3. KL(Pn||P) → 0 or KL(P||Pn) → 0 imply the statements in (2).
3. Wasserstein GAN
• Computing W(Pr, Pg) directly from the definition of the Wasserstein distance is intractable. However, the Kantorovich-Rubinstein duality tells us that
  W(Pr, Pg) = sup_{||f||_L≤1} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)],
  where ||f||_L ≤ 1 means that f satisfies the 1-Lipschitz condition.
• Note that if we replace ||f||_L ≤ 1 with ||f||_L ≤ K for some K, we have
  K · W(Pr, Pg) = sup_{||f||_L≤K} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)].
• If we have a parametrized family of functions {fw}_{w∈W} that are all K-Lipschitz for some K, then we have
  max_{w∈W} E_{x∼Pr}[fw(x)] − E_{x∼Pθ}[fw(x)] ≤ sup_{||f||_L≤K} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)] = K · W(Pr, Pθ).
3. Wasserstein GAN
Theorem 3
Theorem (3)
Let Pr be any distribution. Let Pθ be the distribution of gθ(Z) with Z a random variable with density p and gθ a function satisfying assumption 1. Then there is a solution f : χ → R to the problem
  max_{||f||_L≤1} E_{x∼Pr}[f(x)] − E_{x∼Pθ}[f(x)]
and we have
  ∇θ W(Pr, Pθ) = −E_{z∼p(z)}[∇θ f(gθ(z))]
when both terms are well-defined.
• Objective functions (the critic ascends L_D; the generator descends L_G, i.e., pushes fw(gθ(z)) up):
  L_D^WGAN = E_{x∼Pr}[fw(x)] − E_{z∼p(z)}[fw(gθ(z))]
  L_G^WGAN = −E_{z∼p(z)}[fw(gθ(z))]
  where after each critic update the weights are clipped, w ← clip(w, −0.01, 0.01), to enforce a Lipschitz bound.
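A minimal 1-D sketch of the resulting training loop, assuming a linear critic f_w(x) = w·x (so clipping |w| ≤ c directly bounds its Lipschitz constant) and a shift generator g_θ(z) = z + θ on toy Gaussian data. All names, data, and hyperparameters here are illustrative, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
c, n_critic = 0.01, 5          # clipping threshold and critic steps per generator step
lr_w, lr_theta = 0.05, 0.5     # illustrative learning rates

# Real data P_r = N(3, 1); generator g_theta(z) = z + theta with z ~ N(0, 1).
mu_real, w, theta = 3.0, 0.0, 0.0

for step in range(3000):
    for _ in range(n_critic):
        x = rng.normal(mu_real, 1.0, 64)
        z = rng.normal(0.0, 1.0, 64)
        grad_w = x.mean() - (z + theta).mean()        # d/dw [E f_w(x) - E f_w(g(z))]
        w = float(np.clip(w + lr_w * grad_w, -c, c))  # ascend L_D, then clip weights
    grad_theta = -w                                   # d/dtheta of L_G = -E[f_w(g_theta(z))]
    theta -= lr_theta * grad_theta                    # generator descent

print(round(theta, 1))   # theta ends up close to mu_real = 3.0
```

Even in this toy, the mechanics of Algorithm-style WGAN training are visible: the clipped critic saturates at ±c and points the generator toward the real mean, and the critic objective itself tracks (a scaled) Wasserstein estimate.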
3. Wasserstein GAN
Figure 2
In this paper, the authors call the discriminator a critic. In Figure 2, a GAN discriminator and a WGAN critic are trained till optimality. The discriminator learns very quickly to distinguish between fake and real samples and saturates; the critic, by contrast, cannot saturate and converges to a linear function, which provides usable gradients everywhere.
4. Empirical Results
• We claim two main benefits:
  ▷ a meaningful loss metric that correlates with the generator's convergence and sample quality
  ▷ improved stability of the optimization process
4. Empirical Results
4.1 Experimental Procedure
• Training curves and the visualization of samples at different stages of training show a clear correlation between the Wasserstein estimate and the generated image quality.
4. Empirical Results
4.1 Experimental Procedure
(Figure: training curves and sample visualizations; shown as images in the original slides.)
Some knowledge to read the Appendix
• Let χ ⊂ R^d be a compact set, that is, closed and bounded by the Heine-Borel Theorem, and Prob(χ) the set of probability measures over χ.
• We define Cb(χ) = {f : χ → R | f is continuous and bounded}.
• For f ∈ Cb(χ), since f is bounded we can define the norm ||f||_∞ = max_{x∈χ} |f(x)|.
• Then we have a normed vector space (Cb(χ), ||·||_∞).
• The dual space Cb(χ)* = {ϕ : Cb(χ) → R | ϕ is linear and continuous} carries the norm ||ϕ|| = sup_{f∈Cb(χ), ||f||_∞≤1} |ϕ(f)|.
Some knowledge to read the Appendix
• Let µ be a signed measure over χ, and define the total variation norm ||µ||_TV = sup_A |µ(A)|, where A ranges over the Borel subsets of χ. For two probability distributions Pr and Pθ, the function δ(Pr, Pθ) = ||Pr − Pθ||_TV is a distance on Prob(χ) (called the Total Variation distance).
• We can consider Φ : (Prob(χ), δ) → (Cb(χ)*, ||·||), where Φ(P)(f) = E_{x∼P}[f(x)] is a linear functional on Cb(χ).
• By the Riesz Representation Theorem, Φ is an isometric embedding, that is, δ(P, Q) = ||Φ(P) − Φ(Q)|| and Φ is a 1-1 correspondence onto its image.