Uncertainty Awareness in Integrating
Machine Learning and Game Theory
不確実性を通して見る
機械学習とゲーム理論とのつながり
Rikiya Takahashi
SmartNews, Inc.
rikiya.takahashi@smartnews.com
Mar 5, 2017
Game Theory Workshop 2017
https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-
machine-learning-and-game-theory
About Myself
● Rikiya TAKAHASHI, Ph.D. (高橋 力矢)
– Engineer in SmartNews, Inc., from 2015 to current
– Research Staff Member in IBM Research – Tokyo, from 2004 to 2015
● Research Interests: machine learning, reinforcement learning,
cognitive science, behavioral economics, complex systems
– Descriptive models about real human behavior
– Prescriptive decision making from descriptive models
– Robust algorithms working under high uncertainty
● Limited sample size, high dimensionality, high noise
Example of Previous Work
● Budget-Constrained Markov Decision Process for
Marketing-Mix Optimization (Takahashi+, 2013 & 2014)
[Figure: pipeline from Historical Data → Consumer Segmentation → Time-Series Predictive Modeling → Optimal Marketing-Mix & Targeting Rules. A weekly marketing-mix plan (2014/01/01 to 2014/12/31) is built per segment over the channels EM (e-mail), DM (direct mail), and TM (tele-marketing); purchase is predicted as a response to stimuli such as e-mail, TV CM, and browsing; targeting rules are decision trees on features such as "revenues in past 16 weeks > $200?", "#purchases in past 8 weeks > 2?", "#browsing in past 4 weeks > 15?", and "#EMs in past 2 weeks > 2?", leading from strategic segments to micro-segments MS #1 ... MS #256.]
Example of Previous Work
● Travel-Time Distribution Prediction on a Large
Road Network (Takahashi+, 2012)
[Figure: a road network between points A and B, with intersections, links, and per-link travel-time distributions ψ1(y), ..., ψ6(y).]
Road Network & Travel Time Data by Taxi → Predictive Modeling of Travel Time Distribution → Route-Choice Recommendation or Traffic Simulation
Example of Previous Work
● Bayesian Discrete Choice Modeling for Irrational
Compromise Effect (Takahashi & Morimura, 2015)
– Explained later today
[Figure: options A, B, C, D plotted on inexpensiveness vs. product quality; the option with the highest share flips between choice sets {A, B, C} and {B, C, D}. A Utility Calculator (UC) receives the vector of attributes, computes utility samples (e.g., uiA = 3.26, uiB = 3.33, uiC = 2.30), and sends only these samples to a Decision Making System (DMS), which forms utility estimates from them.]
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Machine Learning (ML)
● Set of inductive disciplines for designing a probabilistic model and estimating its parameters so as to maximize out-of-sample predictive accuracy
– Supervised learning: model and fit P(Y|X)
– Unsupervised learning: model and fit P(X)
● What machine learners care about
– Bias-variance trade-off
– Curse of dimensionality
Estimation via Bayes' theorem
● Basis behind most of today's ML algorithms
– data: D, model parameter: θ
● Bayesian estimation
posterior distribution: p(θ|D) = p(D|θ) p(θ) / ∫θ p(D|θ) p(θ) dθ
predictive distribution: p(y*|D) = ∫θ p(y*|θ) p(θ|D) dθ
● Maximum A Posteriori (MAP) estimation: an approximation of Bayesian estimation
posterior mode: θ̂ = argmaxθ [ log p(D|θ) + log p(θ) ]
predictive distribution: p(y*|D) ≈ p(y*|θ̂)
● Q. Why place a prior p(θ)?
– A1. To quantify uncertainty as the posterior
– A2. To avoid overfitting
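As a toy illustration of the two estimation routes above (not from the original slides), the following sketch contrasts MAP and fully Bayesian prediction in a conjugate Beta-Bernoulli model, where both are available in closed form; the prior strength and the coin-flip data are made-up values.

```python
import numpy as np

# Hypothetical coin-flip data D: 3 heads out of 4 trials.
heads, trials = 3, 4
a0, b0 = 2.0, 2.0                      # Beta(a0, b0) prior on theta = P(head)

# Posterior is Beta(a0 + heads, b0 + tails) by conjugacy.
a_post, b_post = a0 + heads, b0 + (trials - heads)

# MAP estimation: plug the posterior mode into the likelihood.
theta_map = (a_post - 1.0) / (a_post + b_post - 2.0)
p_head_map = theta_map

# Bayesian estimation: integrate theta out; for Beta-Bernoulli the
# predictive probability of a head is simply the posterior mean.
p_head_bayes = a_post / (a_post + b_post)

print(f"MAP predictive P(head)      = {p_head_map:.3f}")
print(f"Bayesian predictive P(head) = {p_head_bayes:.3f}")
```

With only four observations the two predictions differ noticeably; as the sample grows, the posterior concentrates and the gap between the MAP approximation and the full Bayesian predictive shrinks.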
E.g., Gaussian Process Regression (GPR)
● Bayesian Ridge Regression
– Unlike MAP Ridge regression (dark gray band in the figure), input-dependent uncertainty (light gray band) is quantified.
prior: (f, f*) ∼ N( 0_{n+1}, [[K, k*], [k*ᵀ, K(x*, x*)]] )
where K = (K_ij ≡ K(x_i, x_j)), k* = (K(x_1, x*), …, K(x_n, x*))ᵀ, K(x, x') = exp(−γ‖x − x'‖²)
data likelihood: (y, y*) ∼ N( (f, f*), σ² I_{n+1} )
predictive distribution: y* | K, x*, X, y ∼ N( k*ᵀ (σ² I_n + K)⁻¹ y, K(x*, x*) − k*ᵀ (σ² I_n + K)⁻¹ k* + σ² )
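A minimal numpy sketch of the GPR predictive distribution written out above; the 1-D training data, γ, and σ² are hypothetical choices for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gpr_predict(X, y, X_star, gamma=1.0, sigma2=0.1):
    """Predictive mean and variance of y* given training data (X, y)."""
    K = rbf_kernel(X, X, gamma)                    # n x n Gram matrix
    k_star = rbf_kernel(X, X_star, gamma)          # n x m cross-covariances
    A = np.linalg.inv(sigma2 * np.eye(len(X)) + K)
    mean = k_star.T @ A @ y
    var = (rbf_kernel(X_star, X_star, gamma).diagonal()
           - np.einsum('ij,jk,ki->i', k_star.T, A, k_star) + sigma2)
    return mean, var

# Hypothetical noisy observations of a sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-4, 4, 5)[:, None]
mean, var = gpr_predict(X, y, X_star, gamma=0.5, sigma2=0.01)
print(np.c_[X_star, mean, np.sqrt(var)])           # std grows away from the data
```

The predictive standard deviation returned for test points far from the training inputs is visibly larger, which is exactly the input-dependent uncertainty the slide contrasts with MAP Ridge regression.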
Gap between Deduction & Induction
Today's AI is integrating both.
Do not divide the work between
inductive & deductive researchers.
Deductive Mind
● Optimize decisions for
a given environment
● Casino owner's mentality
● Game theorist, probabilist,
operations researcher
Inductive Mind
● Estimate the environment
from observations
● Gambler's mentality
● Statistician, machine learner,
econometrician
Induction ↔ Deduction
Typical Problem Solving in the Real World:
Dataset D → [Inductive Process: Machine Learning, Statistics, Econometrics, etc.] → Estimate of Environment Θ̂_D → [Deductive Process: Game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π̂_D
∀i ∈ {1,…,n}  π̂_{D,i} = argmax_{πi} R(πi | {π̂_{D,j}}_{j≠i}, Θ̂_D)
The estimate Θ̂_D is different from the true environment Θ.
Induction ↔ Deduction
Typical Problem Solving in the Real World:
Dataset D → [Inductive Process: Machine Learning, Statistics, Econometrics, etc.] → Estimate of Environment Θ̂_D → [Deductive Process: Game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π̂_D
∀i ∈ {1,…,n}  π̂_{D,i} = argmax_{πi} R(πi | {π̂_{D,j}}_{j≠i}, Θ̂_D)
How different is the estimation-based policy π̂_D from the true optimal policy π*?
∀i ∈ {1,…,n}  π*_i = argmax_{πi} R(πi | {π*_j}_{j≠i}, Θ)
Induction ↔ Deduction
Typical Problem Solving in the Real World:
Dataset D → [Inductive Process: Machine Learning, Statistics, Econometrics, etc.] → Estimate of Environment Θ̂_D → [Deductive Process: Game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π̂_D
State-of-the-art AI:
Dataset D → [Direct Optimization: Integration of Machine Learning and Optimization Algorithms] → Policy Decisions π̌_D, with the environment estimate Θ̌_D only as a by-product
See the Difference
Typical Problem Solving in the Real World:
– Unnecessarily large effort in solving each subproblem; vulnerable to estimation error
– Θ̂_D: accurately fitted to minimize the prediction error for dataset D, although minimizing the error of this parameter is not the goal
– π̂_D: exceedingly optimized given a wrong assumption
State-of-the-art AI:
– Less effort on needless intermediate estimation; robust to estimation error
– Θ̌_D: fitted but not minimizing the error for dataset D; often less complex than Θ̂_D
– π̌_D: safely optimized with less reliance on Θ̌_D
See the Difference
Typical Problem Solving in the Real World: solve a hard inductive problem, then solve another hard deductive problem.
State-of-the-art AI: solve an easier problem that involves both induction & deduction.
● Recommendation of simple solving
– Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8
Optimization under Uncertainty
● Interval Estimation
(e.g., Bayesian)
– Quantify uncertainty
– Optimize over all
possible environments
● Minimal Estimation
(e.g., Vapnik)
– Omit intermediate step
– Solve the minimal
optimization problem
● Both principles are effective in practice.
Vapnik's Principle (Vapnik, 1995)
When solving a problem of interest, do not solve a
more general problem as an intermediate step.
—Vladimir N. Vapnik
● E.g., classification or regression : predict Y given X
– #1. Fit P(X,Y) and infer P(Y|X) by Bayes’ theorem
– #2. Only fit P(Y|X)
● #2 is better than #1 because it incurs less estimation error.
– Particularly better when uncertainty is high: small sample size, high dimensionality, and/or high noise
Batch Reinforcement Learning
● A good example of involving both inductive and
deductive processes.
● Also a good example of how to avoid
needlessly hard estimation.
● Basis behind the recent success of the Deep Q-Network for playing games (Mnih+, 2013 & 2015) and AlphaGo (Silver+, 2016)
Markov Decision Process
● Framework for long-term-optimal decision making
– S: set of states, A: set of actions, P(s'|s,a): state-transition probability, r(s,a): immediate reward, γ ∈ [0,1]: discounting factor
– Optimize the policy π(a|s) for maximal cumulative reward
[Figure: example customer-state trajectories over t = 0, 1, 2, … across State #1 (e.g., Gold Customer), State #2 (e.g., Silver Customer), and State #3 (e.g., Normal Customer); Action #1 (e.g., ordinary discount on flight ticket) and Action #2 (e.g., free business-class upgrade) induce different state transitions and reward streams ($, $$, $$$).]
Markov Decision Process
● Easy to solve if the environment is known
– Via dynamic programming or linear programming when P(s'|s,a) & r(s,a) are given with no uncertainty
– Behave myopically at t → ∞
● For each state s, choose the action a that maximizes r(s,a).
– At time (t−1), choose the optimal action that maximizes the immediate reward at time (t−1) plus the expected reward after time t over the state-transition distribution.
● What if the environment is unknown?
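A toy value-iteration sketch of the known-environment case just described; the 3-state, 2-action transition probabilities, rewards, and discount factor are all made up for illustration.

```python
import numpy as np

# Hypothetical MDP: P[a, s, s'] transition probabilities and r[s, a] rewards.
P = np.array([[[0.7, 0.2, 0.1],     # action 0
               [0.3, 0.5, 0.2],
               [0.1, 0.3, 0.6]],
              [[0.9, 0.1, 0.0],     # action 1
               [0.2, 0.7, 0.1],
               [0.0, 0.2, 0.8]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.8],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(3)
for _ in range(500):
    Q = r + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)                           # greedy policy at the fixed point
print("V* ~", np.round(V_new, 3), " greedy policy:", policy)
```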
Types of Reinforcement Learning
● Model-based ↔ Model-free
● On policy ↔ Off policy
● Value iteration ↔ policy search
● Model-based approach
– 1. System identification: estimate the MDP parameters
– 2. Sample multiple MDPs from the interval estimate
– 3. Solve every MDP & take the best action of best MDP
● Optimism in the face of uncertainty
Model-free approach
● Remember: our aim is to get the optimal policy. In principle, there is no need to estimate the environment.
– Act without fully identifying the system: as long as we choose the optimal action, it turns out right in the end.
● Even when doing estimation, use an intermediate statistic that is less complex than P(s'|s,a) & r(s,a).
Bellman Optimality Equation
● Policy is derived if we have an estimate of Q(s,a).
– Simpler than estimating P(s'|s,a) & r(s,a)
Q(s,a) = E[r(s,a)] + γ E_{P(s'|s,a)}[ max_{a'} Q(s',a') ]
π(a|s) = 1 if a = argmax_{a'} Q(s,a'), and 0 otherwise
● Get an estimate Q̂(s,a) from episodes (s_i, a_i, s_i', r_i), i = 1, …, n
Fitted Q-Iteration (Ernst+, 2005)
● For k = 1, 2, … iterate 1) value computation and 2) regression as
1) ∀i ∈ {1,…,n}  v_i^(k) := r_i + γ Q̂_k^(1)( s_i', argmax_{a'} Q̂_k^(0)(s_i', a') )
2) ∀f ∈ {0,1}  Q̂_{k+1}^(f) := argmin_{Q∈H} [ (1/2) Σ_{i∈J_f} ( v_i^(k) − Q(s_i, a_i) )² + R(Q) ]
– H: hypothesis space of functions, Q̂_0 ≡ 0, R: regularization term
– Indices 1…n are randomly split into sets J_0 and J_1, for avoiding over-estimation of Q values (Double Q-Learning (Hasselt, 2010)).
● Related with Experience Replay in Deep Q-
Network (Mnih+, 2013 & 2015)
– See (Lange+, 2012) for more details.
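A minimal sketch of the Fitted Q-Iteration loop above on a batch of transitions. The toy 1-D environment, the number of iterations, and the use of scikit-learn's ExtraTreesRegressor (in the spirit of the tree ensembles of Ernst+, 2005) are assumptions for illustration; the double-estimator split into J0 and J1 is simplified to two random halves.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(batch, n_actions, gamma=0.95, n_iters=30, seed=0):
    """batch: rows (s, a, r, s') with a scalar state; returns two fitted Q models."""
    s, a = batch[:, 0:1], batch[:, 1].astype(int)
    r, s_next = batch[:, 2], batch[:, 3:4]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(batch))
    J = [idx[: len(batch) // 2], idx[len(batch) // 2:]]   # random split J0 / J1

    def q_values(model, states):
        # Q(s, a) for every action, by stacking (state, action) features.
        return np.column_stack(
            [model.predict(np.column_stack([states, np.full(len(states), act)]))
             for act in range(n_actions)])

    models = [None, None]                                  # Q_0 is identically 0
    X = np.column_stack([s, a])
    for _ in range(n_iters):
        if models[0] is None:
            v = r                                          # v_i = r_i + gamma * 0
        else:
            a_star = q_values(models[0], s_next).argmax(axis=1)   # greedy by Q^(0)
            q1 = q_values(models[1], s_next)                      # evaluated by Q^(1)
            v = r + gamma * q1[np.arange(len(batch)), a_star]
        models = []
        for f in range(2):                                 # one regression per index set
            reg = ExtraTreesRegressor(n_estimators=50, random_state=seed)
            reg.fit(X[J[f]], v[J[f]])
            models.append(reg)
    return models

# Hypothetical batch from a 1-D random walk with 2 actions (left / right);
# reward is earned near the right edge of the [0, 1] interval.
rng = np.random.default_rng(1)
s = rng.uniform(0, 1, 500)
a = rng.integers(0, 2, 500)
s_next = np.clip(s + np.where(a == 1, 0.1, -0.1)
                 + 0.02 * rng.standard_normal(500), 0.0, 1.0)
r = (s_next > 0.9).astype(float)
models = fitted_q_iteration(np.column_stack([s, a, r, s_next]), n_actions=2)
```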
Policy Gradient
● Accurately fit the policy π_θ(a|s) while only roughly fitting Q(s,a)
– More directness to the final aim
– Applicable to continuous-action problems
∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(a|s) Q^π(s,a) ]   (Policy Gradient Theorem (Sutton+, 2000))
(gradient of performance = expected log-policy gradient times cumulative reward over s and a)
● Variations on providing the rough estimate of Q
– REINFORCE (Williams, 1992): reward samples
– Actor-Critic: regression models (e.g., Natural
Gradient (Kakade, 2002), A3C (Mnih+, 2016))
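A minimal REINFORCE-style sketch of the policy-gradient estimate above, for a one-state (bandit) problem with a softmax policy; the bandit rewards and the learning rate are hypothetical, and a single sampled reward stands in for the rough estimate of Q.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.0, 1.0, 0.3])       # hypothetical bandit arm rewards
theta = np.zeros(3)                          # softmax policy parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

alpha = 0.1
for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    reward = true_means[a] + rng.standard_normal()   # sampled return (rough Q)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                            # gradient of log pi(a|s)
    theta += alpha * grad_log_pi * reward            # REINFORCE update
print("learned policy:", np.round(softmax(theta), 3))  # concentrates on arm 1
```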
Functional Approximation in Practice
● Concrete functional form of Q(s,a) and/or π(a|s)
– Q should be a universal functional approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced.
● Examples of universal approximators:
– Tree Ensembles (Random Forest, Gradient Boosted Decision Trees)
– (Deep) Neural Networks
– Mixtures of Radial Basis Functions (RBFs)
Functional Approximation in Practice
● Is any univ. approximator OK? – No, unfortunately.
– Universal approximator is merely asymptotically unbiased.
– Better to have
● Low variance in terms of bias-variance trade-off
● Resistance to curse of dimensionality
● One reason for deep learning's success
– Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models
– Techniques to stabilize numerical optimization
● AdaGrad or ADAM, dropout, ReLU, batch normalization, etc.
Message
● Uncertainty awareness is essential in data-oriented decision making.
– No division between induction and deduction
– Remove needless intermediate estimation
– Fitted Q-Iteration as an illustrative example
● Fewer parameters, less uncertainty
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Shrinkage Matters in the Real World.
● Q. Why does a prior help avoid over-fitting?
– A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression)
● Over-optimization ↔ Over-rationalization?
– e.g., (Takahashi and Morimura, 2015)
[Figure: solutions of 2-dimensional OLS & Ridge regression in the (Coefficient #1, Coefficient #2) plane; the Ridge solution is closer to the prior mean 0 than the Ordinary Least Squares (OLS) solution, and the prior mean 0 is independent of the training data.]
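A small numpy sketch of the shrinkage pictured above, contrasting the OLS and Ridge solutions on made-up 2-dimensional data; the regularization strength is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 2
X = rng.standard_normal((n, d))
beta_true = np.array([1.5, -2.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)          # Ordinary Least Squares
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # MAP under a zero-mean Gaussian prior

print("OLS  :", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))               # shrunk towards the prior mean 0
print("norm(OLS) > norm(Ridge):",
      np.linalg.norm(beta_ols) > np.linalg.norm(beta_ridge))
```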
Discrete Choice Modelling
Goal: predict the probability of choosing an option from a choice set.
Why solve this problem?
Brand positioning among competitors
Sales promotion (though open to some abuse)
Random Utility Theory as a Rational Model
Each human is a rational maximizer of random utility.
Theoretical basis behind many statistical marketing models.
Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train,
2000)), Learning to rank (e.g., (Chapelle and Harchaoui, 2005)), Conjoint
analysis (Green and Srinivasan, 1978), Matrix factorization (e.g., (Lawrence and
Urtasun, 2009)), ...
Complexity of Real Human’s Choice
An example of choosing PC (Kivetz et al., 2004)
Each subject chooses 1 option from a choice set
Option       A    B    C    D    E
CPU [MHz]   250  300  350  400  450
Mem. [MB]   192  160  128   96   64

Choice Set   #subjects
{A, B, C}    36 : 176 : 144
{B, C, D}    56 : 177 : 115
{C, D, E}    94 : 181 : 109
Can random utility theory still explain the preference reversals?
B ≻ C or C ≻ B?
Similarity Effect (Tversky, 1972)
Top-share choice can change due to correlated utilities.
E.g., one color from {Blue, Red} or {Violet, Blue, Red}?
Attraction Effect (Huber et al., 1982)
Introduction of an absolutely-inferior option (= decoy) causes an irregular increase of option A's attractiveness,
despite the natural guess that the decoy never affects the choice.
If the decoy is dominated by option A on every attribute, A is clearly superior to the decoy, so adding the decoy should be irrelevant; empirically, however, it raises A's share.
Compromise Effect (Simonson, 1989)
Moderate options within each choice set are preferred.
Different from a non-linear utility function with diminishing returns (e.g., √inexpensiveness + √quality).
Positioning of the Proposed Work
Sim.: similarity, Attr.: attraction, Com.: compromise
Model     Sim.  Attr.  Com.  Mechanism                      Predict. for Test Set  Likelihood Maximization
SPM       OK    NG     NG    correlation                    OK                     MCMC
MDFT      OK    OK     OK    dominance & indifference       OK                     MCMC
PD        OK    OK     OK    nonlinear pairwise comparison  OK                     MCMC
MMLM      OK    NG     OK    none                           OK                     Non-convex
NLM       OK    NG     NG    hierarchy                      NG                     Non-convex
BSY       OK    OK     OK    Bayesian                       OK                     MCMC
LCA       OK    OK     OK    loss aversion                  OK                     MCMC
MLBA      OK    OK     OK    nonlinear accumulation         OK                     Non-convex
Proposed  OK    NG     OK    Bayesian                       OK                     Convex

MDFT: Multialternative Decision Field Theory (Roe et al., 2001)
PD: Proportional Difference Model (González-Vallejo, 2002)
MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000)
SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009)
NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001)
BSY: Bayesian Model of (Shenoy and Yu, 2013)
LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004)
MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
Key Idea #1: a Dual Personality Model
Regard a human as an estimator of her/his own utility function.
Assumption 1: the DMS does not know the original utility function.
1. The UC computes the sample value of every option's utility, and sends only these samples to the DMS.
2. The DMS statistically estimates the utility function.
Utility Calculator as Rational Personality
For every context i and option j, the UC computes a noiseless sample of utility v_ij by applying the utility function f_UC : R^dX → R.
v_ij = f_UC(x_ij),  f_UC(x) ≜ b + wᵀ φ(x)
b: bias term
φ : R^dX → R^dφ : mapping function
w ∈ R^dφ : vector of coefficients
Key Idea #2: DMS is a Bayesian estimator
The DMS does not know f_UC but has the utility samples {v_ij}, j = 1, …, m[i].
Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior on regressing the utility function.
μ_i ∼ N( 0_{m[i]}, σ² K(X_i) ),  K(X_i) = ( K(x_ij, x_ij') ) ∈ R^{m[i]×m[i]}
v_i ≜ (v_i1, …, v_im[i])ᵀ ∼ N( μ_i, σ² I_{m[i]} )
μ_i ∈ R^{m[i]}: vector of utilities, σ²: noise level, K(·,·): similarity function, X_i ≜ (x_i1, …, x_im[i])ᵀ with x_ij ∈ R^dX
The posterior mean is given as
u*_i ≜ E[μ_i | v_i, X_i, K] = K(X_i) ( I_{m[i]} + K(X_i) )⁻¹ ( b 1_{m[i]} + Φ_i w ), where Φ_i ≜ (φ(x_i1), …, φ(x_im[i]))ᵀ.
Convex Optimization for Model Parameters
The likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean u*_i.
Thus we can fit the function f_UC from the choice data.
Conveniently, MAP estimation of f_UC is convex for fixed K.
(b̂, ŵ) = argmax_{b,w} Σ_{i=1}^n ℓ( b H_i 1_{m[i]} + H_i Φ_i w, y_i ) − (c/2) ‖w‖²
where ℓ(u*_i, y_i) ≜ log [ exp(u*_{i y_i}) / Σ_{j'=1}^{m[i]} exp(u*_{ij'}) ] and H_i ≜ K(X_i) ( I_{m[i]} + K(X_i) )⁻¹
Irrationality as Bayesian Shrinkage
Implication from the posterior-mean utility in (1):
Each option's utility is shrunk towards the prior mean 0.
Strong shrinkage for an option dissimilar to the others, due to its high posterior variance (= uncertainty).
u*_i = K(X_i) ( I_{m[i]} + K(X_i) )⁻¹ ( b 1_{m[i]} + Φ_i w )    (1)
       [shrinkage factor]            [vector of utility samples]
Context effects as Bayesian uncertainty aversion.
E.g., RBF kernel K(x, x') = exp( −γ ‖x − x'‖² )
[Figure: final (shrunk) evaluation of options A, B, C, D on the attribute line X1 = 5 − X2, for choice sets {A,B,C} and {B,C,D}.]
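A minimal numpy sketch of the shrinkage formula (1) on the two choice sets in the figure; the option attributes on the line X1 = 5 − X2, the bias b, the weights w, and γ are hypothetical, and φ(x) is taken as the identity so that the utility samples are simply b + wᵀx.

```python
import numpy as np

def shrunk_utilities(X, b, w, gamma):
    """Posterior-mean utilities u* = K (I + K)^{-1} (b*1 + X w), i.e., eq. (1)."""
    v = b + X @ w                                    # noiseless utility samples from UC
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)                          # choice-set-dependent RBF kernel
    return K @ np.linalg.solve(np.eye(len(X)) + K, v)

# Hypothetical options on the line inexpensiveness + quality = 5.
attrs = {'A': [4, 1], 'B': [3, 2], 'C': [2, 3], 'D': [1, 4]}
b, w, gamma = 0.0, np.array([0.5, 0.55]), 0.5        # assumed parameters

for choice_set in (['A', 'B', 'C'], ['B', 'C', 'D']):
    X = np.array([attrs[o] for o in choice_set], dtype=float)
    u = shrunk_utilities(X, b, w, gamma)
    best = choice_set[int(np.argmax(u))]
    print(choice_set, np.round(u, 3), '-> most preferred:', best)
```

Even though the raw utilities b + wᵀx increase monotonically from A to D under these assumed weights, the shrunk utilities prefer the middle option of each set (B in {A,B,C}, C in {B,C,D}): the dissimilar extreme options are shrunk more strongly towards 0, which reproduces the compromise-effect preference reversal described earlier.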
Recovered Context-Dependent Choice Criteria
For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects.

Option         A    B    C    D    E
Power [Watt]   50   75  100  125  150
Price [USD]   100  130  160  190  220

Choice Set   #subjects
{A, B, C}    45 : 135 : 145
{B, C, D}    58 : 137 : 111
{C, D, E}    95 : 155 :  91

[Figure: recovered evaluation of options A-E against price (100-220 USD), showing the objective evaluation curve and the context-dependent evaluations for choice sets {A,B,C}, {B,C,D}, and {C,D,E}.]
[Figure: average test log-likelihood (roughly −1.1 to −0.8) on datasets PC, SP, and SM for LinLogit, NpLogit, LinMix, NpMix, and the proposed GPUA.]
A Result of p-beauty Contest by Real Humans
Guess 2/3 of the average of all votes (0-100). Observed means are far from the Nash equilibrium of 0 (Camerer et al., 2004; Ho et al., 2006).
Table: Average Choice in (2/3)-beauty Contests
Subject Pool Group Size Sample Size Mean[Yi ]
Caltech Board 73 73 49.4
80 year olds 33 33 37.0
High School Students 20-32 52 32.5
Economics PhDs 16 16 27.4
Portfolio Managers 26 26 24.3
Caltech Students 3 24 21.5
Game Theorists 27-54 136 19.1
Modeling Bounded Rationality
Early stopping at step k: Level-k thinking or Cognitive
Hierarchy Theory (Camerer et al., 2004)
Humans cannot predict the infinite future.
Using non-stationary transitional state
Randomization of utility via noise ε_it: Quantal Response Equilibrium (McKelvey and Palfrey, 1995)
∀i ∈ {1,…,n}  Y_i^(t) | Y_{−i}^(t−1) = argmax_Y [ f_i(Y, Y_{−i}^(t−1)) + ε_it ]
Both methods essentially work as regularization of rationality.
Shrinkage towards initial values or uniform choice probabilities
Linking ML with Game Theory (GT)
via Shrinkage Principle
Optimization without shrinkage
– ML: Maximum-Likelihood estimation. Optimal for the training data, but less generalization capability to test data.
– GT: Nash Equilibrium. Optimal for the given game, but less predictive of real-world decisions.
Optimization with shrinkage
– ML: Bayesian estimation. Shrinkage towards the prior causes suboptimality for the training data, but more generalization capability to test data.
– GT: Transitional State or Quantal Response Equilibrium. Shrinkage towards uniform probabilities causes suboptimality for the given game, but more predictive of real-world decisions.
Early Stopping and Regularization
ML as a Dynamical System to find the optimal parameters:
[Figure: in the (Parameter #1, Parameter #2) plane, a fitting trajectory starts at 0 and passes t = 10, 20, 30, 50 toward the exact maximum-likelihood estimate (e.g., OLS); the exact Bayesian estimate (e.g., Ridge regression) is shrunk towards zero; an early-stopping estimate (e.g., Partial Least Squares) lies partway along the trajectory.]
GT as a Dynamical System to find the equilibrium:
[Figure: iterated responses in the (2/3)-beauty contest, with mean = 50 at t = 0, mean = 34 at t = 1, mean = 15 at t = 2, …, and mean = 0 as t → ∞ (Nash Equilibrium); a Level-2 transitional state stops after two steps.]
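A tiny sketch of the right-hand dynamical system: iterated (2/3) best responses in the beauty contest starting from a level-0 mean of 50. The starting point and number of steps are illustrative; Cognitive Hierarchy models additionally mix players of different levels, which this one-line recursion omits.

```python
# Level-k / iterated best response in the (2/3)-beauty contest.
mean = 50.0                        # level-0: uniform guessing over 0-100
for k in range(1, 11):
    mean = (2.0 / 3.0) * mean      # level-k best response to level-(k-1)
    print(f"level-{k} transitional state: mean guess = {mean:.1f}")
# The sequence 33.3, 22.2, 14.8, ... converges to the Nash equilibrium 0;
# the human means in the table above (roughly 19-49) match small k, not the limit.
```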
Message
● Bayesian shrinkage ↔ Bounded rationality
– Dual-personality model for contextual effects
– Towards data-oriented & more realistic games:
export ML regularization techniques to GT
● Analyze dynamics or uncertainty-aware equilibria
– Early-stopped transitional state, or
– QRE with uncertainty on each player's utility function
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Additional Implications from ML
● Multiple equilibria or saddle points?
● Equilibria or “typical” transitional states?
– Slow convergence
– Plateau of objective function
Recent history in ML
● Waste of ~20 years on the local-optimality issue
– Neural Networks (NNs) had long been criticized for the local optimality of their parameter fitting.
– The ML community stuck to convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)).
– Most solutions found when fitting high-dimensional NNs, however, turn out to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)!
– After escaping saddle points by perturbation, most of the local optima empirically provide similar prediction capabilities.
● Please do not make the same mistake in multi-
agent optimization problems (=games)!
Why most are saddle points?
● See the eigenvalue spectrum of the Hessian of a non-linear function randomly drawn from a Gaussian process.
– Local minimum: every eigenvalue is positive.
– Local maximum: every eigenvalue is negative.
– Saddle point: both positive & negative eigenvalues exist.
● In a high-dimensional function, the Hessian contains both positive & negative eigenvalues with high probability.
[Figure: univariate and bivariate example surfaces; https://en.wikipedia.org/wiki/Saddle_point]
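A quick numerical toy (not the Bray & Dean analysis itself): sample random symmetric "Hessians" with i.i.d. Gaussian entries and count how often every eigenvalue shares a sign; that fraction collapses as the dimension grows, so almost all critical points of such high-dimensional functions are saddle points.

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_definite(dim, n_trials=2000):
    """Fraction of random symmetric matrices whose eigenvalues all share a sign."""
    count = 0
    for _ in range(n_trials):
        A = rng.standard_normal((dim, dim))
        H = (A + A.T) / 2.0                        # symmetric random "Hessian"
        eig = np.linalg.eigvalsh(H)
        count += (eig > 0).all() or (eig < 0).all()
    return count / n_trials

for dim in (1, 2, 5, 10, 20):
    print(f"dim={dim:2d}  P(local min or max) ~ {frac_definite(dim):.4f}")
# dim=1 gives ~1.0, but the probability decays rapidly with dimension,
# so high-dimensional critical points are overwhelmingly saddle points.
```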
Open Questions for Multiple Equilibria
● If a game is very complex involving lots of
parameters in pay-off or utility functions, then
– Are most of its critical points unstable saddle points?
– Is the number of equilibria much smaller than we would guess?
● If we obtain a few equilibria of such a complex game,
– Do most of those equilibria have similar properties?
– Do we really have to obtain the other equilibria?
See Dynamics:
“Typical” Transitional State?
● MLers are sensitive to the convergence rate of fitting.
– We live in a finite-sample & high-dimensional world: asymptotics alone is powerless, and a computational estimate is not an equilibrium but a transitional state.
http://sebastianruder.com/optimizing-gradient-descent/
(Kingma & Ba, 2015)
See Dynamics:
“Typical” Transitional State?
● The mixing time of the Markov processes of some games is exponential in the number of players.
– E.g., (Axtell+, 2000), Nash demand game: the equilibrium shows equality of wealth, while transitional states show severe inequality.
● What if the number of players is in the thousands or millions?
– Severe inequality most of the time
See Dynamics: Trapped in Plateau?
● Fitting of a Deep NN is often trapped in plateaus.
– Natural gradient descent (Amari, 1997) is often used for quickly escaping from plateaus.
– In real-world games, are people trapped in plateaus
rather than equilibria?
https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html
Conclusion
● Discussed how uncertainty should be incorporated
in inductive & deductive decision making.
– Quantifying uncertainty or simpler minimal estimation
● Linked Bayesian shrinkage with bounded rationality
– Towards data-oriented regularized equilibrium
● Implications from high-dimensional ML
– Saddle points, transitional state, and/or plateau
THANK YOU FOR ATTENDING!
Download this material from
https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-
machine-learning-and-game-theory
References
Amari, S. (1997). Neural learning in structured parameter spaces -
natural Riemannian gradient. In Advances in Neural Information
Processing Systems 9, pages 127–133. MIT Press.
Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes
in a multi-agent bargaining model. Working papers, Brookings
Institution - Working Papers.
Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of
Gaussian fields on large-dimensional spaces. Physical Review Letters,
98:150201.
Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there
something quantum-like about the human mental lexicon? Journal of
Mathematical Psychology, 53(5):362–377.
Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy
model of games. Quarterly Journal of Economics, 119:861–898.
Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to
conjoint analysis. In Advances in Neural Information Processing
Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA.
Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice,
2:19–33.
Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and
Bengio, Y. (2014). Identifying and attacking the saddle point problem
in high-dimensional non-convex optimization. In Advances in Neural
Information Processing Systems 27, pages 2933–2941. Curran
Associates, Inc.
de Barros, J. A. and Suppes, P. (2009). Quantum mechanics,
interference, and the brain. Journal of Mathematical Psychology,
53(5):306–313.
Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and
Allenby, G. M. (2009). A probit model with structured covariance for
similarity effects and source of volume calculations.
http://ssrn.com/abstract=1396232.
González-Vallejo, C. (2002). Making trade-offs: A probabilistic and
context-sensitive model of choice behavior. Psychological Review,
109:137–154.
Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer
research: Issues and outlook. Journal of Consumer Research,
5:103–123.
Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology
of consumer and firm behavior with behavioral economics. Journal of
Marketing Research, 43(3):307–331.
Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically
dominated alternatives: Violations of regularity and the similarity
hypothesis. Journal of Consumer Research, 9:90–98.
Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G.,
Becker, S., and Ghahramani, Z., editors, Advances in Neural
Information Processing Systems 14, pages 1531–1538. MIT Press.
Kingma, D. and Ba, J. (2015). Adam: A method for stochastic
optimization. In The International Conference on Learning
Representations (ICLR), San Diego.
Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models
for capturing the compromise e↵ect. Journal of Marketing Research,
41(3):237–257.
Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization
with Gaussian processes. In Proceedings of the 26th Annual
International Conference on Machine Learning (ICML 2009), pages
601–608, New York, NY, USA. ACM.
McFadden, D. and Train, K. (2000). Mixed MNL models for discrete
response. Journal of Applied Econometrics, 15:447–470.
McFadden, D. L. (1980). Econometric models of probabilistic choice
among products. Journal of Business, 53(3):13–29.
McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for
normal form games. Games and Economic Behavior, 10:6–38.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T.,
Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for
deep reinforcement learning. In Proceedings of The 33rd International
Conference on Machine Learning (ICML 2016), pages 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare,
M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen,
S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D.,
Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control
through deep reinforcement learning. Nature, 518:529–533.
Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy:
A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical
Psychology, 53(5):349–361.
Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001).
Multialternative decision field theory: A dynamic connectionist model
of decision making. Psychological Review, 108:370–392.
Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects
in preference choice: What makes for a bargain? In Proceedings of the
Cognitive Science Society Conference.
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den
Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V.,
Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N.,
Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T.,
and Hassabis, D. (2016). Mastering the game of Go with deep neural
networks and tree search. Nature, 529:484–489.
Simonson, I. (1989). Choice based on reasons: The case of attraction
and compromise effects. Journal of Consumer Research, 16:158–174.
Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000).
Policy gradient methods for reinforcement learning with function
approximation. In Advances in Neural Information Processing Systems
12, pages 1057–1063. MIT Press.
Takahashi, R. and Morimura, T. (2015). Predicting preference reversals
via gaussian process uncertainty aversion. In Proceedings of the 18th
International Conference on Artificial Intelligence and Statistics
(AISTATS 2015), pages 958–967.
Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator
model of context effects in multialternative choice. Psychological
Review, 121(2):179–205.
Tversky, A. (1972). Elimination by aspects: A theory of choice.
Psychological Review, 79:281–299.
Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in
dynamical models of multialternative choice. Psychological Review,
111:757–769.
Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit
model. Transportation Research Part B, 35:627–641.
Williams, H. (1977). On the formulation of travel demand models and
economic evaluation measures of user benefit. Environment and
Planning A, 9(3):285–344.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256.
Yai, T. (1997). Multinomial probit with structured covariance for route
choice behavior. Transportation Research Part B: Methodological,
31(3):195–207.
Game Theory Workshop 2017 Uncertainty Awareness

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないPyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
 
Irs gan doc
Irs gan docIrs gan doc
Irs gan doc
 
星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章
 
強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習強化学習と逆強化学習を組み合わせた模倣学習
強化学習と逆強化学習を組み合わせた模倣学習
 
データ解析10 因子分析の基礎
データ解析10 因子分析の基礎データ解析10 因子分析の基礎
データ解析10 因子分析の基礎
 
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + αNIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
NIPS2017読み会@PFN: Hierarchical Reinforcement Learning + α
 
Graph convolution (スペクトルアプローチ)
Graph convolution (スペクトルアプローチ)Graph convolution (スペクトルアプローチ)
Graph convolution (スペクトルアプローチ)
 
Large scale gan training for high fidelity natural
Large scale gan training for high fidelity naturalLarge scale gan training for high fidelity natural
Large scale gan training for high fidelity natural
 
方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用
 
TensorFlowで逆強化学習
TensorFlowで逆強化学習TensorFlowで逆強化学習
TensorFlowで逆強化学習
 
グラフニューラルネットワークとグラフ組合せ問題
グラフニューラルネットワークとグラフ組合せ問題グラフニューラルネットワークとグラフ組合せ問題
グラフニューラルネットワークとグラフ組合せ問題
 
Neural networks for Graph Data NeurIPS2018読み会@PFN
Neural networks for Graph Data NeurIPS2018読み会@PFNNeural networks for Graph Data NeurIPS2018読み会@PFN
Neural networks for Graph Data NeurIPS2018読み会@PFN
 
[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習
 
機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明
 
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...[DL輪読会]Revisiting Deep Learning Models for Tabular Data  (NeurIPS 2021) 表形式デー...
[DL輪読会]Revisiting Deep Learning Models for Tabular Data (NeurIPS 2021) 表形式デー...
 
Devsumi 2018summer
Devsumi 2018summerDevsumi 2018summer
Devsumi 2018summer
 
確率的推論と行動選択
確率的推論と行動選択確率的推論と行動選択
確率的推論と行動選択
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder
 
20130716 はじパタ3章前半 ベイズの識別規則
20130716 はじパタ3章前半 ベイズの識別規則20130716 はじパタ3章前半 ベイズの識別規則
20130716 はじパタ3章前半 ベイズの識別規則
 
データ解析1 ベクトルの復習
データ解析1 ベクトルの復習データ解析1 ベクトルの復習
データ解析1 ベクトルの復習
 

Destacado

Approximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLPApproximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLP
Koji Matsuda
 
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
Takayuki Sekine
 
オンコロジストなるためのスキル
オンコロジストなるためのスキルオンコロジストなるためのスキル
オンコロジストなるためのスキル
musako-oncology
 

Destacado (20)

15分でわかる(範囲の)ベイズ統計学
15分でわかる(範囲の)ベイズ統計学15分でわかる(範囲の)ベイズ統計学
15分でわかる(範囲の)ベイズ統計学
 
Twitter炎上分析事例 2014年
Twitter炎上分析事例 2014年Twitter炎上分析事例 2014年
Twitter炎上分析事例 2014年
 
Argmax Operations in NLP
Argmax Operations in NLPArgmax Operations in NLP
Argmax Operations in NLP
 
Approximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLPApproximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLP
 
[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation
 
最先端NLP勉強会 “Learning Language Games through Interaction” Sida I. Wang, Percy L...
最先端NLP勉強会“Learning Language Games through Interaction”Sida I. Wang, Percy L...最先端NLP勉強会“Learning Language Games through Interaction”Sida I. Wang, Percy L...
最先端NLP勉強会 “Learning Language Games through Interaction” Sida I. Wang, Percy L...
 
オープンソースを利用した新時代を生き抜くためのデータ解析
オープンソースを利用した新時代を生き抜くためのデータ解析オープンソースを利用した新時代を生き抜くためのデータ解析
オープンソースを利用した新時代を生き抜くためのデータ解析
 
「人工知能」の表紙に関するTweetの分析・続報
「人工知能」の表紙に関するTweetの分析・続報「人工知能」の表紙に関するTweetの分析・続報
「人工知能」の表紙に関するTweetの分析・続報
 
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
第35回 強化学習勉強会・論文紹介 [Lantao Yu : 2016]
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
2016.03.11 「論文に書(け|か)ない自然言語処理」 ソーシャルメディア分析サービスにおけるNLPに関する諸問題について by ホットリンク 公開用
2016.03.11 「論文に書(け|か)ない自然言語処理」 ソーシャルメディア分析サービスにおけるNLPに関する諸問題について by  ホットリンク 公開用2016.03.11 「論文に書(け|か)ない自然言語処理」 ソーシャルメディア分析サービスにおけるNLPに関する諸問題について by  ホットリンク 公開用
2016.03.11 「論文に書(け|か)ない自然言語処理」 ソーシャルメディア分析サービスにおけるNLPに関する諸問題について by ホットリンク 公開用
 
あなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイントあなたの業務に機械学習を活用する5つのポイント
あなたの業務に機械学習を活用する5つのポイント
 
オンコロジストなるためのスキル
オンコロジストなるためのスキルオンコロジストなるためのスキル
オンコロジストなるためのスキル
 
新たなRNNと自然言語処理
新たなRNNと自然言語処理新たなRNNと自然言語処理
新たなRNNと自然言語処理
 
ディープラーニングでラーメン二郎(全店舗)を識別してみた
ディープラーニングでラーメン二郎(全店舗)を識別してみたディープラーニングでラーメン二郎(全店舗)を識別してみた
ディープラーニングでラーメン二郎(全店舗)を識別してみた
 
学部生向けベイズ統計イントロ(公開版)
学部生向けベイズ統計イントロ(公開版)学部生向けベイズ統計イントロ(公開版)
学部生向けベイズ統計イントロ(公開版)
 
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
 
Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向
 
Deep Convolutional Generative Adversarial Networks - Nextremer勉強会資料
Deep Convolutional Generative Adversarial Networks - Nextremer勉強会資料Deep Convolutional Generative Adversarial Networks - Nextremer勉強会資料
Deep Convolutional Generative Adversarial Networks - Nextremer勉強会資料
 
現在のDNNにおける未解決問題
現在のDNNにおける未解決問題現在のDNNにおける未解決問題
現在のDNNにおける未解決問題
 

Similar a Uncertainty Awareness in Integrating Machine Learning and Game Theory

Similar a Uncertainty Awareness in Integrating Machine Learning and Game Theory (20)

Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Planning for power systems
Planning for power systemsPlanning for power systems
Planning for power systems
 
Direct policy search
Direct policy searchDirect policy search
Direct policy search
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
Machine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative InvestingMachine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative Investing
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
Explaining the Basics of Mean Field Variational Approximation for Statisticians
Explaining the Basics of Mean Field Variational Approximation for StatisticiansExplaining the Basics of Mean Field Variational Approximation for Statisticians
Explaining the Basics of Mean Field Variational Approximation for Statisticians
 
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
 
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using EnsemblesStrata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
Strata 2013: Tutorial-- How to Create Predictive Models in R using Ensembles
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Dynamic Optimization without Markov Assumptions: application to power systems
Dynamic Optimization without Markov Assumptions: application to power systemsDynamic Optimization without Markov Assumptions: application to power systems
Dynamic Optimization without Markov Assumptions: application to power systems
 

Último

➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men 🔝Malda🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men  🔝Malda🔝   Escorts Ser...➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men  🔝Malda🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men 🔝Malda🔝 Escorts Ser...
amitlee9823
 
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
dipikadinghjn ( Why You Choose Us? ) Escorts
 
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
From Luxury Escort : 9352852248 Make on-demand Arrangements Near yOU
 
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
priyasharma62062
 
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
dipikadinghjn ( Why You Choose Us? ) Escorts
 
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
dipikadinghjn ( Why You Choose Us? ) Escorts
 
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
dipikadinghjn ( Why You Choose Us? ) Escorts
 

Último (20)

➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men 🔝Malda🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men  🔝Malda🔝   Escorts Ser...➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men  🔝Malda🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ Malda Call-girls in Women Seeking Men 🔝Malda🔝 Escorts Ser...
 
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
 
Mira Road Awesome 100% Independent Call Girls NUmber-9833754194-Dahisar Inter...
Mira Road Awesome 100% Independent Call Girls NUmber-9833754194-Dahisar Inter...Mira Road Awesome 100% Independent Call Girls NUmber-9833754194-Dahisar Inter...
Mira Road Awesome 100% Independent Call Girls NUmber-9833754194-Dahisar Inter...
 
Airport Road Best Experience Call Girls Number-📞📞9833754194 Santacruz MOst Es...
Airport Road Best Experience Call Girls Number-📞📞9833754194 Santacruz MOst Es...Airport Road Best Experience Call Girls Number-📞📞9833754194 Santacruz MOst Es...
Airport Road Best Experience Call Girls Number-📞📞9833754194 Santacruz MOst Es...
 
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
 
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Banaswadi Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
Vip Call US 📞 7738631006 ✅Call Girls In Sakinaka ( Mumbai )
 
(INDIRA) Call Girl Srinagar Call Now 8617697112 Srinagar Escorts 24x7
(INDIRA) Call Girl Srinagar Call Now 8617697112 Srinagar Escorts 24x7(INDIRA) Call Girl Srinagar Call Now 8617697112 Srinagar Escorts 24x7
(INDIRA) Call Girl Srinagar Call Now 8617697112 Srinagar Escorts 24x7
 
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
From Luxury Escort Service Kamathipura : 9352852248 Make on-demand Arrangemen...
 
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbaiVasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
 
7 tips trading Deriv Accumulator Options
7 tips trading Deriv Accumulator Options7 tips trading Deriv Accumulator Options
7 tips trading Deriv Accumulator Options
 
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Sant Nagar (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
 
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
CBD Belapur Expensive Housewife Call Girls Number-📞📞9833754194 No 1 Vipp HIgh...
 
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
VIP Call Girl in Mumbai Central 💧 9920725232 ( Call Me ) Get A New Crush Ever...
 
(Sexy Sheela) Call Girl Mumbai Call Now 👉9920725232👈 Mumbai Escorts 24x7
(Sexy Sheela) Call Girl Mumbai Call Now 👉9920725232👈 Mumbai Escorts 24x7(Sexy Sheela) Call Girl Mumbai Call Now 👉9920725232👈 Mumbai Escorts 24x7
(Sexy Sheela) Call Girl Mumbai Call Now 👉9920725232👈 Mumbai Escorts 24x7
 
Vasai-Virar High Profile Model Call Girls📞9833754194-Nalasopara Satisfy Call ...
Vasai-Virar High Profile Model Call Girls📞9833754194-Nalasopara Satisfy Call ...Vasai-Virar High Profile Model Call Girls📞9833754194-Nalasopara Satisfy Call ...
Vasai-Virar High Profile Model Call Girls📞9833754194-Nalasopara Satisfy Call ...
 
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
 
Stock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfStock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdf
 
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
VIP Independent Call Girls in Mira Bhayandar 🌹 9920725232 ( Call Me ) Mumbai ...
 
Diva-Thane European Call Girls Number-9833754194-Diva Busty Professional Call...
Diva-Thane European Call Girls Number-9833754194-Diva Busty Professional Call...Diva-Thane European Call Girls Number-9833754194-Diva Busty Professional Call...
Diva-Thane European Call Girls Number-9833754194-Diva Busty Professional Call...
 

Uncertainty Awareness in Integrating Machine Learning and Game Theory

  • 1. Uncertainty Awareness in Integrating Machine Learning and Game Theory 不確実性を通して見る 機械学習とゲーム理論とのつながり Rikiya Takahashi SmartNews, Inc. rikiya.takahashi@smartnews.com Mar 5, 2017 Game Theory Workshop 2017 https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating- machine-learning-and-game-theory
  • 2. About Myself ● Rikiya TAKAHASHI, Ph.D. (高橋 力矢) – Engineer in SmartNews, Inc., from 2015 to current – Research Staff Member in IBM Research – Tokyo, from 2004 to 2015 ● Research Interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems – Descriptive models about real human behavior – Prescriptive decision making from descriptive models – Robust algorithms working under high uncertainty ● Limited sample size, high dimensionality, high noise
  • 3. Example of Previous Work ● Budget-Constrained Markov Decision Process for Marketing-Mix Optimization (Takahashi+, 2013 & 2014) 2014/01/01 2014/01/08 … 2014/12/31 EM DM TM EM DM TM … EM DM TM Segment #1 … Segment #2 … … … Segment #N … EM: e-mail DM: direct mail TM: tele-marketing $$ E-mail TV CM Purchase prediction response stimulus Browsing Revenues in past 16 weeks > $200? #purchase in past 8 weeks > 2? #browsing in past 4 weeks > 15? No Yes Strategic Segment #1 MS #1 MS #2 #EMs in past 2 weeks > 2? No Yes MS #255 MS #256 #EMs in past 2 weeks > 2? No Yes ….............................................................. ... Historical Data Consumer Segmentation Time-Series Predictive Modeling Optimal Marketing-Mix & Targeting Rules
  • 4. Example of Previous Work ● Travel-Time Distribution Prediction on a Large Road Network (Takahashi+, 2012) A B rN/L rN/L rN/L rN/L rN/L rN/L ψ1 (y) ψ2 (y) ψ3 (y) ψ4 (y) ψ5 (y) ψ6 (y) intersection link 1 0 0 00.5 00.5 0 0.85 Road Network & Travel Time Data by Taxi Predictive Modeling of Travel Time Distribution Route-Choice Recommendation or Traffic Simulation
  • 5. Example of Previous Work ● Bayesian Discrete Choice Modeling for Irrational Compromise Effect (Takahashi & Morimura, 2015) – Explained later today A 0 B C D {A, B, C} {B, C, D} The option having the highest share inexpensiveness product quality Utility Calculator (UC) Decision Making System (DMS) Vector of attributes = A uiA =3.26 B uiB =3.33 C uiC =2.30 send samples utility A B utility sample utility estimate C
  • 6. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 7. Machine Learning (ML) ● Set of inductive disciplines to design probabilistic model and estimate its parameters that maximize out-of-sample predictive accuracy – Supervised learning: model and fit P(Y|X) – Unsupervised learning: model and fit P(X) ● What machine learners care about – Bias-variance trade-off – Curse of dimensionality
  • 8. Estimation via Bayes' theorem ● Basis behind today's most ML algorithm posterior distribution: p(θ∣D)= p(D∣θ ) p(θ) ∫θ p(D∣θ ) p(θ)d θ predictive distribution: p( y∗ ∣D)=∫θ p( y∗ ∣θ) p(θ∣D)d θ posterior mode: ̂θ =argmax θ [log p(D∣θ )+log p(θ )] predictive distribution: p( y∗ ∣D)≃p( y∗ ∣̂θ ) Maximum A Posteriori estimation Bayesian estimation p(θ ) approximation ● Q. Why placing a prior ? – A1. To quantify uncertainty as posterior – A2. To avoid overfitting data:D model parameter:θ
  • 9. E.g., Gaussian Process Regression (GPR) ● Bayesian Ridge Regression – Unlike MAP Ridge regression (dark gray), input- dependent uncertainty (light gray) is quantified. prior:( f f ∗)∼N (0n+1 , (K k∗ k∗ T K (x ∗ , x ∗ ))) where K =(Kij≡K (xi , x j )), k∗=(K (x1, x ∗ ),…, K (xn , x ∗ )) T , K (x , x ')=exp(−γ∥x−x'∥ 2 ) data likelihood:(y y ∗)∼N ((f f ∗),σ 2 In+1 ) predictive distribution: y ∗ ∣K , x ∗ , X , y ∼N (k∗ T (σ 2 I n+K ) −1 y , K (x ∗ , x ∗ )−k∗ T (σ 2 In+K) −1 k∗+σ 2 )
  • 10. Gap between Deduction & Induction Today's AI is integrating both. Do not divide the work between inductive & deductive researchers. Deductive Mind ● Optimize decisions for a given environment ● Casino owner's mentality ● Game theorist, probabilist, operations researcher Inductive Mind ● Estimate the environment from observations ● Gambler's mentality ● Statistician, machine learner, econometrician
  • 11. Induction ↔ Deduction Dataset Typical Problem Solving in the Real World Estimate of Environment Inductive Process Machine Learning, Statistics, Econometrics, etc. Policy Decisions Deductive Process Game theory, mathematical programming, Markov Decision Process, etc. D ̂Θ D ̂π D Estimate is different from the true environment . ̂Θ D Θ ∀i∈{1,…, n} ̂π D , i=arg max πi R(πi∣{̂π D , j }j≠i , ̂Θ D )
  • 12. Induction ↔ Deduction Dataset Typical Problem Solving in the Real World Estimate of Environment Inductive Process Machine Learning, Statistics, Econometrics, etc. Policy Decisions Deductive Process Game theory, mathematical programming, Markov Decision Process, etc. D ̂Θ D ̂π D ∀i∈{1,…, n} ̂π D , i=arg max πi R(πi∣{̂π D , j }j≠i , ̂Θ D ) How the estimation-based policy is different from the true optimal policy ? ̂π D π ∗ ∀i∈{1,…, n} π i ∗ =arg max πi R(πi∣{π j ∗ }j≠i ,Θ )
  • 13. Induction ↔ Deduction Dataset Typical Problem Solving in the Real World Estimate of Environment Inductive Process Machine Learning, Statistics, Econometrics, etc. Policy Decisions Deductive Process Game theory, mathematical programming, Markov Decision Process, etc. D ̂Θ D ̂π D State-of-the-art AI Dataset By-product Direct Optimization Integration of Machine Learning and Optimization Algorithms Policy Decisions D ̌Θ D ̌π D
  • 14. See the Difference Typical Problem Solving in the Real World: Unnecessarily too much effort in solving each subproblem Vulnerable to estimation error State-of-the-art AI Less effort of needless intermediate estimation Robust to estimation error ̌Θ D ̌π D̂π D ̂Θ D Accurately fitted on minimal prediction error for dataset D, while minimizing the error of this parameter is not the goal. Exceedingly optimized given wrong assumption Fitted but not minimizing the error for dataset D. Often less complex than . Safely optimized with less reliance on ̌Θ D ̂Θ D
  • 15. See the Difference Typical Problem Solving in the Real World: State-of-the-art AI Solve a Hard Inductive Problem Solve another Hard Deductive Problem Solve an Easier Problem that Involves both Induction & Deduction ● Recommendation of simple solving – Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8
  • 16. Optimization under Uncertainty ● Interval Estimation (e.g., Bayesian) – Quantify uncertainty – Optimize over all possible environments ● Minimal Estimation (e.g., Vapnik) – Omit the intermediate step – Solve the minimal optimization problem ● Both principles are effective in practice.
  • 17. Vapnik's Principle (Vapnik, 1995) When solving a problem of interest, do not solve a more general problem as an intermediate step. —Vladimir N. Vapnik ● E.g., classification or regression: predict Y given X – #1. Fit P(X,Y) and infer P(Y|X) by Bayes' theorem – #2. Only fit P(Y|X) ● #2 is better than #1 because it incurs less estimation error. – Particularly better when uncertainty is high: small sample size, high dimensionality, and/or high noise
  • 18. Batch Reinforcement Learning ● A good example involving both inductive and deductive processes ● Also a good example of how to avoid needlessly hard estimation ● Basis behind the recent success of the Deep Q-Network for playing games (Mnih+, 2013 & 2015), and AlphaGo (Silver+, 2016)
  • 19. Markov Decision Process ● Framework for long-term-optimal decision making – S: set of states, A: set of actions, P(s'|s,a): state-transition probability, r(s,a): immediate reward, γ ∈ [0,1]: discounting factor – Optimize the policy π(a|s) for maximal cumulative reward ● [Figure: customer states (e.g., Gold / Silver / Normal Customer) evolving over t = 0, 1, 2, … with different reward streams under Action #1 (e.g., ordinary discount on flight ticket) vs. Action #2 (e.g., free business-class upgrade)]
  • 20. Markov Decision Process ● Easy to solve if the environment is known – Via dynamic programming or linear programming when P(s'|s,a) & r(s,a) are given with no uncertainty – Behave myopically at t → ∞: for each state s, choose the action a that maximizes r(s,a). – At time (t−1), choose the optimal action that maximizes the immediate reward at time (t−1) plus the expected reward after time t over the state-transition distribution (see the sketch below). ● What if the environment is unknown?
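A small value-iteration sketch of the "easy to solve if the environment is known" case, on a toy MDP with randomly generated P(s'|s,a) and r(s,a) (all numbers illustrative): when the environment is given, dynamic programming alone recovers the optimal policy, with no statistical estimation involved.

```python
# Value iteration on a toy MDP with known transition probabilities and rewards.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                  # r(s, a)

V = np.zeros(n_states)
for _ in range(500):
    # Bellman optimality backup: Q(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum('ask,k->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)   # greedy policy with respect to the converged Q
print("optimal action per state:", policy)
```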
  • 21. Types of Reinforcement Learning ● Model-based ↔ Model-free ● On-policy ↔ Off-policy ● Value iteration ↔ Policy search ● Model-based approach – 1. System identification: estimate the MDP parameters – 2. Sample multiple MDPs from the interval estimate – 3. Solve every MDP & take the best action of the best MDP ● Optimism in the face of uncertainty
  • 22. Model-free approach ● Remember: our aim is to obtain the optimal policy. In principle, there is no need to estimate the environment. – Act without fully identifying the system: as long as we choose the optimal action, things turn out right in the end. ● Even when doing estimation, use an intermediate statistic less complex than P(s'|s,a) & r(s,a).
  • 23. Bellman Optimality Equation Q(s,a) = E[r(s,a)] + γ E_{P(s'|s,a)}[max_{a'} Q(s',a')] ● The policy is derived once we have an estimate of Q(s,a): π(a|s) = 1 if a = argmax_{a'} Q(s,a'), 0 otherwise – Simpler than estimating P(s'|s,a) & r(s,a) ● Get an estimate Q̂(s,a) from episodes (s_i, a_i, s_i', r_i), i = 1,…,n (a minimal sketch follows)
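A minimal tabular sketch of estimating Q̂(s,a) from logged episodes (the environment and its reward rule are illustrative inventions), via temporal-difference updates toward r + γ max_a' Q̂(s',a'); note that P(s'|s,a) and r(s,a) are never modeled explicitly.

```python
# Tabular Q-learning from a batch of logged transitions (s, a, s', r); no model of P or r.
import numpy as np

n_states, n_actions, gamma, alpha = 4, 2, 0.9, 0.1
rng = np.random.default_rng(3)

def sample_transition():
    # Toy behavior-policy logger: random state/action, deterministic-ish dynamics, reward at state 0.
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next = (s + a + rng.integers(2)) % n_states
    r = 1.0 if s_next == 0 else 0.0
    return s, a, s_next, r

batch = [sample_transition() for _ in range(20000)]

Q = np.zeros((n_states, n_actions))
for s, a, s_next, r in batch:
    # Temporal-difference update toward r + gamma * max_a' Q(s', a')
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

print("greedy policy per state:", Q.argmax(axis=1))
```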
  • 24. Fitted Q-Iteration (Ernst+, 2005) ● For k = 1, 2, …, iterate 1) value computation and 2) regression: 1) ∀i∈{1,…,n}: v_i^(k) := r_i + γ Q̂_k^(1)(s_i', argmax_{a'} Q̂_k^(0)(s_i', a')) 2) ∀f∈{0,1}: Q̂_{k+1}^(f) := argmin_{Q∈H} [ (1/2) Σ_{i∈J_f} (v_i^(k) − Q(s_i, a_i))² + R(Q) ] – H: hypothesis space of functions, Q_0 ≡ 0, R: regularization term – Indices 1…n are randomly split into sets J_0 and J_1 to avoid over-estimation of Q values (Double Q-Learning (Hasselt, 2010)). ● Related to Experience Replay in the Deep Q-Network (Mnih+, 2013 & 2015) – See (Lange+, 2012) for more details.
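A compact Fitted Q-Iteration sketch in the spirit of the iteration above, on an illustrative 1-d toy problem with an extremely-randomized-trees regressor (as in Ernst+, 2005); for brevity it omits the double-estimator split into J_0 and J_1, so it is a sketch rather than a faithful reimplementation.

```python
# Fitted Q-Iteration sketch: regress Q on targets r + gamma * max_a' Q_k(s', a').
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(4)
n, gamma, actions = 2000, 0.9, np.array([0.0, 1.0])

# Logged transitions on a 1-d toy state space: action 0 drifts left, action 1 drifts right,
# and the reward penalizes distance from the origin.
s = rng.uniform(-2, 2, size=n)
a = rng.choice(actions, size=n)
s_next = s + (2 * a - 1) * 0.3 + 0.05 * rng.normal(size=n)
r = -np.abs(s_next)

def features(states, acts):
    return np.column_stack([states, acts])

q_model = None
for k in range(20):
    if q_model is None:
        targets = r                                   # Q_0 is identically zero
    else:
        # max over actions of Q_k(s', a'), evaluated by the current regressor
        q_next = np.column_stack([q_model.predict(features(s_next, np.full(n, act)))
                                  for act in actions])
        targets = r + gamma * q_next.max(axis=1)
    q_model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(features(s, a), targets)

q_at = [q_model.predict(features(np.array([-1.0]), np.array([act])))[0] for act in actions]
print("Q(-1, a):", np.round(q_at, 3), "-> greedy action:", actions[int(np.argmax(q_at))])
```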
  • 25. Policy Gradient ● Accurately fit the policy π_θ(a|s) while only roughly fitting Q(s,a) – More direct with respect to the final aim – Applicable to continuous-action problems ● Policy Gradient Theorem (Sutton+, 2000): ∇_θ J(θ) (gradient of performance) = E_{π_θ}[ ∇_θ log π_θ(a|s) Q^π(s,a) ] (expectation over s and a of the log-policy gradient times the cumulative reward) ● Variations in providing the rough estimate of Q – REINFORCE (Williams, 1992): reward samples, as in the sketch below – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
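A bare-bones REINFORCE sketch for a one-step (bandit-like) problem with illustrative reward means: the "rough estimate of Q" is simply the sampled reward, plugged into the policy-gradient formula for a softmax policy, in the spirit of (Williams, 1992).

```python
# REINFORCE on a one-step problem: ascend E[grad log pi_theta(a) * reward] using reward samples.
import numpy as np

rng = np.random.default_rng(5)
true_mean_rewards = np.array([0.2, 0.5, 0.9])   # unknown to the learner
theta = np.zeros(3)                             # softmax policy parameters
lr = 0.05

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(5000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    reward = true_mean_rewards[a] + 0.1 * rng.normal()
    # Gradient of log pi(a) w.r.t. theta for a softmax policy: one_hot(a) - pi
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += lr * grad_log_pi * reward          # stochastic policy-gradient ascent

print("learned action probabilities:", np.round(softmax(theta), 3))
```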
  • 26. Functional Approximation in Practice ● Concrete functional form of Q(s,a) and/or π(a|s) – Q should be a universal function approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced. ● Examples of universal approximators – Tree ensembles: Random Forest, Gradient Boosted Decision Trees – (Deep) Neural Networks – Mixture of Radial Basis Functions (RBFs)
  • 27. Functional Approximation in Practice ● Is any universal approximator OK? – No, unfortunately. – A universal approximator is merely asymptotically unbiased. – Better to also have ● Low variance in terms of the bias-variance trade-off ● Resistance to the curse of dimensionality ● One reason for deep learning's success – Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models – Techniques to stabilize numerical optimization ● AdaGrad or Adam, dropout, ReLU, batch normalization, etc.
  • 28. Message ● Uncertainty awareness is essential in data-oriented decision making. – No division between induction and deduction – Removing needless intermediate estimation – Fitted Q-Iteration as an illustrative example ● Fewer parameters, less uncertainty
  • 29. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 30. Shrinkage Matters in the Real World ● Q. Why does a prior help avoid over-fitting? – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression) ● Over-optimization ↔ Over-rationalization? – (e.g., (Takahashi and Morimura, 2015)) ● [Figure: solutions of 2-dimensional OLS & Ridge regression in the (Coefficient #1, Coefficient #2) plane; the Ridge solution lies closer to the prior mean 0 than OLS, and the prior mean 0 is independent of the training data. A numerical sketch follows.]
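A small numerical sketch of the OLS-vs-Ridge contrast in the figure, on synthetic nearly-collinear data (all numbers illustrative): the ridge penalty, i.e., MAP estimation under a Gaussian prior, pulls the coefficients toward the prior mean 0 and stabilizes them.

```python
# OLS vs. Ridge on nearly collinear inputs: ridge shrinks coefficients toward the prior mean 0.
import numpy as np

rng = np.random.default_rng(6)
n, d = 30, 2
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)          # nearly collinear inputs
y = X @ np.array([1.0, 1.0]) + 0.5 * rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
lam = 5.0                                              # ridge penalty (MAP with a Gaussian prior)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("OLS coefficients  :", np.round(beta_ols, 3))    # large and unstable under collinearity
print("Ridge coefficients:", np.round(beta_ridge, 3))  # pulled toward the prior mean 0
```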
  • 31. Discrete Choice Modelling ● Goal: predict the probability of choosing an option from a choice set. ● Why solve this problem? – Brand positioning among competitors – Sales promotion (yet involving some abuse)
  • 32. Random Utility Theory as a Rational Model ● Each human is a rational maximizer of random utility. ● Theoretical basis behind many statistical marketing models: – Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)) – Learning to rank (e.g., (Chapelle and Harchaoui, 2005)) – Conjoint analysis (Green and Srinivasan, 1978) – Matrix factorization (e.g., (Lawrence and Urtasun, 2009)), …
  • 33. Complexity of Real Human's Choice ● An example of choosing a PC (Kivetz et al., 2004): each subject chooses 1 option from a choice set.
  Option      |   A |   B |   C |   D |   E
  CPU [MHz]   | 250 | 300 | 350 | 400 | 450
  Mem. [MB]   | 192 | 160 | 128 |  96 |  64
  Choice Set  | #subjects per option
  {A, B, C}   | 36 : 176 : 144
  {B, C, D}   | 56 : 177 : 115
  {C, D, E}   | 94 : 181 : 109
  ● Can random utility theory still explain the preference reversals? B ≻ C or C ≻ B?
  • 34. Similarity Effect (Tversky, 1972) ● Top-share choice can change due to correlated utilities. ● E.g., one color from {Blue, Red} or {Violet, Blue, Red}?
  • 35. Attraction Effect (Huber et al., 1982) ● Introduction of an absolutely-inferior option A⁻ (=decoy) causes an irregular increase of option A's attractiveness, despite the natural guess that the decoy never affects the choice: if D ≻ A, then D ≻ A ≻ A⁻; if A ≻ D, then A is superior to both A⁻ and D.
  • 36. Compromise Effect (Simonson, 1989) ● Moderate options within each choice set are preferred. ● Different from a non-linear utility function involving diminishing returns (e.g., √inexpensiveness + √quality).
  • 37. Positioning of the Proposed Work ● Sim.: similarity, Attr.: attraction, Com.: compromise
  Model    | Sim. | Attr. | Com. | Mechanism                     | Predict. for Test Set | Likelihood Maximization
  SPM      | OK   | NG    | NG   | correlation                   | OK                    | MCMC
  MDFT     | OK   | OK    | OK   | dominance & indifference      | OK                    | MCMC
  PD       | OK   | OK    | OK   | nonlinear pairwise comparison | OK                    | MCMC
  MMLM     | OK   | NG    | OK   | none                          | OK                    | Non-convex
  NLM      | OK   | NG    | NG   | hierarchy                     | NG                    | Non-convex
  BSY      | OK   | OK    | OK   | Bayesian                      | OK                    | MCMC
  LCA      | OK   | OK    | OK   | loss aversion                 | OK                    | MCMC
  MLBA     | OK   | OK    | OK   | nonlinear accumulation        | OK                    | Non-convex
  Proposed | OK   | NG    | OK   | Bayesian                      | OK                    | Convex
  MDFT: Multialternative Decision Field Theory (Roe et al., 2001), PD: Proportional Difference Model (González-Vallejo, 2002), MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000), SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009), NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001), BSY: Bayesian Model of (Shenoy and Yu, 2013), LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004), MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
  • 38. Key Idea #1: a Dual-Personality Model ● Regard a human as an estimator of her/his own utility function. ● Assumption 1: the Decision Making System (DMS) does not know the original utility function. 1. The Utility Calculator (UC) computes the sample value of every option's utility, and sends only these samples to the DMS. 2. The DMS statistically estimates the utility function.
  • 39. Utility Calculator as the Rational Personality ● For every context i and option j, the UC computes a noiseless sample of utility v_ij by applying the utility function f_UC : R^{d_X} → R. ● v_ij = f_UC(x_ij), f_UC(x) ≜ b + w^T φ(x) – b: bias term – φ : R^{d_X} → R^{d_φ}: mapping function – w ∈ R^{d_φ}: vector of coefficients
  • 40. Key Idea #2: DMS as a Bayesian Estimator ● The DMS does not know f_UC but has utility samples {v_ij}_{j=1}^{m[i]}. ● Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior in regressing the utility function. – prior: μ_i ~ N(0_{m[i]}, σ² K(X_i)), where K(X_i) = (K(x_ij, x_ij')) ∈ R^{m[i]×m[i]} – likelihood: v_i ≜ (v_i1, …, v_im[i])^T ~ N(μ_i, σ² I_{m[i]}) – μ_i ∈ R^{m[i]}: vector of utilities, σ²: noise level, K(·,·): similarity function, X_i ≜ (x_i1 ∈ R^{d_X}, …, x_im[i])^T ● The posterior mean is given as u*_i ≜ E[μ_i | v_i, X_i, K] = K(X_i)(I_{m[i]} + K(X_i))^{-1}(b 1_{m[i]} + Φ_i w), where Φ_i is the matrix of mapped features φ(x_ij).
  • 41. Convex Optimization for Model Parameters ● The likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean u*_i. Thus we can fit the function f_UC from the choice data. ● Conveniently, MAP estimation of f_UC is convex for a fixed K: (b̂, ŵ) = argmax_{b,w} Σ_{i=1}^n ℓ(b H_i 1_{m[i]} + H_i Φ_i w, y_i) − (c/2)‖w‖², where ℓ(u*_i, y_i) ≜ log[ exp(u*_{i,y_i}) / Σ_{j'=1}^{m[i]} exp(u*_{ij'}) ] and H_i ≜ K(X_i)(I_{m[i]} + K(X_i))^{-1}.
  • 42. Irrationality as Bayesian Shrinkage ● Implication from the posterior-mean utility in (1): u*_i = K(X_i)(I_{m[i]} + K(X_i))^{-1} (b 1_{m[i]} + Φ_i w), where the first factor acts as a shrinkage factor on the vector of utility samples. (1) – Each option's utility is shrunk toward the prior mean 0. – Strong shrinkage for an option dissimilar to the others, due to its high posterior variance (=uncertainty). ● Context effects as Bayesian uncertainty aversion – E.g., RBF kernel K(x, x') = exp(−γ‖x − x'‖²) ● [Figure: final evaluation of options A–D lying on the line X1 = (5 − X2), plotted for choice sets {A,B,C} and {B,C,D}; a numerical sketch follows.]
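A sketch of the shrinkage factor in (1), with illustrative attribute values on the line x1 = 5 − x2, an RBF kernel with an assumed γ = 0.5, and an assumed linear f_UC (b = 0) that gives every option the same rational utility: the moderate option in each choice set is shrunk least and thus ends up with the highest posterior-mean utility, reproducing the compromise-effect pattern.

```python
# Uncertainty-aversion shrinkage u* = K (I + K)^{-1} v for two overlapping choice sets.
import numpy as np

def rbf_gram(X, gamma=0.5):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

options = {'A': [4.0, 1.0], 'B': [3.0, 2.0], 'C': [2.0, 3.0], 'D': [1.0, 4.0]}
w = np.array([0.5, 0.5])                      # utility calculator: f_UC(x) = w.x, identical for all options

def posterior_mean_utility(names):
    X = np.array([options[k] for k in names])
    v = X @ w                                 # identical noiseless utility samples from the UC
    K = rbf_gram(X)
    u_star = K @ np.linalg.solve(np.eye(len(names)) + K, v)   # K (I + K)^{-1} v
    return {name: round(float(u), 3) for name, u in zip(names, u_star)}

# The moderate option (B in the first set, C in the second) gets the highest posterior mean.
print("{A,B,C}:", posterior_mean_utility(['A', 'B', 'C']))
print("{B,C,D}:", posterior_mean_utility(['B', 'C', 'D']))
```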
  • 43. Recovered Context-Dependent Choice Criteria ● For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects.
  Option       |   A |   B |   C |   D |   E
  Power [Watt] |  50 |  75 | 100 | 125 | 150
  Price [USD]  | 100 | 130 | 160 | 190 | 220
  Choice Set   | #subjects per option
  {A, B, C}    | 45 : 135 : 145
  {B, C, D}    | 58 : 137 : 111
  {C, D, E}    | 95 : 155 :  91
  ● [Figures: recovered evaluation vs. price [USD] for each choice set alongside the objective evaluation, and average test log-likelihood on the PC / SP / SM datasets for LinLogit, NpLogit, LinMix, NpMix, and GPUA.]
  • 44. A Result of the p-Beauty Contest by Real Humans ● Guess 2/3 of the average of all votes (0–100). The mean is far from the Nash equilibrium 0 (Camerer et al., 2004; Ho et al., 2006). ● Table: Average Choice in (2/3)-Beauty Contests
  Subject Pool          | Group Size | Sample Size | Mean[Y_i]
  Caltech Board         | 73         | 73          | 49.4
  80 year olds          | 33         | 33          | 37.0
  High School Students  | 20-32      | 52          | 32.5
  Economics PhDs        | 16         | 16          | 27.4
  Portfolio Managers    | 26         | 26          | 24.3
  Caltech Students      | 3          | 24          | 21.5
  Game Theorists        | 27-54      | 136         | 19.1
  • 45. Modeling Bounded Rationality ● Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004) – Humans cannot predict the infinite future. – Uses a non-stationary transitional state. ● Randomization of utility via noise ε_it: Quantal Response Equilibrium (McKelvey and Palfrey, 1995) ∀i ∈ {1,…,n}: Y_i^(t) | Y_{−i}^(t−1) = argmax_Y [ f_i(Y, Y_{−i}^(t−1)) + ε_it ] ● Both methods essentially work as regularization of rationality. – Shrinkage toward initial values or uniform choice probabilities
  • 46. Linking ML with Game Theory (GT) via the Shrinkage Principle ● ML – Optimization without shrinkage: Maximum-Likelihood estimation (optimal for the training data, but less generalization capability to test data) – Optimization with shrinkage: Bayesian estimation (shrinkage towards the prior causes suboptimality for the training data, but more generalization capability to test data) ● GT – Optimization without shrinkage: Nash Equilibrium (optimal for the given game, but less predictive of real-world decisions) – Optimization with shrinkage: Transitional State or Quantal Response Equilibrium (shrinkage towards uniform probabilities causes suboptimality for the given game, but more predictive of real-world decisions)
  • 47. Early Stopping and Regularization ● ML as a dynamical system to find the optimal parameters – In the (Parameter #1, Parameter #2) plane: the exact maximum-likelihood estimate (e.g., OLS), the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression), and early-stopping estimates along the path t = 10, 20, 30, 50 (e.g., Partial Least Squares) ● GT as a dynamical system to find the equilibrium – t=0: mean = 50, t=1: mean = 34, t=2: mean = 15 (a Level-2 transitional state), …, t → ∞: mean = 0 (Nash Equilibrium); a best-response sketch follows.
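A tiny best-response-dynamics sketch of the (2/3)-beauty contest, starting from an illustrative initial guess of 50 (so the intermediate values are the idealized iteration, not the exact numbers in the figure): early-stopped transitional states stay far from the Nash equilibrium at 0, which is what the level-k reading of the human data above suggests.

```python
# Best-response iteration for the (2/3)-beauty contest: each step replies to the previous mean.
guess = 50.0
for t in range(8):
    print(f"t={t}: mean guess = {guess:.1f}")
    guess = (2.0 / 3.0) * guess       # best response to the previous population mean
print("t -> infinity: mean guess -> 0 (Nash equilibrium)")
```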
  • 48. Message ● Bayesian shrinkage ↔ Bounded rationality – Dual-personality model for contextual effects – Towards data-oriented & more realistic games: export ML regularization techniques to GT ● Analyze dynamics or uncertainty-aware equilibria – Early-stopped transitional state, or – QRE with uncertainty on each player's utility function
  • 49. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 50. Additional Implications from ML ● Multiple equilibria or saddle points? ● Equilibria or “typical” transitional states? – Slow convergence – Plateau of objective function
  • 51. Recent history in ML ● Roughly 20 years wasted on the local-optimality issue – Neural Networks (NNs) had long been criticized for the local optimality of their parameter fitting. – The ML community stuck with convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)). – Most critical points encountered when fitting high-dimensional NNs, however, turn out to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)! – After skipping saddle points by perturbation, most local optima empirically provide similar prediction capabilities. ● Please do not make the same mistake in multi-agent optimization problems (=games)!
  • 52. Why are most critical points saddle points? ● See the spectrum of Hessian matrices of a non-linear function randomly drawn from a Gaussian process. – Local minimum: every eigenvalue is positive. Local maximum: every eigenvalue is negative. Saddle point: both positive & negative eigenvalues exist. ● For a high-dimensional function, the Hessian contains both positive & negative eigenvalues with high probability (see the sketch below). ● [Figure: univariate vs. bivariate functions; saddle-point illustration from https://en.wikipedia.org/wiki/Saddle_point]
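A loose numerical illustration, using random symmetric matrices as stand-ins for Hessians at random critical points (in the spirit of Bray & Dean, 2007): the fraction of sign-definite spectra collapses as the dimension grows, so a randomly encountered critical point of a high-dimensional function is almost surely a saddle point.

```python
# Fraction of random symmetric matrices whose eigenvalues are all of one sign, vs. dimension.
import numpy as np

rng = np.random.default_rng(7)
for d in [1, 2, 5, 10, 20]:
    trials, extrema = 2000, 0
    for _ in range(trials):
        A = rng.normal(size=(d, d))
        H = (A + A.T) / 2.0                       # symmetric random "Hessian"
        eig = np.linalg.eigvalsh(H)
        if (eig > 0).all() or (eig < 0).all():    # would be a local minimum or maximum
            extrema += 1
    print(f"d={d:2d}: fraction of non-saddle critical points ~ {extrema / trials:.3f}")
```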
  • 53. Open Questions for Multiple Equilibria ● If a game is very complex, involving lots of parameters in its pay-off or utility functions, then – Are most of its critical points unstable saddle points? – Is the number of equilibria much smaller than our guess? ● If we obtain a few equilibria of such a complex game, – Do most of these equilibria have similar properties? – Do we really have to obtain the other equilibria?
  • 54. See Dynamics: "Typical" Transitional State? ● MLers are sensitive to the convergence rate of fitting. – We are in the finite-sample & high-dimensional world: asymptotics alone is powerless, and a computational estimate is not an equilibrium but a transitional state. – http://sebastianruder.com/optimizing-gradient-descent/ (Kingma & Ba, 2015)
  • 55. See Dynamics: "Typical" Transitional State? ● The mixing time of the Markov processes of some games is exponential in the number of players. – E.g., the Nash demand game (Axtell+, 2000): the equilibrium exhibits equality of wealth, while the transitional states exhibit severe inequality. ● What if the number of players is in the thousands or millions? – Severe inequality most of the time
  • 56. See Dynamics: Trapped in a Plateau? ● Fitting of a deep NN is often trapped in plateaus. – Natural gradient descent (Amari, 1997) is often used to quickly escape from a plateau. – In real-world games, are people trapped in plateaus rather than at equilibria? – https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html
  • 57. Conclusion ● Discussed how uncertainty should be incorporated into inductive & deductive decision making. – Quantifying uncertainty, or simpler minimal estimation ● Linked Bayesian shrinkage with bounded rationality – Towards data-oriented regularized equilibria ● Implications from high-dimensional ML – Saddle points, transitional states, and/or plateaus
  • 58. THANK YOU FOR ATTENDING! Download this material from https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating- machine-learning-and-game-theory
  • 59. References I Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press. Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working papers, Brookings Institution - Working Papers. Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physics Review Letters, 98:150201. Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377. Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.
  • 60. References II Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA. Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33. Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc. de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.
  • 61. References III Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232. González-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154. Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123. Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331. Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.
  • 62. References IV Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press. Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego. Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257. Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM. McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.
  • 63. References V McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29. McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533. Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.
  • 64. References VI Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392. Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference. Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489. Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.
  • 65. References VII Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press. Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967. Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205. Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299. Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.
  • 66. References VIII Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641. Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256. Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.