Presenter: Shane (Seungwhan) Moon
PhD student
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
3/2/2016

How it works
AlphaGo vs. European Champion (Fan Hui, 2-Dan*)
October 5 – 9, 2015
<Official match>
- Time limit: 1 hour
- AlphaGo wins (5:0)
* Dan: rank
AlphaGo vs. World Champion (Lee Sedol, 9-Dan)
March 9 – 15, 2016
<Official match>
- Time limit: 2 hours
Venue: Seoul, Four Seasons Hotel
Image source: Josun Times, Jan 28th, 2015
Lee Sedol
Photo source: Maeil Economics 2013/04, wiki
Computer Go AI?

Computer Go AI – Definition
s (state)
d = 1

0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

(e.g. we can represent the board in a matrix-like form)
* The actual model uses other features than board positions as well
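For instance, a minimal sketch of this matrix-like encoding (assuming NumPy, with 1 = black stone, -1 = white stone, 0 = empty):

```python
import numpy as np

BOARD_SIZE = 19  # a full Go board; the slide shows a smaller excerpt

# Board state s: 0 = empty, 1 = black stone, -1 = white stone
s = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.int8)
s[2, 6] = 1  # the single stone at d = 1 in the slide's matrix

print(s[:8, :9])  # the excerpt shown above
```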
Computer Go AI – Definition
s (state)
d = 1, d = 2
a (action)
Given s, pick the best a

Computer Go Artificial Intelligence: s → a → s'
Computer Go AI – An Implementation Idea?
d = 1, d = 2, …
How about simulating all possible board positions?
Computer Go AI – An Implementation Idea?
d = 1, d = 2, d = 3, …
Computer Go AI – An Implementation Idea?
d = 1, d = 2, d = 3, …, d = maxD
Process the simulation until the game ends,
then report win / lose results
Computer Go AI – An Implementation Idea?
d = 1, d = 2, d = 3, …, d = maxD
Process the simulation until the game ends,
then report win / lose results
e.g. it wins 13 times if the next stone gets placed here
(vs. 37,839 times / 431,320 times for other candidate positions)
Choose the “next action / stone” that has the most win-counts in the full-scale simulation
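A sketch of this brute-force idea in Python (the state interface with `legal_moves()`, `play()`, `is_terminal()`, and `winner_is_us()` is hypothetical; this is the naive simulation the slides describe, not AlphaGo's actual algorithm):

```python
import random

def simulate_to_end(state, rng):
    """Play random legal moves until the game ends; True if we win."""
    while not state.is_terminal():
        state = state.play(rng.choice(state.legal_moves()))
    return state.winner_is_us()

def best_move_by_win_counts(state, n_simulations=10_000):
    """Pick the move with the most wins across full-game simulations."""
    rng = random.Random(0)
    win_counts = {}
    for move in state.legal_moves():
        win_counts[move] = sum(
            simulate_to_end(state.play(move), rng)
            for _ in range(n_simulations))
    return max(win_counts, key=win_counts.get)
```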
This is NOT possible; it is said that the possible configurations of the board exceed the number of atoms in the universe
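A quick sense of scale, using the approximate breadth b ≈ 250 and depth d ≈ 150 per game from Silver et al. (2016), against the commonly cited ~10^80 atoms in the observable universe:

```python
import math

b, d = 250, 150             # approximate breadth and depth of Go (Silver et al. 2016)
atoms = 10 ** 80            # rough count of atoms in the observable universe

print(math.log10(b) * d)    # ~359.7, i.e. b**d is about 10**360
print(b ** d > atoms ** 4)  # True: even atoms**4 falls short
```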
Key: To Reduce Search Space
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
d = 1, d = 2, d = 3, …, d = maxD
Win? Loss?
IF there is a model that can tell you that these moves
are not common / probable (e.g. by experts, etc.) …
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
d = 1, d = 2, d = 3, …, d = maxD
Win? Loss?
Remove these from search candidates in advance (breadth reduction)
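A minimal sketch of breadth reduction, assuming a hypothetical `policy_prob(state, move)` model that scores moves the way experts would:

```python
def promising_moves(state, policy_prob, top_k=5):
    """Breadth reduction: branch only on the moves an expert model favors.

    `policy_prob(state, move)` is a hypothetical model estimating p(a|s).
    """
    moves = sorted(state.legal_moves(),
                   key=lambda a: policy_prob(state, a), reverse=True)
    return moves[:top_k]  # all other moves are pruned from the tree
```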
Reducing Search Space
2. Position evaluation ahead of time (Depth Reduction)
d = 1, d = 2, d = 3, …, d = maxD
Win? Loss?
Instead of simulating until the maximum depth …
Reducing Search Space
2. Position evaluation ahead of time (Depth Reduction)
d = 1, d = 2, d = 3, …
V = 1, V = 2, V = 10
IF there is a function that can measure:
V(s): “board evaluation of state s”
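And a sketch of depth reduction: stop the simulation at a fixed depth and substitute a hypothetical evaluation function `V(state)` for the rest of the game:

```python
def search_value(state, V, depth=0, max_depth=3):
    """Depth reduction: below max_depth, trust V(s) instead of simulating on.

    `V(state)` is a hypothetical board-evaluation function.
    """
    if depth >= max_depth or state.is_terminal():
        return V(state)
    # One-sided sketch: a fuller version would alternate max/min players.
    return max(search_value(state.play(a), V, depth + 1, max_depth)
               for a in state.legal_moves())
```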
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
2. Position evaluation ahead of time (Depth Reduction)
1. Reducing “action candidates”
Learning: P(next action | current state) = P(a | s)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Current State → Prediction Model → Next State
s1 → s2, s2 → s3, s3 → s4
Data: online Go experts (5~9 dan)
160K games, 30M board positions
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Board
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Action
There are 19 × 19 = 361 possible actions
(with different probabilities)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Action

s (current board):
0  0  0  0  0  0  0  0  0
0  0  0  0  0  1  0  0  0
0 -1  0  0  1 -1  1  0  0
0  1  0  0  1 -1  0  0  0
0  0  0  0 -1  0  0  0  0
0  0  0  0  0  0  0  0  0
0 -1  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0

f: s → a

a (next action):
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Action
(same current board s as above)

g: s → p(a|s)

p(a|s):
0 0 0 0 0 0   0   0 0
0 0 0 0 0 0   0   0 0
0 0 0 0 0 0   0   0 0
0 0 0 0 0 0.2 0.1 0 0
0 0 0 0 0 0.4 0.2 0 0
0 0 0 0 0 0.1 0   0 0
0 0 0 0 0 0   0   0 0
0 0 0 0 0 0   0   0 0

a = argmax p(a|s)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model (g: s → p(a|s)): Current Board → p(a|s) → argmax → Next Action
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Deep Learning (13-layer CNN): g: s → p(a|s)
Current Board → p(a|s) → argmax → Next Action
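A toy stand-in for such a policy network (PyTorch-style; not the paper's actual 13-layer architecture, and the channel widths here are illustrative assumptions):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a board tensor to a distribution over 19 * 19 = 361 moves."""
    def __init__(self, in_channels=1, width=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, kernel_size=1),   # one logit per board point
        )

    def forward(self, board):                     # board: (batch, 1, 19, 19)
        logits = self.conv(board).flatten(1)      # (batch, 361)
        return torch.softmax(logits, dim=1)       # p(a|s)

net = PolicyNet()
s = torch.zeros(1, 1, 19, 19)
s[0, 0, 2, 6] = 1.0                               # a lone black stone
p = net(s)
print(p.argmax().item())                          # index of the argmax action
```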
Convolutional Neural Network (CNN)
CNN is a powerful model for image recognition tasks; it abstracts out the input image through convolution layers
Convolutional Neural Network (CNN)
They use this CNN model (a similar architecture) to evaluate the board position; it learns “some” spatial invariance
Go: abstraction is the key to winning
CNN: abstraction is its forte
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Expert Moves Imitator Model (w/ CNN)
Training: Current Board → Next Action
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN)
Improving by playing against itself
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN)
Return: board positions, win/lose info
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (w/ CNN)
Training: board position → win/loss
Loss: z = -1
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (w/ CNN)
Training: board position → win/loss
Win: z = +1
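A sketch of this training signal as a REINFORCE-style update (a simplification of the paper's policy-gradient method; it assumes the `PolicyNet` sketch above and that `games` yields (board tensor, action index, z) triples from self-play):

```python
import torch

optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)

def reinforce_update(games):
    """games: iterable of (board, action_index, z) triples from self-play.

    Push up log p(a|s) for moves from won games (z = +1),
    push it down for moves from lost games (z = -1).
    """
    for board, action, z in games:
        p = net(board)                            # (1, 361)
        loss = -z * torch.log(p[0, action] + 1e-12)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```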
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1.1 VS Updated Model ver 1.3
Return: board positions, win/lose info
It uses the same topology as the expert moves imitator model, just with updated parameters
Older models vs. newer models
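In the paper, the opponent for each round is sampled from a pool of earlier checkpoints rather than always being the latest model, which stabilizes training. A sketch (the `play_games` game runner is hypothetical, and this reuses the `PolicyNet` / `reinforce_update` sketches above):

```python
import copy
import random

checkpoint_pool = [copy.deepcopy(net.state_dict())]   # "ver 1.0"

def self_play_round(play_games, n_games=128):
    """One improvement round: play a random past checkpoint, learn, snapshot."""
    opponent = PolicyNet()
    opponent.load_state_dict(random.choice(checkpoint_pool))  # an older model
    games = play_games(net, opponent, n_games)    # hypothetical game runner
    reinforce_update(games)                       # policy-gradient step(s)
    checkpoint_pool.append(copy.deepcopy(net.state_dict()))   # a newer model
```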
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1.3 VS Updated Model ver 1.7
Return: board positions, win/lose info

Updated Model ver 1.5 VS Updated Model ver 2.0
Return: board positions, win/lose info

Updated Model ver 3204.1 VS Updated Model ver 46235.2
Return: board positions, win/lose info
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Updated Model ver 1,000,000 VS Expert Moves Imitator Model
The final model wins 80% of the time
when playing against the first model
2. Board Evaluation

2. Board Evaluation
Updated Model ver 1,000,000
Board Position → Value Prediction Model (Regression) → Win (0~1)
Training: Win / Loss
Adds a regression layer to the model
Predicts values between 0~1
Close to 1: a good board position
Close to 0: a bad board position
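A sketch of this value network in the same PyTorch style (the convolutional trunk mirrors the policy sketch above; the sigmoid regression head producing values in 0~1 is the idea from the slide, while the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Regression model: board position -> predicted win value in (0, 1)."""
    def __init__(self, in_channels=1, width=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(width * 19 * 19, 1)  # the added regression layer

    def forward(self, board):                      # board: (batch, 1, 19, 19)
        h = self.conv(board).flatten(1)
        return torch.sigmoid(self.head(h))         # close to 1: good position

value_net = ValueNet()
print(value_net(torch.zeros(1, 1, 19, 19)).item())  # a value between 0 and 1
```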
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction) → Policy Network
2. Board Evaluation (Depth Reduction) → Value Network
Looking ahead (w/ Monte Carlo Tree Search)
Action Candidates Reduction (Policy Network)
Board Evaluation (Value Network)
Rollout: a faster version of estimating p(a|s)
→ uses shallow networks (3 ms → 2 µs)
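A sketch of how the pieces meet inside the tree search: in the paper, a leaf is scored by mixing the value network with a fast-rollout outcome, V(leaf) = (1 − λ)·v(s) + λ·z with λ = 0.5, and children are selected by a PUCT-style rule that adds a policy-prior exploration bonus. The `value_net` API, `fast_rollout`, and node fields below are assumed:

```python
import math

LAMBDA = 0.5  # mixing weight between value net and rollout (Silver et al. 2016)

def evaluate_leaf(state):
    """Score a leaf: blend the value network with a fast rollout to game end."""
    v = value_net(state.to_tensor()).item()   # value network estimate (assumed API)
    z = fast_rollout(state)                   # shallow-network playout, hypothetical
    return (1 - LAMBDA) * v + LAMBDA * z

def select_child(node):
    """PUCT-style selection: value plus a policy-prior exploration bonus."""
    total_visits = sum(child.visits for child in node.children)
    return max(node.children,
               key=lambda c: c.mean_value
               + c.prior * math.sqrt(total_visits) / (1 + c.visits))
```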
Results
Elo rating system
Performance with different combinations of AlphaGo components
Takeaways
Use the networks trained for a certain task (with different loss objectives) for several other tasks
Lee Sedol 9-dan vs AlphaGo

Lee Sedol 9-dan vs AlphaGo
Energy Consumption

Lee Sedol:
- Recommended calories for a man per day: ~2,500 kCal
- Assumption: Lee consumes the entire amount of per-day calories in this one game
2,500 kCal * 4,184 J/kCal ~= 10M [J]

AlphaGo:
- Assumption: CPU: ~100 W, GPU: ~300 W
- 1,202 CPUs, 176 GPUs
170,000 J/sec * 5 hr * 3,600 sec/hr ~= 3,000M [J]

A very, very rough calculation ;)
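The same back-of-the-envelope arithmetic, checked in a few lines:

```python
# Lee Sedol: one day's recommended calories, burned over the game
lee_joules = 2_500 * 4_184                 # kCal * J/kCal ~ 1.0e7 J ("10M J")

# AlphaGo: 1,202 CPUs at ~100 W + 176 GPUs at ~300 W, over a 5-hour game
power_watts = 1_202 * 100 + 176 * 300      # = 173,000 W ~ 170,000 J/sec
alphago_joules = power_watts * 5 * 3_600   # ~ 3.1e9 J ("3,000M J")

print(f"{alphago_joules / lee_joules:.0f}x")   # roughly 300x Lee's energy budget
```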
AlphaGo is estimated to be around ~5-dan (= multiple machines),
above the 2-dan European champion
Taking CPU / GPU resources to virtually infinity?
But Google has promised not to use more CPUs/GPUs for the game with Lee
than they used for Fan Hui
No one knows how it will converge
AlphaGo learns millions of Go games every day
AlphaGo will presumably converge to some point eventually.
However, the Nature paper doesn't report how AlphaGo's performance improves
as a function of the number of games it plays against itself (self-play).
What if AlphaGo learns Lee's game strategy?
Google said they won't use Lee's game plays as AlphaGo's training data.
Even if it did, it wouldn't be easy to modify a model trained on millions of
data points with just a few games against Lee
(prone to over-fitting, etc.)
AlphaGo’s Weakness?
AlphaGo – How It Works
Presenter: Shane (Seungwhan) Moon
PhD student
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
me@shanemoon.com
3/2/2016
Reference
• Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484–489.