An Introduction to
Reinforcement Learning
Jie-Han Chen
NetDB, National Cheng Kung University
3/27, 2018 @ National Cheng Kung University, Taiwan
1
Disclaimer
The content of this lecture is borrowed from:
1. Rich Sutton’s textbook
2. David Silver’s Reinforcement Learning class at UCL
3. Sergey Levine’s Deep Reinforcement Learning class at UC Berkeley
2
Syllabus
● Introduction to Reinforcement Learning
● Markov Decision Process
● Dynamic Programming
● Monte Carlo method
● Temporal Difference method
● Deep Reinforcement Learning
● Policy Gradient
● Hierarchical Reinforcement Learning and Multiagent Reinforcement Learning
● Active Research Issues
3
Resources
Textbooks:
● Reinforcement Learning: An Introduction, Sutton and Barto
● Algorithms for Reinforcement Learning, Szepesvári
Courses:
● CS 294 Deep Reinforcement Learning, Berkeley
● David Silver’s Reinforcement Learning course, UCL
● CMU 10703 Deep Reinforcement Learning and Control, CMU
● Shan-Hung Wu’s Deep Learning course at NTHU
All of these serve as reference materials for this lecture.
4
Outline
● Syllabus
● Introduction
● Elements of reinforcement learning and its objective
● History of RL
● Applications
● Challenges and active research fields in RL
● Research institutes and notable researchers
5
Machine Learning
From David Silver’s RL course
6
Introduction to Reinforcement Learning
Reinforcement learning is a learning framework different from supervised learning
and unsupervised learning.
It consists of a series of perceptions and interactions between an agent and
its environment.
From Sutton’s book
7
Agent and Environment
At each step t the agent:
● Receives scalar reward R_t
● Receives observation O_t
● Executes action A_t
The environment:
● Receives action A_t
● Emits observation O_{t+1}
● Emits scalar reward R_{t+1}
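The interaction loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, its reward of 1.0 at the goal, and the `run_episode` helper are hypothetical names made up for this example, not part of the lecture.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, the goal is position 4."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        """Receive action A_t (-1 or +1); emit observation O_{t+1} and reward R_{t+1}."""
        self.position = min(4, max(0, self.position + action))
        reward = 1.0 if self.position == 4 else 0.0
        return self.position, reward


def run_episode(policy, max_steps=100):
    """Run the agent-environment loop until the goal is reached or steps run out."""
    env = Corridor()
    observation = env.position  # initial observation, no reward yet
    history = []
    for t in range(max_steps):
        action = policy(observation)            # agent executes A_t
        observation, reward = env.step(action)  # environment emits O_{t+1}, R_{t+1}
        history.append((t, action, observation, reward))
        if reward > 0:
            break
    return history


rng = random.Random(0)
random_history = run_episode(lambda obs: rng.choice([-1, 1]))  # trial and error
always_right = run_episode(lambda obs: +1)                     # a fixed policy
```

Each tuple in `history` records one pass around the loop: the step index, the action taken, and the observation and reward that came back.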
8
Introduction to Reinforcement Learning
Reinforcement learning is often used to solve sequential decision problems.
● Goal: select actions to maximize total future reward
● Actions may have long-term consequences
● Reward may be delayed
● It may be better to sacrifice immediate reward to gain more long-term reward
● E.g.:
○ A financial investment
○ A chess game
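A tiny numeric sketch may help with the point about sacrificing immediate reward. The two reward sequences below are hypothetical numbers chosen only for illustration; `total_reward` is an assumed helper name.

```python
def total_reward(rewards):
    """Sum of all rewards collected along one trajectory."""
    return sum(rewards)


# Hypothetical reward sequences for two strategies over five steps.
greedy = [1, 0, 0, 0, 0]    # take a small reward immediately
patient = [-1, 0, 0, 0, 5]  # pay a cost first, collect a larger reward later

candidates = [("greedy", greedy), ("patient", patient)]
best = max(candidates, key=lambda item: total_reward(item[1]))
```

Here the strategy that starts with a negative reward ends up with the larger total, which is the situation an investment or a chess sacrifice is meant to evoke.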
9
Supervised Learning & Unsupervised Learning
The input data are independent and identically distributed (i.i.d.).
The current output does not affect the next
input.
10
Reinforcement Learning
The agent’s actions affect the data
it receives in the future.
Figure from Wikipedia, made by waldoalvarez
11
Introduction to Reinforcement Learning
● In reinforcement learning, the
agent learns by trial and error.
● Better experience helps the
agent learn a better policy.
● What kind of experience is
better?
The image is from:
http://www.homemeeting.us/franktmc/maze_2.jpg
12
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
13
Elements of reinforcement learning - policy
Policy
● Defines the learning agent’s way of behaving at a given time. It could be a
simple function, a lookup table, or a search process
● Often denoted by π
● Could be deterministic or stochastic
14
Elements of reinforcement learning - policy
Suppose you are Russell Westbrook and you
are being defended by James Harden. In
this situation, you have 3 choices:
● Cut
● Shoot
● Pass
15
Stochastic policy
[Figure: probability (vertical axis) for each action (horizontal axis)]
16
Deterministic policy
[Figure: probability (vertical axis) for each action (horizontal axis)]
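The difference between the two kinds of policy can be sketched with the cut / shoot / pass example. The probabilities below are hypothetical values chosen for illustration, and `sample_action` is an assumed helper name.

```python
import random

ACTIONS = ["cut", "shoot", "pass"]

# Stochastic policy: a probability for each action (hypothetical numbers).
stochastic_policy = {"cut": 0.2, "shoot": 0.5, "pass": 0.3}

# Deterministic policy: all probability mass on a single action.
deterministic_policy = {"cut": 0.0, "shoot": 1.0, "pass": 0.0}


def sample_action(policy, rng):
    """Draw one action according to the policy's probabilities."""
    weights = [policy[a] for a in ACTIONS]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]


rng = random.Random(0)
stochastic_samples = [sample_action(stochastic_policy, rng) for _ in range(1000)]
deterministic_samples = [sample_action(deterministic_policy, rng) for _ in range(1000)]
```

Sampling from the stochastic policy yields a mix of actions, while the deterministic policy returns the same action every time.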
17
Policies - Action space
In reinforcement learning, we can categorize problems by their action space into
2 types.
● Discrete action space
● Continuous action space
In the previous example, the actions are in a discrete space, but there are
many examples of continuous control, e.g., a robotic arm. The stochastic policy of
a continuous control problem looks like a probability density function.
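As a rough sketch of a stochastic policy over a continuous action, one common choice is a Gaussian density over the action value. The Gaussian form, the mean and standard deviation, and the "torque" interpretation are all assumptions made for this example, not something the lecture specifies.

```python
import math
import random


def gaussian_pdf(action, mean, std):
    """Probability density of a continuous action under a Gaussian policy."""
    z = (action - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_torque(mean, std, rng):
    """Sample one continuous action (e.g., a joint torque) from the policy."""
    return rng.gauss(mean, std)


rng = random.Random(0)
mean, std = 0.5, 0.1  # hypothetical policy output for the current state
torques = [sample_torque(mean, std, rng) for _ in range(5)]
density_at_mean = gaussian_pdf(mean, mean, std)
density_far_away = gaussian_pdf(0.8, mean, std)
```

Unlike the discrete case, the policy assigns a density rather than a probability to each action, so values such as `density_at_mean` can exceed 1.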
18
Elements of reinforcement learning - reward
Reward: r / R_t
● Defines the goal in a reinforcement learning problem
● Indicates how well the agent is doing at step t
● Perceived immediately from the environment
19
Elements of reinforcement learning - reward
+2
0 or -0.2?
20
Elements of reinforcement learning - reward
In chess or Go, the reward is defined
by the outcome of the game.
● Win: +1
● Draw: 0
● Lose: -1
In most steps, we don’t receive any
reward (value = 0). This is a kind of sparse
reward problem.
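The outcome-based reward above can be written as a small function. This is a sketch only: `game_reward` and the 40-move game are hypothetical, and `None` is an assumed marker for "the game is still in progress".

```python
def game_reward(outcome):
    """Reward for one move: zero until the game ends, then +1 / 0 / -1."""
    return {"win": 1, "draw": 0, "lose": -1, None: 0}[outcome]


# Hypothetical 40-move game that ends in a win: only the final step is rewarded.
outcomes = [None] * 39 + ["win"]
rewards = [game_reward(o) for o in outcomes]
nonzero_steps = sum(1 for r in rewards if r != 0)
```

Only one of the 40 steps carries a non-zero reward, which is what makes the learning signal sparse.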
21
Elements of reinforcement learning - reward
If we want to reach the goal in fewer
steps, we often define the reward as
-1 for each step taken.
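A quick sketch of why a per-step reward of -1 favors short paths; the step counts are hypothetical and `episode_return` is an assumed helper name.

```python
def episode_return(num_steps, step_reward=-1):
    """Total reward when every step taken yields `step_reward`."""
    return num_steps * step_reward


short_path = episode_return(8)   # hypothetical path that reaches the goal in 8 steps
long_path = episode_return(20)   # hypothetical path that needs 20 steps
```

Because the shorter path accumulates less penalty, maximizing total reward is the same as minimizing the number of steps.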
22
Elements of reinforcement learning - value function
Value function
● Indicates which decision is good in the long run.
● There are two forms:
○ state-value function
○ action-value function
● Unlike reward, the value function is an estimated value.
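To make "estimated value" concrete, here is a minimal sketch that estimates action values by averaging sampled returns and then derives a state value from them. Everything specific is an assumption for illustration: the single state, the two actions, their success probabilities in `TRUE_MEANS`, and the uniform policy.

```python
import random


def estimate_action_values(sample_return, actions, episodes, rng):
    """Average the sampled returns per action (a Monte Carlo estimate of q(s, a))."""
    return {
        a: sum(sample_return(a, rng) for _ in range(episodes)) / episodes
        for a in actions
    }


# Hypothetical single state with two actions and noisy returns.
TRUE_MEANS = {"left": 0.2, "right": 0.6}


def sample_return(action, rng):
    """Return 1.0 with the action's (hypothetical) success probability, else 0.0."""
    return 1.0 if rng.random() < TRUE_MEANS[action] else 0.0


rng = random.Random(0)
q = estimate_action_values(sample_return, ["left", "right"], 10_000, rng)

# State value under a policy that picks each action with probability 0.5.
v = 0.5 * q["left"] + 0.5 * q["right"]
best_action = max(q, key=q.get)
```

The rewards here are observed directly, while `q` and `v` are estimates that improve as more returns are sampled.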
23
Elements of reinforcement learning - value function
The score is 99 to 98 (we have 98) with just
5 seconds left in the game.
If you have to throw the ball in from midcourt,
who would you pass the ball to?
1. Hanamichi Sakuragi (櫻木花道)
2. Hisashi Mitsui (三井壽)
24
Elements of reinforcement learning - model
Model of the environment (optional)
● Something that mimics the behavior of the environment.
● Allows inferences to be made about how the environment will behave
(planning).
● Methods for solving reinforcement learning problems that use models for
planning are called model-based methods. Methods that do not are called
model-free methods.
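One simple way to picture a model is a table learned from experience that predicts what follows each state-action pair. The sketch below assumes a small deterministic problem; `TabularModel`, its methods, and the three recorded transitions are hypothetical.

```python
from collections import Counter, defaultdict


class TabularModel:
    """Hypothetical learned model: counts what followed each (state, action)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def update(self, state, action, reward, next_state):
        """Record one real interaction with the environment."""
        self.transitions[(state, action)][(next_state, reward)] += 1

    def predict(self, state, action):
        """Return the most frequently observed (next_state, reward), or None if unseen."""
        seen = self.transitions.get((state, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


model = TabularModel()
for s, a, r, s2 in [(0, "right", 0, 1), (1, "right", 1, 2), (0, "right", 0, 1)]:
    model.update(s, a, r, s2)

# Planning: roll the model forward without touching the real environment.
state, plan_reward = 0, 0
for action in ["right", "right"]:
    state, reward = model.predict(state, action)
    plan_reward += reward
```

A model-free method would skip the `TabularModel` entirely and learn a policy or value function directly from the recorded interactions.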
25
Elements of reinforcement learning - model
Interaction, inferences
Learn the model
The image is from David Silver’s RL course
26
Just like ...
27
Elements of reinforcement learning - model
28
Elements of reinforcement learning - model
29
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
30
The objective of reinforcement learning
Reinforcement learning is a framework
for goal-directed learning.
The objective of reinforcement learning
is to maximize the cumulative reward in
each task.
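Written out with the reward notation from the agent-environment slide, the objective can be sketched as follows. The symbols G_t (return), T (final step), π* (optimal policy), and the discount factor γ are not defined in this lecture; they follow the convention of Sutton and Barto's textbook and are assumptions here.

```latex
% Undiscounted return of an episode that ends at step T
G_t = R_{t+1} + R_{t+2} + \dots + R_T = \sum_{k=t+1}^{T} R_k

% Objective: find the policy that maximizes the expected return
\pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi}\left[ G_t \right]

% Common discounted variant (assumption: 0 \le \gamma \le 1)
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```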
The image is from:
https://www.wikijob.co.uk/content/interview-advice/competencies/decision-making
31
History of Reinforcement Learning
Reinforcement learning is inspired by two fields:
● Optimal control
● Biological learning system: Animal learning
32
Optimal control
It is a mathematical optimization method for deriving control policies,
especially under certain constraints.
The optimization method is largely due to the work of Lev Pontryagin and
Richard Bellman in the 1950s.
33
Richard Bellman
Richard Bellman was an applied
mathematician, who introduced dynamic
programming in 1953.
Work:
● Bellman Equation
● Curse of dimensionality
● Bellman-Ford algorithm
34
Animal Learning
● Teaching a dog - positive reward
35
Animal Learning
● Teaching a dog - penalty (negative reward)
36
Some questions about RL
● Why do we need to learn reinforcement learning?
● What has made reinforcement learning spring up like mushrooms?
37
Backgammon (IBM, 1992)
Temporal difference learning and TD-Gammon, by
Gerald Tesauro, 1992
Backgammon is 雙陸棋 in Chinese.
Source: Wikipedia
38
Autonomous Helicopter (Stanford, 2000)
Helicopter aerobatics has been studied since 2000 by Andrew Ng and
Pieter Abbeel at Stanford.
You can see more details at: http://heli.stanford.edu/
39
Deep reinforcement learning in Atari game (2013)
Deep Q-Network (DQN): proposed by V. Mnih et al. It was the first end-to-end
reinforcement learning model to combine deep learning with raw inputs.
40
Deep reinforcement learning in Atari game (2013)
41
Deep Reinforcement Learning for Robotic Manipulation
42
AlphaGo (DeepMind, 2016)
43
AlphaGo (DeepMind, 2016)
AlphaGo: David Silver, Aja Huang et al. used Monte Carlo tree search (MCTS) and
deep reinforcement learning (policy gradient) to master the game of Go.
44
AlphaGo Zero (DeepMind, 2017)
AlphaGo Zero: David Silver et al. used MCTS and policy iteration with a two-headed
ResNet architecture to learn from scratch without human knowledge.
45
AlphaGo Zero (DeepMind, 2017)
46
Dota2 (OpenAI, 2017)
● Beats the world’s top professionals at 1v1 matches
● The bot learned from scratch by self-play
47
Dota2 (OpenAI, 2017)
48
Dota2 (OpenAI, 2017)
49
Alibaba (StarCraft 1, multiagent)
50
Deep RL for Dialogue Generation (Li et al., 2016)
● RL agent generates more interactive responses
● RL agent tends to end a sentence with a question and hand the conversation
over to the user
● Next step: explore intrinsic rewards, large-scale training
From the slides on http://opendialogue.miulab.tw
51
The challenges of reinforcement learning
● Sparse reward issue
● Reward credit assignment
● Large space for exploration (trial-and-error)
● Imperfect information, partial observation
52
Active research domains
● Multiagent reinforcement learning
● Hierarchical reinforcement learning
● Inverse reinforcement learning
● Multi-task Transfer learning in reinforcement learning
● Meta learning
● One-shot reinforcement learning
● Deep reinforcement learning in dialogue generation
53
Research institutes and notable researchers
54
Research scientists in RL you must know!
● Richard S. Sutton
● David Silver
● Pieter Abbeel
● Sergey Levine
55
Richard S. Sutton
● A founding father of reinforcement
learning
● Professor of Computer Science at the University
of Alberta
● Temporal difference learning
● Dyna architecture
56
David Silver
● Research scientist at DeepMind
● Lead researcher on the AlphaGo and AlphaGo
Zero teams
● Ph.D. supervised by Sutton
● Formerly a professor at University College
London
57
Pieter Abbeel
● Professor at UC Berkeley
● Director of the UC Berkeley Robot Learning Lab
● Research scientist and advisor at OpenAI
58
Sergey Levine
● Assistant Professor at UC Berkeley
● Research scientist at Google Brain
● Autonomous robots
59
Questions?
60
