REINFORCEMENT
LEARNING
AbdalmuGhith Alzbibi
Ahmad Ataya
Mhd Salem Kabbani
Nadir Pervez
Outline
■ Introduction
■ Elements of reinforcement learning
■ Q-learning
■ Deep Q-Network
■ Demo
■ References
Introduction
■ Supervised learning: sample (input, output) pairs of the function to be learned are given or can be observed.
■ Unsupervised learning: data-driven, for example clustering.
■ Reinforcement learning:
  • Close to human learning.
  • The algorithm learns a policy for how to act in a given environment.
  • Every action has some effect on the environment, and the environment provides rewards that guide the learning algorithm.
Supervised Learning vs Reinforcement Learning
Supervised Learning
Step: 1
Teacher: Does picture 1 show a car or a flower?
Learner: A flower.
Teacher: No, it’s a car.
Step: 2
Teacher: Does picture 2 show a car or a flower?
Learner: A car.
Teacher: Yes, it’s a car.
Step: 3 ....
Reinforcement Learning
Step: 1
World: You are in state 9. Choose action A or C.
Learner: Action A.
World: Your reward is 100.
Step: 2
World: You are in state 32. Choose action B or E.
Learner: Action B.
World: Your reward is 50.
Step: 3 ....
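The reinforcement learning dialogue above can be sketched as a loop between a learner and a world. This is only an illustration: the class `ToyWorld`, the functions `first_action` and `run`, and the state transitions are hypothetical and were chosen to reproduce the two steps of the dialogue.

```python
class ToyWorld:
    """Hypothetical two-state world; states, actions and rewards are made up
    to mirror the dialogue, not taken from a real task."""

    def __init__(self):
        self.state = 9

    def available_actions(self):
        return ["A", "C"] if self.state == 9 else ["B", "E"]

    def step(self, action):
        # Only actions A and B are rewarded in this illustration.
        reward = {"A": 100, "B": 50}.get(action, 0)
        # Alternate between the two states regardless of the action.
        self.state = 32 if self.state == 9 else 9
        return self.state, reward


def first_action(state, actions):
    """Placeholder learner: always takes the first offered action."""
    return actions[0]


def run(world, choose, n_steps):
    """Run the interaction loop and return the cumulative reward."""
    total = 0
    for step in range(1, n_steps + 1):
        state = world.state
        action = choose(state, world.available_actions())
        _, reward = world.step(action)
        total += reward
        print(f"Step {step}: state {state}, action {action}, reward {reward}")
    return total


total = run(ToyWorld(), first_action, n_steps=2)  # rewards 100 and 50
```

Unlike the supervised dialogue, the world never says which action was correct; the learner only sees the reward that followed its own choice.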
Introduction (cont.)
■ Meaning of reinforcement: the occurrence of an event, in the proper relation to a response, that tends to increase the probability that the response will occur again in the same situation.
■ Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.
■ Reinforcement learning is learning how to act in order to maximize a numerical reward.
Introduction …
■ Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an area of machine learning.
■ In reinforcement learning the learner receives delayed feedback that evaluates its performance, but it is not told which action is the correct one to achieve its goal.
Reward Hypothesis
■ All goals can be described by the maximization of expected cumulative reward.
■ Make a robot walk: +R for moving forward, -R for falling over.
■ Play Atari games: +R / -R for increasing / decreasing the score.
■ Control a helicopter: +R / -R for following the trajectory / crashing.
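As a rough illustration of the walking-robot example, the sketch below turns per-step events into rewards and sums them into a cumulative reward. The event names, the magnitude `R = 1.0`, and both function names are assumptions made for this example only.

```python
R = 1.0  # hypothetical reward magnitude


def walking_reward(event):
    """Reward for one time step of an imagined walking robot."""
    if event == "forward":
        return +R
    if event == "fell":
        return -R
    return 0.0  # any other event is treated as neutral


def cumulative_reward(events):
    """Undiscounted sum of rewards over one episode."""
    return sum(walking_reward(e) for e in events)


episode = ["forward", "forward", "idle", "fell"]
total = cumulative_reward(episode)
```

The agent's goal is then to choose actions that make this total as large as possible in expectation.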
Q-Learning
■ There are many different ways a reinforcement learning agent can be trained, but a common one is called Q-learning.
■ Before we talk about Q-learning, we need to cover some background material:
  • Markov decision processes
  • Value functions
Q-Learning …
■ Model-free (vs. model-based):
  • The MDP model is unknown, but experience can be sampled; or
  • The MDP model is known, but is too big to use except through samples.
■ Off-policy (vs. on-policy):
  • Can learn about one policy from experience sampled from some other policy.
Markov Decision Process
■ A set of possible world states $S$
■ A set of possible actions $A$
■ A real-valued reward function $R(s, a)$
■ A transition function $T(s, a, s') = P(s' \mid s, a)$: the probability of a transition from $s$ to $s'$ given action $a$
■ A policy $\pi$ is a mapping from $S$ to $A$
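The components listed above map naturally onto a small data structure. The following is a minimal sketch, assuming finite state and action sets stored in dictionaries; the class `MDP`, its field names, and the two-state example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: tuple       # S
    actions: tuple      # A
    reward: dict        # R(s, a): maps (s, a) -> float
    transition: dict    # T(s, a, s'): maps (s, a) -> {s': P(s' | s, a)}

    def is_valid(self, tol=1e-9):
        """Check that every (s, a) has a probability distribution over s'."""
        for s in self.states:
            for a in self.actions:
                dist = self.transition.get((s, a))
                if dist is None or abs(sum(dist.values()) - 1.0) > tol:
                    return False
        return True


toy = MDP(
    states=("s0", "s1"),
    actions=("stay", "go"),
    reward={("s0", "stay"): 0.0, ("s0", "go"): 0.0,
            ("s1", "stay"): 1.0, ("s1", "go"): 0.0},
    transition={("s0", "stay"): {"s0": 1.0},
                ("s0", "go"): {"s1": 0.9, "s0": 0.1},
                ("s1", "stay"): {"s1": 1.0},
                ("s1", "go"): {"s0": 1.0}},
)

# A deterministic policy is just a mapping from states to actions.
policy = {"s0": "go", "s1": "stay"}
```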
Value functions
■ Q-function: $Q^{\pi}(s, a) = \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]$, where $R_t$ is the discounted return from time $t$.
  • It is a prediction of future reward.
  • Next reward plus the best I can do from the next state:
    $Q(s, a) = R(s, a, s') + \gamma \max_{a'} Q(s', a')$
■ $\gamma \in [0, 1]$ is a discount factor that gives later rewards less effect.
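To make the discount factor concrete, here is a small sketch that computes a discounted return and a single "next reward plus best next value" backup. The function names and the numbers are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def one_step_backup(reward, next_q_values, gamma):
    """Next reward plus the discounted best value available in the next state."""
    return reward + gamma * max(next_q_values)


g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)    # 1 + 0.5 + 0.25
q = one_step_backup(100.0, [50.0, 0.0], gamma=0.5)   # 100 + 0.5 * 50
```

With `gamma` close to 0 the agent cares almost only about the immediate reward; with `gamma` close to 1 later rewards count nearly as much as the next one.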
Getting the Policy
■ We are looking for the optimal policy: one such that no other policy generates more reward.
  $Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)$
■ Deterministic policy: $a = \arg\max_{a' \in A} Q^{*}(s, a')$
■ Bellman equation: $Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \mid s, a \right]$
■ It can be solved recursively with dynamic programming.
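One dynamic-programming way to use the Bellman equation is to apply it repeatedly as an update until the values stop changing (Q-value iteration). The sketch below assumes the model is known and small; the function names, the dictionary layout of `T` and `R`, and the two-state example are hypothetical.

```python
def q_value_iteration(states, actions, T, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeatedly apply the Bellman optimality backup to every (s, a)."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            for a in actions:
                # Expectation over next states s2 of r + gamma * max_a2 Q(s2, a2).
                new_q = sum(
                    p * (R[(s, a)] + gamma * max(Q[(s2, a2)] for a2 in actions))
                    for s2, p in T[(s, a)].items()
                )
                delta = max(delta, abs(new_q - Q[(s, a)]))
                Q[(s, a)] = new_q
        if delta < tol:
            break
    return Q


def greedy_policy(Q, states, actions):
    """Deterministic policy: pick argmax_a Q(s, a) in every state."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}


states, actions = ("s0", "s1"), ("stay", "go")
T = {("s0", "stay"): {"s0": 1.0}, ("s0", "go"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "go"): {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}

Q = q_value_iteration(states, actions, T, R)
policy = greedy_policy(Q, states, actions)
```

In this toy problem staying in `s1` pays 1 per step, so the greedy policy moves to `s1` and stays there.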
Exploration-exploitation dilemma
■ We want to pick good actions most of the time, but also do some exploration.
■ Exploring means that we can learn better policies.
■ But we want to balance known good actions with exploratory ones.
■ This is called the exploration/exploitation problem.
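A common and simple way to balance the two is an ε-greedy rule: act randomly with a small probability ε and greedily otherwise. The sketch below assumes action values are stored in a dictionary; the function name and the example values are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values maps each action to its current estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))      # explore: uniform random action
    return max(q_values, key=q_values.get)     # exploit: best known action


rng = random.Random(0)
estimates = {"A": 1.0, "B": 2.0}
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_fraction = picks.count("B") / len(picks)
```

With ε = 0.1 and two actions, the greedy action is chosen in roughly 95% of the steps, since a random pick also lands on it half of the time.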
Deep Q-Network …
CNN
Input of QN
Stochastic gradient descent
Theoretical complications
■ Deep learning algorithms require:
  • huge training datasets
  • independence between samples
  • a fixed underlying data distribution
Deep Q-learning …
■ Experience replay is used to avoid the theoretical complications:
  • Greater data efficiency: each experience is potentially used in many weight updates.
  • Reduced correlations between samples: randomizing the samples breaks the correlations between consecutive samples.
  • Experience replay averages the behavior distribution over states, which smooths out learning and avoids oscillations or divergence in gradient descent.
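A replay memory can be sketched as a fixed-size buffer that stores transitions and returns random minibatches. This is a minimal illustration, not the implementation from the cited papers; the class name `ReplayBuffer`, its methods, and the tuple layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the ordering of
        # consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3, seed=0)
for i in range(5):
    buf.push(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
batch = buf.sample(2)
```

Because a stored transition can be drawn in many different minibatches, each experience may contribute to many weight updates.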
Serial Deep Q-learning
Demo Video
Like an expert player!!
References
• Mnih et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
• Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
• Udacity course, Machine Learning: Reinforcement Learning. https://www.youtube.com/playlist?list=PLAwxTw4SYaPnidDwo9e2c7ixIsu_pdSNp
Thanks for listening

Editor's notes

  1. Q-Learning Algorithm
     1. Initialize $Q(s, a)$ to small random values, $\forall s, a$
     2. Observe the state, $s$
     3. Pick an action, $a$, and do it
     4. Observe the next state, $s'$, and the reward, $r$
     5. $Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)$
     6. Go to 2
     $0 \le \alpha \le 1$ is the learning rate.
     Use ε-greedy when picking actions:
     • Pick the best (greedy) action with probability $1 - \varepsilon$
     • Otherwise, pick a random action
  2. - There is always an optimal policy for any MDP
     - All optimal policies achieve the optimal value function
     - All optimal policies achieve the optimal action-value function
     All you need is to find q*.
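The Q-learning algorithm in note 1 can be sketched as a tabular loop. This is an illustration under stated assumptions: the environment function `toy_step`, the episode structure, and all hyperparameter values are hypothetical, and a random action is taken with probability ε.

```python
import random


def toy_step(state, action):
    """Hypothetical two-state world: staying in s1 pays 1, all else pays 0."""
    if action == "go":
        next_state = "s1" if state == "s0" else "s0"
    else:
        next_state = state
    reward = 1.0 if (state == "s1" and action == "stay") else 0.0
    return next_state, reward


def q_learning(step, states, actions, start, episodes=500, steps_per_episode=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # 1. Initialize Q(s, a) to small random values.
    Q = {(s, a): rng.uniform(-0.01, 0.01) for s in states for a in actions}
    for _ in range(episodes):
        s = start                                    # 2. observe state
        for _ in range(steps_per_episode):
            # 3. pick an action (epsilon-greedy) and do it
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r = step(s, a)                       # 4. observe s' and r
            # 5. blend the old estimate with the one-step target
            target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            s = s2                                   # 6. go to 2
    return Q


states, actions = ("s0", "s1"), ("stay", "go")
Q = q_learning(toy_step, states, actions, start="s0")
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

The update never uses the transition probabilities, only sampled transitions, which is what makes the method model-free; the `max` in the target evaluates the greedy policy even though actions are chosen ε-greedily, which is what makes it off-policy.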