Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
2. What is Reinforcement Learning?
- Much like how biological agents learn by interaction
- No supervisor, only a reward
- Data is time-dependent (non-i.i.d.)
- Feedback is delayed
- Agent actions affect the data it receives
3. Examples
- Play checkers (1959)
- Defeat the world champion at Backgammon (1992)
- Control a helicopter (2008)
- Make a robot walk
- Robocup Soccer
- Play ATARI games better than humans (2014)
- Defeat the world champion at Go (2016)
4. Reward Hypothesis
All goals can be described by the maximisation of expected cumulative reward
- Defeat the world champion at Go: +R / -R for winning/losing a game
- Make a robot walk: +R for forward motion, -R for falling over
- Play ATARI games: +R / -R for increasing/decreasing score
- Control a helicopter: +R / -R for following the trajectory / crashing
6. Fully Observable Environments
Fully Observable Environments (agent state = environment state):
- Agent directly observes environment
- Example: chess board
Partially Observable Environments (agent state ≠ environment state):
- Agent indirectly observes environment
- Example: a robot with a motion sensor or camera
- Agent must construct its own state representation (e.g. from recent observations, as sketched below)
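One common way to build an agent state under partial observability is to stack the last k raw observations into a single state, as in the ATARI work. The sketch below is illustrative, not from the slides; the class name and parameters are my own.

```python
from collections import deque

import numpy as np

class StackedState:
    """Keeps the k most recent observations and exposes them as one agent state."""

    def __init__(self, k: int):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # Fill the buffer with the first observation so the state has a fixed shape.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.stack(self.frames)

    def step(self, obs: np.ndarray) -> np.ndarray:
        # Drop the oldest observation, append the newest, and return the stack.
        self.frames.append(obs)
        return np.stack(self.frames)
```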
7. RL components: Policy and Value Function
Policy is the agent's behaviour function
- Maps from state to action
- Deterministic policy: $a = \pi(s)$
- Stochastic policy: $\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]$
Value function is a prediction of future reward
- Used to evaluate states and to select between actions
- $v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s]$
A toy policy example is sketched below.
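A minimal sketch contrasting the two kinds of policy on a made-up 2-state, 2-action problem; the tables `pi_det` and `pi_stoch` are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic policy: a table mapping state -> action, i.e. a = pi(s).
pi_det = {0: 1, 1: 0}

# Stochastic policy: pi(a|s) as a row-stochastic matrix, one row per state.
pi_stoch = np.array([[0.9, 0.1],   # in state 0: P(a=0)=0.9, P(a=1)=0.1
                     [0.2, 0.8]])  # in state 1: P(a=0)=0.2, P(a=1)=0.8

def act_deterministic(s: int) -> int:
    return pi_det[s]

def act_stochastic(s: int) -> int:
    return int(rng.choice(2, p=pi_stoch[s]))

print(act_deterministic(0), act_stochastic(0))
```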
12. Math: Markov Decision Process (MDP)
Almost all RL problems can be formalised as MDPs
An MDP is a tuple $\langle S, A, P, R, \gamma \rangle$:
- $S$ is a finite set of states
- $A$ is a finite set of actions
- $P$ is a state transition probability matrix: $P_{ss'}^a = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$
- $R$ is a reward function: $R_s^a = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$
- $\gamma \in [0, 1]$ is a discount factor
A toy MDP written out as plain arrays is sketched below.
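A minimal sketch of an MDP $\langle S, A, P, R, \gamma \rangle$ as plain arrays; the 2-state, 2-action numbers are made up purely for illustration.

```python
import numpy as np

n_states, n_actions = 2, 2

# P[a, s, s'] = probability of moving to s' when taking action a in state s.
P = np.array([
    [[0.8, 0.2],    # action 0, from states 0 and 1
     [0.1, 0.9]],
    [[0.5, 0.5],    # action 1, from states 0 and 1
     [0.3, 0.7]],
])

# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

gamma = 0.9  # discount factor in [0, 1]

# Sanity check: each transition distribution sums to 1.
assert np.allclose(P.sum(axis=-1), 1.0)
```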
13. State-Value and Action-Value functions, Bellman eq.
The return is the discounted sum of rewards: $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$
State-value function, the expected return starting from state s and then following policy $\pi$: $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$
Action-value function, the expected return starting from state s, taking action a, and then following policy $\pi$: $q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$
Bellman expectation equation: $v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s]$ (a policy-evaluation sketch follows below)
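A hedged sketch of iterative policy evaluation: compute $v_\pi$ by repeatedly applying the Bellman expectation equation. The arrays `P`, `R`, `gamma`, and the uniform random policy `pi` are the same illustrative toy numbers as above, not from the slides.

```python
import numpy as np

P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])   # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
gamma = 0.9
pi = np.full((2, 2), 0.5)                  # pi[s, a]: uniform random policy

v = np.zeros(2)
for _ in range(500):
    # q[s, a] = R(s, a) + gamma * sum_s' P(s'|s, a) * v(s')
    q = R + gamma * np.einsum("asn,n->sa", P, v)
    # v(s) = sum_a pi(a|s) * q(s, a)
    v_new = (pi * q).sum(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:  # stop once the backup converges
        break
    v = v_new
print("v_pi:", v)
```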
14. Finding an Optimal Policy
- There is always an optimal policy for any MDP
- All optimal policies achieve the optimal value function $v_*(s)$
- All optimal policies achieve the optimal action-value function $q_*(s, a)$
All you need is to find $q_*(s, a)$: acting greedily, $\pi_*(s) = \arg\max_a q_*(s, a)$, is then optimal (a small example follows below)
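A small sketch of extracting a greedy policy from $q_*(s, a)$, assuming `q_star` is a `[n_states, n_actions]` table; the numbers are hypothetical.

```python
import numpy as np

q_star = np.array([[1.0, 3.0],
                   [2.5, 0.5]])   # hypothetical optimal action-values

# pi*(s) = argmax_a q*(s, a): pick the best action in each state.
pi_star = np.argmax(q_star, axis=1)
print("optimal action per state:", pi_star)  # -> [1 0]
```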
15. Bellman Opt Equation for state-value function
$v_*(s) = \max_a q_*(s, a)$
[David Silver. Advanced Topics: RL]
16. Bellman Opt Equation for action-value function
$q_*(s, a) = R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a v_*(s')$
[David Silver. Advanced Topics: RL]
17. Bellman Opt Equation for state-value function
$v_*(s) = \max_a \left( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a v_*(s') \right)$
[David Silver. Advanced Topics: RL]
18. Bellman Opt Equation for action-value function
$q_*(s, a) = R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \max_{a'} q_*(s', a')$
[David Silver. Advanced Topics: RL]
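A hedged sketch of value iteration, which iterates the Bellman optimality backup $v(s) \leftarrow \max_a (R_s^a + \gamma \sum_{s'} P_{ss'}^a v(s'))$. The arrays `P`, `R`, `gamma` are the same illustrative toy MDP as above, not from the slides.

```python
import numpy as np

P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])   # P[a, s, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
gamma = 0.9

v = np.zeros(2)
for _ in range(500):
    # q[s, a] = R(s, a) + gamma * expected value of the successor state
    q = R + gamma * np.einsum("asn,n->sa", P, v)
    v_new = q.max(axis=1)                  # Bellman optimality backup
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
print("v*:", v, "greedy policy:", q.argmax(axis=1))
```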
20. Q-Learning - model-free off-policy control algorithm
Model-free (vs Model-based):
- MDP model is unknown, but experience can be sampled from it
- Or the model is known but too big to use directly, except through samples
Off-policy (vs On-policy):
- Can learn about the target policy from experience sampled from some other (behaviour) policy
Control (vs Prediction):
- Find the best policy, rather than just evaluate a given one (a tabular Q-learning sketch follows below)
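A compact sketch of the tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha (r + \gamma \max_{a'} Q(s',a') - Q(s,a))$, with an epsilon-greedy behaviour policy. `env` is assumed to follow the Gymnasium-style `reset`/`step` API; all hyperparameters are illustrative.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Behaviour policy: epsilon-greedy w.r.t. the current Q.
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Off-policy target uses the greedy max, not the action actually taken.
            target = r + (0.0 if terminated else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```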
24. Issues with Q-learning with neural network
- Data is sequential (non-iid)
- Policy changes rapidly with slight changes to Q-values
- Policy may oscillate
- Distribution of experience swings from one extreme to another
- Scale of rewards and Q-values is unknown
- Unstable backpropagation due to large gradients
25. DQN solutions
- Use experience replay
- Breaks correlations in data
- Learn from all past policies
- Using off-policy Q-learning
- Freeze target Q-network
- Avoid policy oscillations
- Break correlations between Q-network and target
- Clip rewards and gradients (see the combined sketch below)
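A hedged PyTorch sketch of the stabilisers listed above: an experience replay buffer (breaks correlations in the data), a frozen target network (breaks the correlation between the Q-network and its target), and reward/gradient clipping. Network sizes, hyperparameters, and the transition format are illustrative assumptions, not the original DQN implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # frozen copy of the Q-network
optim = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)                   # replay buffer of (s, a, r, s', done)

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)    # sampling breaks time correlations
    s = torch.stack([torch.as_tensor(b[0], dtype=torch.float32) for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([torch.as_tensor(b[3], dtype=torch.float32) for b in batch])
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    r = r.clamp(-1.0, 1.0)                       # reward clipping, as on the slide

    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():                                # target net stays frozen
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values

    loss = nn.functional.smooth_l1_loss(q, target)       # Huber loss limits gradients
    optim.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(q_net.parameters(), 10.0)   # gradient clipping
    optim.step()

# Periodically (e.g. every N steps) refresh the frozen target network:
# target_net.load_state_dict(q_net.state_dict())
```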