SlideShare una empresa de Scribd logo
1 de 15
Descargar para leer sin conexión
Chapter 1: Introduction
Seungjae Ryan Lee
● Learning by interacting with the environment
● Goal: maximize a numerical reward signal by choosing correct actions
○ Trial and error: learner is not told the best action
○ Delayed rewards: actions can affect all future rewards
Reinforcement Learning
● No external supervisor / teacher
○ No training set with labeled examples (answers)
○ Need to interact with environment in uncharted territories
● Different goals
○ Supervised Learning: Generalize existing data to minimize test set error
○ Reinforcement Learning: Maximize reward through interactions
○ Unsupervised Learning: Find hidden structure
→ Reinforcement Learning is a new paradigm of Machine Learning
vs. Supervised and Unsupervised Learning
● Interactions between agent and environment
● Uncertainty about the environment
○ Effects of actions cannot be fully predicted
○ Monitor environment and react appropriately
● Defined goal
○ Judge progress through rewards
● Present affects the future
○ Effect can be delayed
● Experience improves performance
Characteristics of Reinforcement Learning
● Complex sequence of interactions to achieve goal
● Need to observe and react to the uncertainty of the environment
○ Grab different bowl if current bowl is dirty
○ Stop pouring if the bowl is about to overflow
● Actions have delayed consequences
○ Failing to get spoon does not matter until you start eating
● Experience improves performance
Example: Preparing Breakfast
Get
cereal
Get
bowl
Get
spoon
Get
milk
Pour
cereal
Pour
milk
Grab
spoon
● Exploration: Try different actions
● Exploitation: Choose best known action
● Need both to obtain high reward
Exploration vs Exploitation
Try a new cereal? Eat the usual cereal?Agent
● Policy defines the agent’s behavior
● Reward Signal defines the goal of the problem
● Value Function indicates the long-term desirability of state
● Model of the environment mimics behavior of environment
Elements of Reinforcement Learning
● Mapping from observation to action
● Defines the agent’s behavior
● Can be stochastic
Policy
s1
s2
s3
s4
a1
a2
S A
● Reward
○ Immediate reward of action
○ Defines good/bad events for the agent
○ Given by the environment
● Value Function
○ Sum of future rewards from a state
○ Long-term desirability of states
○ Difficult to estimate
○ Primary basis of choosing action
Reward Signal vs. Value Function
● Mimics the behavior of environment
● Allow planning a future course of actions
● Not necessary for all RL methods
○ Model-based methods use the model for planning
○ Model-free methods only use trial-and-error
Model
https://worldmodels.github.io/
● Assume imperfect opponent
● Agent needs to find and exploit imperfections
Example: Tic Tac Toe
Tic Tac Toe with Reinforcement Learning
● Initialize value functions to 0.5 (except terminal states)
● Learn by playing games
○ Move greedily most times, but explore sometimes
● Incrementally update value functions by playing games
● Decrease learning rate over time to converge
● Minimax algorithm
○ Assumes best play for opponent → Cannot exploit opponent
● Classic optimization
○ Require complete specification of opponent → Impractical
○ ex. Dynamic Programming
● Evolutionary methods
○ Finds optimal algorithm
○ Ignores useful structure of RL problems
○ Works best when good policy can be found easily
Tic Tac Toe with other algorithms
● Can be applied to:
○ more complex games (ex. Backgammon)
○ problems without enemies (“games against nature”)
○ problems with partially observable environments
○ non-episodic problems
○ continuous-time problems
Reinforcement Learning beyond Tic-Tac-Toe
https://blog.openai.com/evolution-strategies/
Thank you!
Original content from
● Reinforcement Learning: An Introduction by Sutton and Barto
You can find more content in
● github.com/seungjaeryanlee
● www.endtoend.ai

Más contenido relacionado

La actualidad más candente

Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningSeung Jae Lee
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsSangwoo Mo
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learningBig Data Colombia
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningKai-Wen Zhao
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithmJie-Han Chen
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learningJie-Han Chen
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed banditJie-Han Chen
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)pauldix
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningshivani saluja
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Universitat Politècnica de Catalunya
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratchJie-Han Chen
 
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningPlaying Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningYoonho Shin
 

La actualidad más candente (20)

Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and Applications
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-Learning
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed bandit
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
 
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
Deep Reinforcement Learning: MDP & DQN - Xavier Giro-i-Nieto - UPC Barcelona ...
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
Deep Q-learning explained
Deep Q-learning explainedDeep Q-learning explained
Deep Q-learning explained
 
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningPlaying Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
 

Similar a Reinforcement Learning 1. Introduction

Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learningguruprasad110
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxMohibKhan79
 
Big Data and algorithms
Big Data and algorithmsBig Data and algorithms
Big Data and algorithmsmichele minno
 
Aleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAgile Lietuva
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
 
Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Kristin Valentine
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingDonal Byrne
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...mpavi257
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Language Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningLanguage Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningMark Chang
 
Reinforcement learning in Machine learning
 Reinforcement learning in Machine learning Reinforcement learning in Machine learning
Reinforcement learning in Machine learningMegha Sharma
 
Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Cassandra Gustafson
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDAmmar Rashed
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learningSKS
 
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsExpert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsLambda Solutions
 
Intelligent AGent class.pptx
Intelligent AGent class.pptxIntelligent AGent class.pptx
Intelligent AGent class.pptxAdJamesJohn
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIAnand Joshi
 

Similar a Reinforcement Learning 1. Introduction (20)

Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptx
 
Big Data and algorithms
Big Data and algorithmsBig Data and algorithms
Big Data and algorithms
 
Aleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrum
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous Driving
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Language Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningLanguage Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement Learning
 
Reinforcement learning in Machine learning
 Reinforcement learning in Machine learning Reinforcement learning in Machine learning
Reinforcement learning in Machine learning
 
Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfD
 
CS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptxCS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptx
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learning
 
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsExpert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
 
Intelligent AGent class.pptx
Intelligent AGent class.pptxIntelligent AGent class.pptx
Intelligent AGent class.pptx
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAI
 

Último

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Último (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Reinforcement Learning 1. Introduction

  • 2. ● Learning by interacting with the environment ● Goal: maximize a numerical reward signal by choosing correct actions ○ Trial and error: learner is not told the best action ○ Delayed rewards: actions can affect all future rewards Reinforcement Learning
  • 3. ● No external supervisor / teacher ○ No training set with labeled examples (answers) ○ Need to interact with environment in uncharted territories ● Different goals ○ Supervised Learning: Generalize existing data to minimize test set error ○ Reinforcement Learning: Maximize reward through interactions ○ Unsupervised Learning: Find hidden structure → Reinforcement Learning is a new paradigm of Machine Learning vs. Supervised and Unsupervised Learning
  • 4. ● Interactions between agent and environment ● Uncertainty about the environment ○ Effects of actions cannot be fully predicted ○ Monitor environment and react appropriately ● Defined goal ○ Judge progress through rewards ● Present affects the future ○ Effect can be delayed ● Experience improves performance Characteristics of Reinforcement Learning
  • 5. ● Complex sequence of interactions to achieve goal ● Need to observe and react to the uncertainty of the environment ○ Grab different bowl if current bowl is dirty ○ Stop pouring if the bowl is about to overflow ● Actions have delayed consequences ○ Failing to get spoon does not matter until you start eating ● Experience improves performance Example: Preparing Breakfast Get cereal Get bowl Get spoon Get milk Pour cereal Pour milk Grab spoon
  • 6. ● Exploration: Try different actions ● Exploitation: Choose best known action ● Need both to obtain high reward Exploration vs Exploitation Try a new cereal? Eat the usual cereal?Agent
  • 7. ● Policy defines the agent’s behavior ● Reward Signal defines the goal of the problem ● Value Function indicates the long-term desirability of state ● Model of the environment mimics behavior of environment Elements of Reinforcement Learning
  • 8. ● Mapping from observation to action ● Defines the agent’s behavior ● Can be stochastic Policy s1 s2 s3 s4 a1 a2 S A
  • 9. ● Reward ○ Immediate reward of action ○ Defines good/bad events for the agent ○ Given by the environment ● Value Function ○ Sum of future rewards from a state ○ Long-term desirability of states ○ Difficult to estimate ○ Primary basis of choosing action Reward Signal vs. Value Function
  • 10. ● Mimics the behavior of environment ● Allow planning a future course of actions ● Not necessary for all RL methods ○ Model-based methods use the model for planning ○ Model-free methods only use trial-and-error Model https://worldmodels.github.io/
  • 11. ● Assume imperfect opponent ● Agent needs to find and exploit imperfections Example: Tic Tac Toe
  • 12. Tic Tac Toe with Reinforcement Learning ● Initialize value functions to 0.5 (except terminal states) ● Learn by playing games ○ Move greedily most times, but explore sometimes ● Incrementally update value functions by playing games ● Decrease learning rate over time to converge
  • 13. ● Minimax algorithm ○ Assumes best play for opponent → Cannot exploit opponent ● Classic optimization ○ Require complete specification of opponent → Impractical ○ ex. Dynamic Programming ● Evolutionary methods ○ Finds optimal algorithm ○ Ignores useful structure of RL problems ○ Works best when good policy can be found easily Tic Tac Toe with other algorithms
  • 14. ● Can be applied to: ○ more complex games (ex. Backgammon) ○ problems without enemies (“games against nature”) ○ problems with partially observable environments ○ non-episodic problems ○ continuous-time problems Reinforcement Learning beyond Tic-Tac-Toe https://blog.openai.com/evolution-strategies/
  • 15. Thank you! Original content from ● Reinforcement Learning: An Introduction by Sutton and Barto You can find more content in ● github.com/seungjaeryanlee ● www.endtoend.ai