Robot Swarms as Ensembles of Cooperating Components
Matthias Hölzl
With contributions from Martin Wirsing, Annabelle Klarl
AWASS
Lucca, June 24, 2013
www.ascens-ist.eu
The Task
Robots cleaning an exhibition area
marXbot
Miniature mobile robot developed by EPFL
Rough-terrain mobility
Robots can dock to other robots
Many sensors
  Proximity sensors
  Gyroscope
  3D accelerometer
  RFID reader
  Cameras
  . . .
ARM-based Linux system
Gripper for picking up items
Swarm Robotics
Problems
Noise, sensor resolution
Extracting information from sensor data
Unforeseen situations
Uncertainty about the environment
Performing complex actions when intermediate results are uncertain
. . .
Action Logics
Logics that can represent change over time
Probabilistic behavior can be modeled (but is cumbersome)
Markov Decision Processes
[Diagram: grid-world MDP. States: pos = (x,y), pos = (x,y+1), pos = (x+1,y), pos = (x+1,y+1). Edge labels have the form action(s) / probability / reward. Legible labels: n / 0.9 / -0.1; s / 0.9 / -0.1; e,w / 0.025 / -0.1; s,n,w,e / 0.05 / -0.1. The remaining edges are labelled e / ..., w / ..., s,n / ... on the slide.]
Markov Decision Processes
[Diagram: MDP with states Decide Activity, Watching TV, In Club, Drinking, Dancing Alone, Dancing With Partner. Actions: watchTV, goToClub, drinkBeer, dance (two outcomes, p = 0.5 each), flirt (two outcomes, p = 0.05 and p = 0.95). Other labels: "Oh, oh", "Oh, no".]
MDPs: Strategies
[Diagram: same MDP as on the previous slide.]

State    TV         CDB         CDF
DA       watchTV    goToClub    goToClub
IC       drinkBeer  drinkBeer   dance
DWP      flirt      flirt       flirt
Utility  0.1        −0.05 (∗)   −1.975 (∗∗)

DA = Decide Activity, IC = In Club, DWP = Dancing With Partner
(∗) 0.05 + (−0.1)
(∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))
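The footnote arithmetic can be checked mechanically. The sketch below only re-evaluates the two expressions given on the slide; the variable names are illustrative, and which diagram edge each number belongs to is an assumption, not something the slide states.

```python
# Re-evaluates the utility expressions from the slide's footnotes.
# Names are illustrative; the mapping of numbers to edges is assumed.
go_to_club = 0.05
drink_beer = -0.1
dance_branch_a = 0.2           # the "0.5 x 0.2" term
dance_branch_b = 0.25          # first term inside the second 0.5 x (...) branch
flirt = 0.05 * 5 + 0.95 * -5   # expected value of the flirt outcomes

utility_cdb = go_to_club + drink_beer
utility_cdf = go_to_club + 0.5 * dance_branch_a + 0.5 * (dance_branch_b + flirt)

print(round(utility_cdb, 3))
print(round(utility_cdf, 3))
```

Under these numbers the TV strategy (utility 0.1) beats both club strategies, which is the point of the table.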
Reinforcement Learning
General idea:
Figure out the expected value of each action in each state
Pick the action with the highest expected value (most of the time)
Update the expectations according to the actual rewards
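One common way to realize these three steps is tabular Q-learning with epsilon-greedy action selection. The slide does not commit to a specific algorithm at this point, so the following is a minimal sketch under that assumption; the function names, the state labels "s0" and "s1", and the parameter values are all hypothetical.

```python
import random
from collections import defaultdict

def choose_action(q, state, actions, epsilon, rng):
    """Pick the best-known action most of the time, a random one otherwise."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Move the estimate for (state, action) toward reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

q = defaultdict(float)              # expected values, all starting at 0.0
actions = ["N", "E", "S", "W"]
rng = random.Random(0)

# One observed transition: taking W in s0 yielded reward 1.0 and led to s1.
update(q, "s0", "W", 1.0, "s1", actions)
print(q[("s0", "W")])
print(choose_action(q, "s0", actions, epsilon=0.0, rng=rng))
```

With `epsilon=0.0` the choice is purely greedy; a small positive value gives the "most of the time" behaviour from the slide.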
How well does this work?
Rather well for small problems
But: state explosion
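A rough illustration of the growth, assuming a hypothetical area of 100 grid cells and counting only robot positions (no sensor readings, carried items, or other state):

```python
def joint_states(cells: int, robots: int) -> int:
    """Joint position assignments when each robot may occupy any cell."""
    return cells ** robots

# The cell count of 100 is a made-up figure for illustration.
for robots in (1, 2, 4, 8):
    print(robots, joint_states(100, robots))
```

A flat table of expected values needs one entry per joint state and action, so it grows exponentially with the number of robots.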
Solutions
Decomposition
Hierarchy
Partial programs
POEM
Action language
First-order reasoning
Hierarchical reinforcement learning
Learns completions for partial programs
Concurrency
Reflection / meta-object protocols
· · ·
Iliad: A POEM Implementation
Common Lisp-based programming language
Full first-order reasoning
  Operations on logical theories: U(C)NA, domain closure, . . .
  Resolution, hyperresolution, DPLL, etc.
  Conditional answer extraction
  Procedural attachment, constraint solving
Hierarchical reinforcement learning
  Based on Concurrent ALisp
  Partial programs
  Threadwise and temporal state abstraction
  Hierarchically Optimal Yet Local (HOLY) Q-learning
  Hierarchically Optimal Recursively Decomposed (HORD) Q-learning
Planned Contents
Introduction to CALisp/Poem
Simple TD-learning: bandits
Flat reinforcement learning: navigation
Hierarchical reinforcement learning: collecting items individually
Threadwise decomposition for hierarchical reinforcement learning: learning collaboratively
n-armed Bandits
[Diagram: single state S with two actions, labelled action / probability / reward distribution:]
search / 1.0 / N(0.1, 1.0)
coll-known / 1.0 / N(0.3, 3.0)
Choice between n actions
Reward depends probabilistically on the action choice
No long-term consequences
Simplest form of TD-learning
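A minimal sketch of learning on this two-armed bandit, using epsilon-greedy selection and incremental sample averages. The arm names and parameters come from the diagram; reading the second parameter of N(·, ·) as a standard deviation is an assumption, as are the function name, step count, and epsilon.

```python
import random

# Arms from the slide. ASSUMPTION: N(mean, std); the slide does not say
# whether the second parameter is a variance or a standard deviation.
ARMS = {"search": (0.1, 1.0), "coll-known": (0.3, 3.0)}

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = {arm: 0.0 for arm in ARMS}   # estimated value of each arm
    counts = {arm: 0 for arm in ARMS}        # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(list(ARMS))               # explore
        else:
            arm = max(estimates, key=estimates.get)    # exploit
        mean, std = ARMS[arm]
        reward = rng.gauss(mean, std)
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit()
print(estimates, counts)
```

Because there is only one state and no long-term consequence, the update needs no discounting or next-state term.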
Flat Learning
[Figure: ASCII maze with walls (XX), one cell marked TT and one cell marked RR.]
Target: (0 0)
Choices: (N E S W)
Q-values:
#((N (Q -1.8))
  (E (Q -1.8))
  (S (Q -2.25))
  (W (Q 2.76)))
Recommended choice is W
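The recommendation is the action with the largest Q-value. A small sketch using the four values shown on the slide (the dictionary representation is illustrative, not the Iliad data structure):

```python
# Q-values copied from the slide's output.
q_values = {"N": -1.8, "E": -1.8, "S": -2.25, "W": 2.76}

# Greedy choice: the action whose estimated value is highest.
recommended = max(q_values, key=q_values.get)
print(recommended)  # W
```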
Flat Learning
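(defun simple-robot ()
  ;; Navigate to the target location of the robot's environment.
  (call (nav (target-loc (robot-env)))))

(defun nav (loc)
  (until (equal (robot-loc) loc)
    ;; Choice point: the direction is left open and filled in by learning.
    (with-choice navigate-choice (dir '(N E S W))
      (action navigate-move dir))))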
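For readers without a Lisp environment, the following Python stand-in mirrors the shape of `nav`: a loop that runs until the robot is at the target, with the direction at each step chosen by a learner. It is not the Iliad API. The 4x4 open grid, the rewards, the use of plain Q-learning, and all parameter values are assumptions made for this sketch, and a fairly high exploration rate is used during training to keep it robust.

```python
import random
from collections import defaultdict

SIZE = 4                      # hypothetical 4x4 open grid, no inner walls
TARGET = (0, 0)
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

def step(loc, direction):
    """Move one cell; bumping into the boundary leaves the robot in place."""
    dx, dy = MOVES[direction]
    x, y = loc[0] + dx, loc[1] + dy
    if 0 <= x < SIZE and 0 <= y < SIZE:
        loc = (x, y)
    reward = 1.0 if loc == TARGET else -0.1   # assumed reward scheme
    return loc, reward

def nav(q, loc, rng, epsilon, learn=True, alpha=0.5, gamma=0.9, max_steps=100):
    """Loop until the target is reached, choosing a direction at every step."""
    steps = 0
    while loc != TARGET and steps < max_steps:
        if rng.random() < epsilon:
            direction = rng.choice(list(MOVES))                 # explore
        else:
            direction = max(MOVES, key=lambda d: q[(loc, d)])   # exploit
        new_loc, reward = step(loc, direction)
        if learn:
            # The target is terminal, so nothing is bootstrapped from it.
            best_next = 0.0 if new_loc == TARGET else max(q[(new_loc, d)] for d in MOVES)
            q[(loc, direction)] += alpha * (reward + gamma * best_next - q[(loc, direction)])
        loc = new_loc
        steps += 1
    return loc, steps

rng = random.Random(0)
q = defaultdict(float)
for _ in range(3000):         # training episodes from random start cells
    start = (rng.randrange(SIZE), rng.randrange(SIZE))
    nav(q, start, rng, epsilon=0.5)

# Greedy run with the learned values, starting in the far corner.
final_loc, steps = nav(q, (3, 3), rng, epsilon=0.0, learn=False)
print(final_loc, steps)
```

The only part the programmer leaves open is the direction; everything else (the loop, the termination test) is fixed, which is the idea behind a partial program.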
Hierarchical Learning
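(defun waste-removal ()
  (loop
    ;; Top-level choice point: sleep, pick up waste, or drop waste.
    (choose choose-waste-removal-action
      (action sleep 'SLEEP)
      (call (pickup-waste))
      (call (drop-waste)))))

(defun pickup-waste ()
  ;; Reuses the nav subroutine, then performs the primitive pickup action.
  (call (nav (waste-source)))
  (action pickup-waste 'PICKUP))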
Multi-Robot Learning
[Figure: ASCII grid with walls (XX) and cells marked TT (one), RR (four), RW (one) and WW (one).]
Thank you!
Any Questions?
matthias.hoelzl@ifi.lmu.de
