Robot Swarms as Ensembles of Cooperating Components - Matthias Holzl

Robot Swarms as Ensembles of Cooperating Components
Matthias Hölzl
With contributions from Martin Wirsing, Annabelle Klarl
AWASS
Lucca, June 24, 2013
www.ascens-ist.eu

The Task
Robots cleaning an exhibition area
Matthias Hölzl 2

marXbot
Miniature mobile robot developed by EPFL
Rough-terrain mobility
Robots can dock to other robots
Many sensors
Proximity sensors
Gyroscope
3D accelerometer
RFID reader
Cameras
. . .
ARM-based Linux system
Gripper for picking up items

Swarm Robotics

Problems
Noise, sensor resolution
Extracting information from sensor data
Unforeseen situations
Uncertainty about the environment
Performing complex actions when intermediate results are uncertain
. . .

Action Logics
Logics that can represent change over time
Probabilistic behavior can be modeled (but is cumbersome)

Markov Decision Processes
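[Figure: grid-world MDP whose states are robot positions pos = (x, y), shown with the neighboring states (x + 1, y), (x, y + 1), and (x + 1, y + 1). Edges are labeled action / probability / reward. From (x, y), action s reaches (x, y + 1) with probability 0.9, and actions e and w reach it with probability 0.025 each; symmetrically, n leads from (x, y + 1) back to (x, y) with probability 0.9. Any of n, s, e, w leaves the position unchanged with probability 0.05. Each labeled transition has reward −0.1.]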
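A minimal Python sketch of the transition model in this figure, under our reading of it: n moves to y − 1 and s to y + 1, the chosen direction succeeds with probability 0.9, each perpendicular direction is taken with probability 0.025, the robot stays put with probability 0.05, and every step has reward −0.1. Walls and grid boundaries are ignored, and the names `MOVES`, `PERPENDICULAR`, `shift`, and `transitions` are hypothetical.

```python
# Hypothetical encoding of the grid-world MDP from the figure.
# Assumption: y grows towards the south, so "s" moves from (x, y) to (x, y + 1).
MOVES = {"n": (0, -1), "s": (0, 1), "e": (1, 0), "w": (-1, 0)}
PERPENDICULAR = {"n": ("e", "w"), "s": ("e", "w"), "e": ("n", "s"), "w": ("n", "s")}
STEP_REWARD = -0.1


def shift(pos, direction):
    """Position reached by moving one cell from pos in the given direction."""
    dx, dy = MOVES[direction]
    return (pos[0] + dx, pos[1] + dy)


def transitions(pos, action):
    """List of (probability, next_pos, reward) triples for taking action at pos.

    Walls and grid boundaries are ignored in this sketch.
    """
    outcomes = [(0.9, shift(pos, action), STEP_REWARD)]  # intended direction
    for side in PERPENDICULAR[action]:                    # slip sideways
        outcomes.append((0.025, shift(pos, side), STEP_REWARD))
    outcomes.append((0.05, pos, STEP_REWARD))             # robot does not move
    return outcomes


for p, nxt, r in transitions((2, 2), "s"):
    print(p, nxt, r)
```

Since 0.9 + 2 × 0.025 + 0.05 = 1, the opposite direction gets probability 0 under this reading.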

Markov Decision Processes
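[Figure: MDP for deciding on an evening activity. From Decide Activity, watchTV leads to Watching TV and goToClub leads to In Club. In the club, drinkBeer leads to Drinking, while dance leads with p = 0.5 each to Dancing Alone or Dancing With Partner. From Dancing With Partner, flirt ends in one of two outcomes, "Oh, oh" or "Oh, no", with probabilities p = 0.05 and p = 0.95.]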

MDPs: Strategies
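Each column is a strategy (policy) that fixes the action taken in each state (DA = Decide Activity, IC = In Club, DWP = Dancing With Partner). TV watches TV, CDB goes to the club and drinks beer, and CDF goes to the club, dances, and flirts.

State     TV          CDB           CDF
DA        watchTV     goToClub      goToClub
IC        drinkBeer   drinkBeer     dance
DWP       flirt       flirt         flirt
Utility   0.1         −0.05 (∗)     −1.975 (∗∗)

(∗)  0.05 + (−0.1) = −0.05
(∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5)) = −1.975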
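The utilities in the strategy table can be checked by evaluating each strategy on the MDP. A minimal Python sketch, assuming undiscounted total reward and rewards read off the footnotes: goToClub 0.05, watchTV 0.1, drinkBeer −0.1, reaching Dancing Alone 0.2, reaching Dancing With Partner 0.25, and flirt outcomes +5 (p = 0.05) and −5 (p = 0.95). The state names, and which flirt outcome is called `OhOh`, are our own assumptions.

```python
# Hypothetical encoding of the "evening out" MDP. Rewards are read off the
# footnotes of the strategy table; state names are our own.
# MDP[state][action] = list of (probability, next_state, reward).
MDP = {
    "DA": {"watchTV": [(1.0, "WatchingTV", 0.1)],
           "goToClub": [(1.0, "IC", 0.05)]},
    "IC": {"drinkBeer": [(1.0, "Drinking", -0.1)],
           "dance": [(0.5, "DancingAlone", 0.2), (0.5, "DWP", 0.25)]},
    "DWP": {"flirt": [(0.05, "OhOh", 5.0), (0.95, "OhNo", -5.0)]},
}
# States without an entry in MDP are terminal.

STRATEGIES = {
    "TV": {"DA": "watchTV", "IC": "drinkBeer", "DWP": "flirt"},
    "CDB": {"DA": "goToClub", "IC": "drinkBeer", "DWP": "flirt"},
    "CDF": {"DA": "goToClub", "IC": "dance", "DWP": "flirt"},
}


def utility(policy, state="DA"):
    """Expected total (undiscounted) reward of following policy from state."""
    if state not in MDP:
        return 0.0
    outcomes = MDP[state][policy[state]]
    return sum(p * (r + utility(policy, nxt)) for p, nxt, r in outcomes)


for name, policy in STRATEGIES.items():
    print(f"{name}: {utility(policy):.3f}")
# prints TV: 0.100, CDB: -0.050, CDF: -1.975 (one per line)
```

The recursion follows the policy's action in each state and weights each outcome by its probability, which reproduces the footnote arithmetic term by term.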

Reinforcement Learning
General idea:
Figure out the expected
value of each action
in each state
Pick the action with
the highest expected
value (most of the
time)
Update the expectations
according to the actual
rewards
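The three steps above are essentially tabular Q-learning. A minimal Python sketch: epsilon-greedy selection implements "most of the time", and `q_update` moves each estimate towards the observed reward plus the discounted best estimate for the next state. The corridor task, `alpha`, `gamma`, `epsilon`, and all function names are assumptions for illustration, not Iliad code.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Usually pick the action with the highest estimated value;
    with probability epsilon pick a random action instead (exploration)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.95):
    """Move Q(state, action) towards reward + gamma * max_a' Q(next_state, a')."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def train_corridor(length=5, episodes=500, epsilon=0.1, seed=0):
    """Hypothetical toy task: cells 0..length-1, start at 0, goal at the
    right end, reward -1 per step. Returns the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    actions = ["left", "right"]
    goal = length - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = epsilon_greedy(q, s, actions, epsilon, rng)
            s2 = max(0, s - 1) if a == "left" else s + 1
            # The goal is terminal: no actions, so its future value is 0.
            q_update(q, s, a, -1.0, s2, [] if s2 == goal else actions)
            s = s2
    return q


q = train_corridor()
print({s: max(["left", "right"], key=lambda a: q[(s, a)]) for s in range(4)})
```

Starting all estimates at 0 is optimistic here (every reward is negative), which by itself pushes the agent to try each action at least once.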

How well does this work?
Rather well for small problems
But: state explosion
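To make the state explosion concrete: a flat learner keeps a value for every joint state, and the number of joint states grows exponentially with the number of robots. The sketch below counts them under a deliberately crude, hypothetical model (100 grid cells, two internal modes per robot, robots allowed to share a cell, items ignored).

```python
def joint_state_count(cells, robots, modes_per_robot=2):
    """Number of joint states when each robot is in one of `cells` positions
    and one of `modes_per_robot` internal modes (e.g. carrying or not).
    Crude hypothetical model: robots may share a cell, items are ignored."""
    return (cells * modes_per_robot) ** robots


for robots in (1, 2, 5, 10):
    print(robots, joint_state_count(100, robots))
```

With 5 robots this model already gives 200^5 = 3.2 × 10^11 joint states, before any action is counted.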

Solutions
Decomposition
Hierarchy
Partial programs

POEM
Action language
First-order reasoning
Hierarchical reinforcement learning
Learns completions for partial programs
Concurrency
Reflection / meta-object protocols
· · ·

Iliad: A POEM Implementation
Common Lisp-based programming language
Full first-order reasoning
Operations on logical theories: U(C)NA, domain closure, . . .
Resolution, hyperresolution, DPLL, etc.
Conditional answer extraction
Procedural attachment, constraint solving
Hierarchical reinforcement learning
Based on Concurrent ALisp
Partial programs
Threadwise and temporal state abstraction
Hierarchically Optimal Yet Local (HOLY) Q-learning
Hierarchically Optimal Recursively Decomposed (HORD) Q-learning

Planned Contents
Introduction to CALisp/POEM
Simple TD-learning: bandits
Flat reinforcement learning: navigation
Hierarchical reinforcement learning: collecting items individually
Threadwise decomposition for hierarchical reinforcement learning: learning collaboratively

n-armed Bandits
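[Figure: a single state S with two actions, each leading back to S and labeled action / probability / reward: search / 1.0 / N(0.1, 1.0) and coll-known / 1.0 / N(0.3, 3.0).]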
Choice between n actions
Reward depends probabilistically on the action choice
No long-term consequences
Simplest form of TD-learning
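A minimal Python sketch of learning the two-armed bandit in the figure with epsilon-greedy selection and incremental sample averages. The update `q += (r - q) / n` has the TD form "old estimate + step size × (target − old estimate)". We assume the second parameter of N(·, ·) is a standard deviation; `run_bandit` and the values of `steps`, `epsilon`, and `seed` are hypothetical.

```python
import random

# Arms from the slide: action -> (mean, standard deviation) of its reward.
# Assumption: the second parameter of N(., .) is a standard deviation.
ARMS = {"search": (0.1, 1.0), "coll-known": (0.3, 3.0)}


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of the action values of a single-state problem."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ARMS}        # current value estimate per action
    n = {a: 0 for a in ARMS}          # how often each action was taken
    rewards = {a: [] for a in ARMS}   # observed rewards, kept for inspection
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(list(ARMS))    # explore
        else:
            a = max(q, key=q.get)         # exploit the best estimate
        mu, sigma = ARMS[a]
        r = rng.gauss(mu, sigma)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]         # incremental sample average
        rewards[a].append(r)
    return q, n, rewards


q, n, _ = run_bandit()
print(q, n)
```

Because coll-known has the higher mean but a much larger spread, a few unlucky early samples can keep the greedy choice on search for a while; the exploration steps are what eventually correct such estimates.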

Flat Learning
[Screenshot: the robot (RR) in a maze with walls (XX) and the target (TT). The learner's status output reads:
Target: (0 0)
Choices: (N E S W)
Q-values: #((N (Q -1.8)) (E (Q -1.8)) (S (Q -2.25)) (W (Q 2.76)))
Recommended choice is W]

Flat Learning
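(defun simple-robot ()
  ;; Navigate to the target location of the robot's environment.
  (call (nav (target-loc (robot-env)))))

(defun nav (loc)
  ;; Repeat until the robot has reached LOC.
  (until (equal (robot-loc) loc)
    ;; Choice point: the learner decides which direction to move.
    (with-choice navigate-choice (dir '(N E S W))
      (action navigate-move dir))))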
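The `nav` partial program fixes the control structure (loop until the target is reached) and leaves the direction open at `navigate-choice`; the learner completes the program by answering that choice point. A minimal Python analogue, assuming a wall-free grid and using a generator whose `yield` plays the role of `with-choice`. A hand-written completion stands in for the learned Q-values, and `GridEnv`, `run`, and `toward` are hypothetical names, not Iliad API.

```python
# Hypothetical Python analogue of the nav partial program.
MOVES = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


class GridEnv:
    """Open grid without walls (a simplification of the slide's maze)."""

    def __init__(self, width, height, start, target):
        self.width, self.height = width, height
        self.robot_loc = start
        self.target_loc = target

    def move(self, direction):
        dx, dy = MOVES[direction]
        x, y = self.robot_loc[0] + dx, self.robot_loc[1] + dy
        if 0 <= x < self.width and 0 <= y < self.height:
            self.robot_loc = (x, y)  # moves off the grid are ignored


def nav(env, loc):
    """Partial program: the loop is fixed, the direction is a choice point."""
    while env.robot_loc != loc:
        direction = yield ("navigate-choice", env.robot_loc, list(MOVES))
        env.move(direction)


def run(program, completion):
    """Run a partial program, answering every choice point with
    completion(label, state, options). Returns the number of choices made."""
    steps = 0
    try:
        label, state, options = next(program)
        while True:
            answer = completion(label, state, options)
            steps += 1
            label, state, options = program.send(answer)
    except StopIteration:
        return steps


def toward(target):
    """Hand-written completion standing in for a learned Q-table."""
    def completion(label, state, options):
        (x, y), (tx, ty) = state, target
        if x != tx:
            return "W" if x > tx else "E"
        return "N" if y > ty else "S"
    return completion


env = GridEnv(5, 5, start=(3, 4), target=(0, 0))
print(run(nav(env, env.target_loc), toward(env.target_loc)), env.robot_loc)
```

Swapping `toward` for a function that picks the action with the highest Q-value for `(label, state)` turns this driver into the learning setting on the slide.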

Hierarchical Learning
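(defun waste-removal ()
  (loop
    ;; Choice point: the learner picks sleeping, picking up, or dropping waste.
    (choose choose-waste-removal-action
      (action sleep 'SLEEP)
      (call (pickup-waste))
      (call (drop-waste)))))

(defun pickup-waste ()
  ;; Navigate to the waste source, then pick up the waste there.
  (call (nav (waste-source)))
  (action pickup-waste 'PICKUP))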
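Because `pickup-waste` and `drop-waste` run for many primitive steps, the value of the top-level choice in `waste-removal` has to account for that duration. One standard way to learn such a choice is SMDP Q-learning, sketched below in Python; it is a simpler technique than Iliad's HOLY and HORD Q-learning. The abstract state (whether the robot is holding waste), the rewards, and the parameters are hypothetical.

```python
from collections import defaultdict


def smdp_q_update(q, state, choice, rewards, next_state, next_choices,
                  alpha=0.1, gamma=0.95):
    """SMDP Q-learning update for a choice whose subroutine ran for several
    primitive steps; `rewards` holds the per-step rewards it collected."""
    tau = len(rewards)  # duration of the subroutine in primitive steps
    discounted = sum(gamma ** t * r for t, r in enumerate(rewards))
    best_next = max((q[(next_state, c)] for c in next_choices), default=0.0)
    target = discounted + gamma ** tau * best_next
    q[(state, choice)] += alpha * (target - q[(state, choice)])


# Hypothetical abstract state for the top-level choice: is the robot holding waste?
CHOICES = ["sleep", "pickup-waste", "drop-waste"]
q = defaultdict(float)
# pickup-waste took 4 steps (3 navigation moves, then the pickup action).
smdp_q_update(q, "empty", "pickup-waste", [-0.1, -0.1, -0.1, 1.0],
              "holding", CHOICES, alpha=0.5, gamma=0.9)
print(q[("empty", "pickup-waste")])
```

The top-level table only needs the abstract state, not the robot's position, because position is handled inside `nav`; this is the kind of state abstraction hierarchy makes possible.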

Multi-Robot Learning
[Screenshot: the exhibition area as a grid with walls (XX), several robots (RR), waste (WW), and the target location (TT).]
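For several robots, threadwise decomposition can be illustrated with Q-decomposition: each robot thread keeps its own Q component over joint actions, learned from that robot's own reward, and the joint action that maximizes the sum of the components is executed. The Python sketch below is a minimal illustration under that assumption, not Iliad's implementation; the states, actions, and Q-values are made up.

```python
from collections import defaultdict
from itertools import product


def best_joint_action(q_parts, state, action_sets):
    """Pick the joint action that maximizes the sum of the per-thread
    Q components (each robot thread contributes one component)."""
    return max(product(*action_sets),
               key=lambda ja: sum(qj[(state, ja)] for qj in q_parts))


def sarsa_update(q_parts, state, ja, rewards, next_state, next_ja,
                 alpha=0.1, gamma=0.95):
    """Each thread updates its own component from its own reward,
    bootstrapping on the joint action actually chosen next (SARSA-style)."""
    for qj, r in zip(q_parts, rewards):
        target = r + gamma * qj[(next_state, next_ja)]
        qj[(state, ja)] += alpha * (target - qj[(state, ja)])


# Two robots next to the same waste item (hypothetical values):
# if both try to pick it up, robot 2 is penalized for the conflict.
actions = [("pickup", "wait"), ("pickup", "wait")]
q1, q2 = defaultdict(float), defaultdict(float)
q1[("s", ("pickup", "pickup"))] = 1.0
q2[("s", ("pickup", "pickup"))] = -2.0
q1[("s", ("pickup", "wait"))] = 1.0
q2[("s", ("wait", "pickup"))] = 0.8
print(best_joint_action([q1, q2], "s", actions))  # ('pickup', 'wait')
```

Each robot only has to learn how the joint action affects its own reward, while the coordinated choice still takes every robot's component into account.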

Thank you!
Any Questions?
matthias.hoelzl@ifi.lmu.de
