SlideShare una empresa de Scribd logo
1 de 20
Multi-agent RL
in
Sequential Social Dilemmas
Paper Review
MARL in SSD
• Multi Agent Reinforcement Learning
• Sequential Social Dilemmas
=> Understanding Agent Cooperation
=> In sequential situation ( mixed incentive sturcutre of matrix game social dilemma )
learn policies.
Sequential situation
Fruit Gathering
Wolfpack Hunting
Social Dilemma
• A social dilemma is a situation in which an
individual profits from selfishness unless everyone
chooses the selfish alternative, in which case the
whole group loses => Represent with Matrix game
Matrix Game – prisoner’s dilemma
Nash Equilibrium
This is Best Choice..
in global perspective
Betrayal Cooperate Matrix Game Social Dilemma
== MGSD
Rational agent
choice this
( Think reward is - )
MGSD ignores…
1. In real world’s social dilemmas are temporally extended
2. Cooperation and defection are labels that apply to polices implementing
strategic decision
3. Cooperativeness may be a graded quantity
4. Decision to cooperate or defect occur only quasi-simultaneously since some
information about what player 2 is starting to do can inform player 1’s decision
and vice versa
5. Decision must be made despite only having partial information about the
state of the world and the activities of the other players
Sequential Social Dilemma
SSD
= Markov Games +
Matrix Game Social
Dilemma
SSD – Markov Games
two-player partially observable Markov game : M => O : S x {1,2}
# O = { o_i | s, o_i }
Transition Function T : S x A_1 x A_2 -> delta(S) ( discrete probability distributions )
Reward Function r_i : S x A1 x A2
Policy π : O_i -> delta(A_i)
== Find MGSD with Reinforcement Learning
Value-state function
SSD – Definition of SSD
Sequential Social Dilemma
Empirical payoff matrix
Markov game에서 observation이 변함에 따라 policy가 변화
Learning Algorithm
== Deep Multiagent Reinforcement Learning
Use Deep Q-Network
Uniform Dist.
Simulation Method
Game : 2D grid-world
Observation : 3( RGB )
x 15(forehead) x 10(side)
Action :
8 ( arrow keys + rotate left + rotate right
+ use beam + stand )
Episode : 1000 step
NN : two Hidden layer – 32 unit
+ relu activation 8 output
Policy : e-greedy ( decrease e 1.0 to 0.1 )
Result – Gathering
Reward가 없지만… laser로 other agent를 잠깐 없앰
먹을게 (초록) 많으면 공존하면서 reward를 얻고,
적으면 서로 공격하기 시작함
Result – Gathering
Touch Green : reward +1 ( green removed temporally )
Beam to other player : (tagging)
hit twice, remove opponent from game N_tagged frames
Apple respawns after N_apple frames
=>
Defecting Policy == aggressive ( use beam )
Coopertive Policy == not seek to tag the other player
https://www.youtube.com/watch?v=F97lqqpcqsM
Result – Gathering
*After training for 4- million steps for each option
Conflict cost
Abundance
Highly Agressive
Low Agressive
RL to SSD
1. Train Policies at Different Game
2. Extract trained Policies from 1.
3. Calculate MGSD
4. Repeat 2-3 Until Converge
Gathering : DRL to SSD
Prisoner Dilemma
or
Non-SSD : ( NE is Global Optimal )
Wolfpack
함께 잡으면 더 높은 Reward
Wolfpack
r_team : reward when touch prey same
time
radius : capture radius ( collision size )
== difficulty of capture
Wolfpack SSD
Material Link
• https://arxiv.org/pdf/1702.03037.pdf
• https://deepmind.com/blog/understanding-agent-
cooperation/

Más contenido relacionado

La actualidad más candente

업적,칭호,타이틀 그게 뭐든간에...
업적,칭호,타이틀 그게 뭐든간에...업적,칭호,타이틀 그게 뭐든간에...
업적,칭호,타이틀 그게 뭐든간에...
SeungYeon Jeong
 
테일즈위버 합성 시스템 역 기획
테일즈위버 합성 시스템 역 기획테일즈위버 합성 시스템 역 기획
테일즈위버 합성 시스템 역 기획
Jun Geun Lee
 
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
tcaesvk
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and Programming
Sumit Jain
 
Action and Adventure Games
Action and Adventure GamesAction and Adventure Games
Action and Adventure Games
Zai Lekir
 

La actualidad más candente (20)

업적,칭호,타이틀 그게 뭐든간에...
업적,칭호,타이틀 그게 뭐든간에...업적,칭호,타이틀 그게 뭐든간에...
업적,칭호,타이틀 그게 뭐든간에...
 
테일즈위버 합성 시스템 역 기획
테일즈위버 합성 시스템 역 기획테일즈위버 합성 시스템 역 기획
테일즈위버 합성 시스템 역 기획
 
게임제작개론: #2 세부 디자인 요소
게임제작개론: #2 세부 디자인 요소게임제작개론: #2 세부 디자인 요소
게임제작개론: #2 세부 디자인 요소
 
06. Game Architecture
06. Game Architecture06. Game Architecture
06. Game Architecture
 
NHN NEXT 게임 전공 소개
NHN NEXT 게임 전공 소개NHN NEXT 게임 전공 소개
NHN NEXT 게임 전공 소개
 
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
NDC 2013, 마비노기 영웅전 개발 테크니컬 포스트-모템
 
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
 
02. 게임 전투 공식
02. 게임 전투 공식02. 게임 전투 공식
02. 게임 전투 공식
 
Game design document
Game design document Game design document
Game design document
 
검은사막 역기획서 - UI편집기
검은사막 역기획서 - UI편집기검은사막 역기획서 - UI편집기
검은사막 역기획서 - UI편집기
 
게임 BM 설계를 위해 알아야 할 게임 Entity
게임 BM 설계를 위해 알아야 할 게임 Entity게임 BM 설계를 위해 알아야 할 게임 Entity
게임 BM 설계를 위해 알아야 할 게임 Entity
 
액션 게임 디자인
액션 게임 디자인액션 게임 디자인
액션 게임 디자인
 
레벨디자인 특강 이동훈
레벨디자인 특강 이동훈레벨디자인 특강 이동훈
레벨디자인 특강 이동훈
 
NDC 2010 이은석 - 마비노기 영웅전 포스트모템 2부
NDC 2010 이은석 - 마비노기 영웅전 포스트모템 2부NDC 2010 이은석 - 마비노기 영웅전 포스트모템 2부
NDC 2010 이은석 - 마비노기 영웅전 포스트모템 2부
 
Formal Game Elements - Players
Formal Game Elements - PlayersFormal Game Elements - Players
Formal Game Elements - Players
 
기획자의 포트폴리오는 어떻게 써야 할까
기획자의 포트폴리오는 어떻게 써야 할까기획자의 포트폴리오는 어떻게 써야 할까
기획자의 포트폴리오는 어떻게 써야 할까
 
Aft713 fundamental of game design 1.2
Aft713 fundamental of game design 1.2Aft713 fundamental of game design 1.2
Aft713 fundamental of game design 1.2
 
Game Architecture and Programming
Game Architecture and ProgrammingGame Architecture and Programming
Game Architecture and Programming
 
엄재민 Nhn과제 신규 게임 컨셉 제안서
엄재민 Nhn과제 신규 게임 컨셉 제안서엄재민 Nhn과제 신규 게임 컨셉 제안서
엄재민 Nhn과제 신규 게임 컨셉 제안서
 
Action and Adventure Games
Action and Adventure GamesAction and Adventure Games
Action and Adventure Games
 

Similar a Multi agent reinforcement learning for sequential social dilemmas

Similar a Multi agent reinforcement learning for sequential social dilemmas (8)

GAMING BOT USING REINFORCEMENT LEARNING
GAMING BOT USING REINFORCEMENT LEARNINGGAMING BOT USING REINFORCEMENT LEARNING
GAMING BOT USING REINFORCEMENT LEARNING
 
Game Theory Assignment
Game Theory AssignmentGame Theory Assignment
Game Theory Assignment
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
A Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement LearningA Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement Learning
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
LAFS Game Design 1 - Structural Elements
LAFS Game Design 1 - Structural ElementsLAFS Game Design 1 - Structural Elements
LAFS Game Design 1 - Structural Elements
 
Game theory
Game theoryGame theory
Game theory
 
LAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and AccessabilityLAFS Game Design 10 - Fun and Accessability
LAFS Game Design 10 - Fun and Accessability
 

Más de Dong Heon Cho

Más de Dong Heon Cho (20)

Forward-Forward Algorithm
Forward-Forward AlgorithmForward-Forward Algorithm
Forward-Forward Algorithm
 
What is Texture.pdf
What is Texture.pdfWhat is Texture.pdf
What is Texture.pdf
 
BADGE
BADGEBADGE
BADGE
 
Neural Radiance Field
Neural Radiance FieldNeural Radiance Field
Neural Radiance Field
 
2020 > Self supervised learning
2020 > Self supervised learning2020 > Self supervised learning
2020 > Self supervised learning
 
All about that pooling
All about that poolingAll about that pooling
All about that pooling
 
Background elimination review
Background elimination reviewBackground elimination review
Background elimination review
 
Transparent Latent GAN
Transparent Latent GANTransparent Latent GAN
Transparent Latent GAN
 
Image matting atoc
Image matting atocImage matting atoc
Image matting atoc
 
Multi object Deep reinforcement learning
Multi object Deep reinforcement learningMulti object Deep reinforcement learning
Multi object Deep reinforcement learning
 
Multi agent System
Multi agent SystemMulti agent System
Multi agent System
 
Hybrid reward architecture
Hybrid reward architectureHybrid reward architecture
Hybrid reward architecture
 
Use Jupyter notebook guide in 5 minutes
Use Jupyter notebook guide in 5 minutesUse Jupyter notebook guide in 5 minutes
Use Jupyter notebook guide in 5 minutes
 
AlexNet and so on...
AlexNet and so on...AlexNet and so on...
AlexNet and so on...
 
Deep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDeep Learning AtoC with Image Perspective
Deep Learning AtoC with Image Perspective
 
LOL win prediction
LOL win predictionLOL win prediction
LOL win prediction
 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few data
 
Domain adaptation gan
Domain adaptation ganDomain adaptation gan
Domain adaptation gan
 
Dense sparse-dense training for dnn and Other Models
Dense sparse-dense training for dnn and Other ModelsDense sparse-dense training for dnn and Other Models
Dense sparse-dense training for dnn and Other Models
 
Squeeeze models
Squeeeze modelsSqueeeze models
Squeeeze models
 

Último

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 

Último (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Multi agent reinforcement learning for sequential social dilemmas

  • 1. Multi-agent RL in Sequential Social Dilemmas Paper Review
  • 2. MARL in SSD • Multi Agent Reinforcement Learning • Sequential Social Dilemmas => Understanding Agent Cooperation => In sequential situation ( mixed incentive sturcutre of matrix game social dilemma ) learn policies.
  • 4. Social Dilemma • A social dilemma is a situation in which an individual profits from selfishness unless everyone chooses the selfish alternative, in which case the whole group loses => Represent with Matrix game
  • 5. Matrix Game – prisoner’s dilemma Nash Equilibrium This is Best Choice.. in global perspective Betrayal Cooperate Matrix Game Social Dilemma == MGSD Rational agent choice this ( Think reward is - )
  • 6. MGSD ignores… 1. In real world’s social dilemmas are temporally extended 2. Cooperation and defection are labels that apply to polices implementing strategic decision 3. Cooperativeness may be a graded quantity 4. Decision to cooperate or defect occur only quasi-simultaneously since some information about what player 2 is starting to do can inform player 1’s decision and vice versa 5. Decision must be made despite only having partial information about the state of the world and the activities of the other players
  • 7. Sequential Social Dilemma SSD = Markov Games + Matrix Game Social Dilemma
  • 8. SSD – Markov Games two-player partially observable Markov game : M => O : S x {1,2} # O = { o_i | s, o_i } Transition Function T : S x A_1 x A_2 -> delta(S) ( discrete probability distributions ) Reward Function r_i : S x A1 x A2 Policy π : O_i -> delta(A_i) == Find MGSD with Reinforcement Learning Value-state function
  • 9. SSD – Definition of SSD Sequential Social Dilemma Empirical payoff matrix Markov game에서 observation이 변함에 따라 policy가 변화
  • 10. Learning Algorithm == Deep Multiagent Reinforcement Learning Use Deep Q-Network Uniform Dist.
  • 11. Simulation Method Game : 2D grid-world Observation : 3( RGB ) x 15(forehead) x 10(side) Action : 8 ( arrow keys + rotate left + rotate right + use beam + stand ) Episode : 1000 step NN : two Hidden layer – 32 unit + relu activation 8 output Policy : e-greedy ( decrease e 1.0 to 0.1 )
  • 12. Result – Gathering Reward가 없지만… laser로 other agent를 잠깐 없앰 먹을게 (초록) 많으면 공존하면서 reward를 얻고, 적으면 서로 공격하기 시작함
  • 13. Result – Gathering Touch Green : reward +1 ( green removed temporally ) Beam to other player : (tagging) hit twice, remove opponent from game N_tagged frames Apple respawns after N_apple frames => Defecting Policy == aggressive ( use beam ) Coopertive Policy == not seek to tag the other player https://www.youtube.com/watch?v=F97lqqpcqsM
  • 14. Result – Gathering *After training for 4- million steps for each option Conflict cost Abundance Highly Agressive Low Agressive
  • 15. RL to SSD 1. Train Policies at Different Game 2. Extract trained Policies from 1. 3. Calculate MGSD 4. Repeat 2-3 Until Converge
  • 16. Gathering : DRL to SSD Prisoner Dilemma or Non-SSD : ( NE is Global Optimal )
  • 18. Wolfpack r_team : reward when touch prey same time radius : capture radius ( collision size ) == difficulty of capture
  • 20. Material Link • https://arxiv.org/pdf/1702.03037.pdf • https://deepmind.com/blog/understanding-agent- cooperation/