AI and Minecraft - Lars Gregori - Codemotion Amsterdam 2018

AI and Minecraft
Lars Gregori
AMSTERDAM | MAY 8-9, 2018

Minecraft
Markus "Notch" Persson
Mojang AB
Best-selling PC game of all time
Exploration
Resource gathering
Crafting
Combat
Sandbox construction game
Creative + building aspects
Three-dimensional environment

Project Malmo
Open Source (GitHub)
Microsoft Research Lab
Based on Minecraft / Minecraft Forge
Agents written in Python, Lua, C++, C#, Java, Torch, ALE*
Mission XML
WorldState
Send Command
*Arcade Learning Environment

Project Malmo
“The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence

Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning

Agent → Action → Environment
Environment → Observation, Reward → Agent
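The loop in the diagram can be sketched in a few lines of Python. This is only an illustration, not Malmo code: the `CorridorEnv` class, its `step` method, the reward values and the fixed "always move right" policy are hypothetical and chosen for this sketch.

```python
class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def step(self, action):
        """Apply an action ('left' or 'right'); return (observation, reward, done)."""
        if action == "right":
            self.position = min(self.position + 1, self.length - 1)
        elif action == "left":
            self.position = max(self.position - 1, 0)
        done = self.position == self.length - 1
        reward = -1            # assumed cost per move
        if done:
            reward += 100      # assumed bonus for reaching the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=20):
    """Run the action -> observation/reward loop and return the total reward."""
    observation, total_reward = env.position, 0
    for _ in range(max_steps):
        action = policy(observation)                  # agent chooses an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


total = run_episode(CorridorEnv(length=4), policy=lambda obs: "right")
print(total)
```

A learning agent would replace the fixed policy with one that is updated from the observed rewards.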

“Reinforcement learning is like trial-and-error learning.”
David Silver

Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto (1998)
Cliff Walking Example
Reward:
-1 per move
+100 for reaching the blue field
-100 for stepping into a lava field
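As a rough sketch, the reward scheme can be written as a small function. The tile names ('blue', 'lava', 'floor') are hypothetical labels for this example. Adding the move cost to the tile reward, so that reaching the blue field yields 100 - 1 = 99, is an assumption based on the 99.0 used in the Q-Learning slides.

```python
MOVE_REWARD = -1     # every move costs one point
BLUE_REWARD = 100    # reaching the blue field
LAVA_REWARD = -100   # stepping into lava


def reward_for(tile):
    """Return the reward for moving onto a tile ('blue', 'lava' or 'floor')."""
    reward = MOVE_REWARD
    if tile == "blue":
        reward += BLUE_REWARD
    elif tile == "lava":
        reward += LAVA_REWARD
    return reward


print(reward_for("floor"), reward_for("blue"), reward_for("lava"))
```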

Reinforcement Learning Example

Q-Learning
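ALPHA = 1.0  # step-size parameter (learning rate)
GAMMA = 0.8  # discount-rate parameter
# Q-value stored for the state/action pair that was just executed
old_q = q_table[prev_state][prev_action]
# best Q-value available from the state the agent is in now
max_q = max(q_table[current_state])
# move old_q toward the target: reward plus discounted best future value
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
q_table[prev_state][prev_action] = new_q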

Q-Learning (first update: the move onto the blue field)
Reward: 100 - 1 = 99.0
old_q = 0.0
max_q = 0.0
new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
new_q = 99.0

Q-Learning (second update: one move away from the blue field)
Reward: -1
ALPHA = 1.0
GAMMA = 0.8
old_q = q_table[prev_state][prev_action]
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (-1.0 + GAMMA * max_q - old_q)
new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
new_q = -1.0 + 1.0 * (-1.0 + 79.2 - (-1.0))
new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
new_q = 78.2
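The two updates walked through on the Q-Learning slides can be replayed with a small helper. This is a minimal sketch: the function name `q_update`, the three-state table and its state names are hypothetical, and only the update rule and the numbers come from the slides.

```python
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter


def q_update(q_table, prev_state, prev_action, reward, current_state):
    """Apply one Q-learning update in place and return the new Q-value."""
    old_q = q_table[prev_state][prev_action]
    max_q = max(q_table[current_state])
    new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
    q_table[prev_state][prev_action] = new_q
    return new_q


# Hypothetical table; each row holds Q-values for [left, down, right, up].
LEFT, UP = 0, 3
q_table = {
    "blue_field": [0.0, 0.0, 0.0, 0.0],
    "next_to_blue": [0.0, 0.0, 0.0, 0.0],
    "two_away": [0.0, -1.0, -1.0, -1.0],
}

# First update: the move onto the blue field earns 100 - 1 = 99.
first = q_update(q_table, "next_to_blue", LEFT, 99.0, "blue_field")
# Second update: an ordinary move (-1) into the cell that now holds 99.
second = q_update(q_table, "two_away", UP, -1.0, "next_to_blue")
print(first, round(second, 1))
```

The first call yields 99.0 and the second about 78.2, which matches the walkthrough.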

[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
Each cell lists the Q-values for the four actions in the order [ ← ↓ → ↑ ]

[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 -2] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 48] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 -2] [-3 -3 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]

[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
ALPHA = 1.0 GAMMA = 0.8
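With ALPHA = 1.0 each update overwrites the old value, so along a path to the blue field every move ends up worth -1 + GAMMA times the best value of the next cell. The sketch below replays that recurrence. The helper name is hypothetical, and the idea that the slides show values truncated to whole numbers is an inference from the table, not something the slides state.

```python
GAMMA = 0.8          # discount-rate parameter from the slides
STEP_REWARD = -1.0   # reward per move
GOAL_VALUE = 99.0    # Q-value of the move onto the blue field (100 - 1)


def values_along_path(steps):
    """Return Q-values for the goal-reaching move and the moves leading up to it."""
    values = [GOAL_VALUE]
    for _ in range(steps - 1):
        # With ALPHA = 1.0 the update reduces to reward + GAMMA * max_q.
        values.append(STEP_REWARD + GAMMA * values[-1])
    return values


truncated = [int(v) for v in values_along_path(10)]
print(truncated)  # [99, 78, 61, 48, 37, 29, 22, 16, 12, 8]
```

These are the positive entries that appear one after another in the Q tables as the value of the blue field spreads backward toward the start.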

[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1]
[ L L -2 -1] [-2 -2 L 83]
[ L -3 -3 74] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4]
[ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (40 moves)

[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1]
[ L L -2 -1] [-2 45 L 94]
[ L -3 -3 93] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4]
[ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (60 moves)

Deep Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning

Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602)
https://youtu.be/TmPfTpjtdgg

Model (Keras)
# based on arXiv:1312.5602
# Playing Atari with Deep Reinforcement Learning (page 6)
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten

# assumption: 84x84x4 frame stack as in the paper; the slide leaves input_shape undefined
input_shape = (84, 84, 4)

model = Sequential()
model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(12, activation='sigmoid'))  # 12 classes / actions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
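To see where the layer sizes lead, the output shape of each convolution can be worked out by hand. The sketch below assumes an 84x84 input resolution, which is taken from the paper rather than from the slide, and unpadded ('valid') convolutions, which is the Keras Conv2D default. The helper name is hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one dimension."""
    return (size - kernel) // stride + 1


side = 84                                        # assumed input height and width
h1 = conv_output_size(side, kernel=8, stride=4)  # after Conv2D(16, (8, 8), strides 4)
h2 = conv_output_size(h1, kernel=4, stride=2)    # after Conv2D(32, (4, 4), strides 2)
flattened = h2 * h2 * 32                         # values entering Dense(256)

print(h1, h2, flattened)  # 20 9 2592
```

Under these assumptions, Flatten() hands 2592 values to the 256-unit dense layer.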

Deep Reinforcement Learning Example

Thank you.
Contact information:
Lars Gregori
@choas
Hi Lars …

Links
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence http://www.ijcai.org/Proceedings/2016
Project Malmo https://www.microsoft.com/en-us/research/project/project-malmo/
Project Malmo (GitHub) https://github.com/Microsoft/malmo
Reinforcement Learning: An Introduction - ISBN-13: 978-0262193986
2nd edition online
YouTube RL Course by David Silver
