AI and Minecraft
Lars Gregori
Codemotion Amsterdam | May 8-9, 2018
Minecraft
Markus "Notch" Persson
Mojang AB
Best-selling PC game of all time
Exploration
Resource gathering
Crafting
Combat
Sandbox construction game
Creative + building aspects
Three-dimensional environment
Project Malmo
Open source (GitHub)
Microsoft Research Lab
Based on: Minecraft / Minecraft Forge
Agents written in: Python, Lua, C++, C#, Java, Torch, ALE*
Mission XML
WorldState
Send Command
*ALE = Arcade Learning Environment
“The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence
Project Malmo
Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Reinforcement Learning
Agent → Environment: Action
Environment → Agent: Observation, Reward
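The agent/environment cycle above can be sketched in a few lines of Python. The `CountdownEnv` class and the random policy are hypothetical stand-ins for illustration only; they are not part of Project Malmo or the talk's code.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment: the episode ends after a fixed number of steps."""

    def __init__(self, steps=5):
        self.steps_left = steps

    def step(self, action):
        """Apply an action and return (observation, reward, done)."""
        self.steps_left -= 1
        observation = self.steps_left  # what the agent sees next
        reward = -1.0                  # every move costs one point
        done = self.steps_left == 0
        return observation, reward, done


def run_episode(env, actions, rng):
    """Run one episode with a random policy and return the total reward."""
    total_reward = 0.0
    done = False
    while not done:
        action = rng.choice(actions)                  # agent picks an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
    return total_reward


total = run_episode(CountdownEnv(steps=5), ["left", "down", "right", "up"], random.Random(0))
print(total)  # -5.0
```

A learning agent would replace the random choice with a policy that uses the observation and reward it gets back.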
“Reinforcement learning is like trial-and-error learning.”
David Silver
Reinforcement Learning
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto (1998)
Reinforcement Learning
Cliff Walking Example
Reward:
-1 per move
100 blue field
-100 lava field
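A minimal sketch of this reward scheme, assuming the field reward is added to the cost of the move. That matches the 99.0 the talk uses for a move onto the blue field; the resulting -101.0 for lava follows from the same assumption and is not shown on the slides. The field names are hypothetical labels.

```python
MOVE_REWARD = -1.0
FIELD_REWARD = {"blue": 100.0, "lava": -100.0}


def step_reward(field):
    """Reward for one move that ends on the given field type."""
    return MOVE_REWARD + FIELD_REWARD.get(field, 0.0)


print(step_reward("normal"))  # -1.0
print(step_reward("blue"))    # 99.0
print(step_reward("lava"))    # -101.0
```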
Reinforcement Learning Example
Q-Learning
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter
old_q = q_table[prev_state][prev_action]
max_q = max(q_table[current_state][:])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning: first update, the move onto the blue field
reward = 100 - 1 = 99.0
old_q = 0.0
max_q = 0.0
new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
new_q = 99.0
Q-Learning: second update, a move into the state whose best Q-value is 99.0
ALPHA = 1.0
GAMMA = 0.8
reward = -1.0
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
new_q = -1.0 + 1.0 * (-1.0 + 79.2 - -1.0)
new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
new_q = 78.2
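The two hand calculations can be checked with a small function. This is a sketch of the update rule as written on the slides; the function name `q_update` and its keyword arguments are chosen here for illustration.

```python
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter


def q_update(old_q, reward, max_q, alpha=ALPHA, gamma=GAMMA):
    """Return the new Q-value for the previous state/action pair."""
    return old_q + alpha * (reward + gamma * max_q - old_q)


# First update: the move onto the blue field (reward 100 - 1).
first = q_update(old_q=0.0, reward=99.0, max_q=0.0)
# Second update: a move into the state whose best Q-value is now 99.0.
second = q_update(old_q=-1.0, reward=-1.0, max_q=first)

print(round(first, 1), round(second, 1))  # 99.0 78.2
```

With ALPHA = 1.0 the old value cancels out, so the new value is simply reward + GAMMA * max_q.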
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
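Reading a greedy action out of one table cell might look like the sketch below. It assumes lava entries are simply never chosen and uses `None` as a hypothetical marker for them; the talk does not show its own selection code.

```python
ACTIONS = ["left", "down", "right", "up"]  # same order as [ ← ↓ → ↑ ]
LAVA = None  # hypothetical marker for an action that leads into lava


def best_action(q_values):
    """Return the action with the highest Q-value, skipping lava entries."""
    candidates = [(q, a) for q, a in zip(q_values, ACTIONS) if q is not LAVA]
    best_q, action = max(candidates, key=lambda pair: pair[0])
    return action


print(best_action([LAVA, -1, -1, 78]))  # up
print(best_action([99, 0, 0, 0]))       # left
```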
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 -2] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 48] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 -2] [-3 -3 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
ALPHA = 1.0 GAMMA = 0.8
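The positive entries in the table above (99, 78, 61, 48, ...) follow directly from the update rule: each step further from the blue field costs one move and discounts the next state's best value by GAMMA. A short sketch, assuming the slides truncate the values to integers for display:

```python
GAMMA = 0.8         # discount-rate parameter
MOVE_REWARD = -1.0  # reward per move

values = [99.0]  # Q-value of the move onto the blue field
for _ in range(9):
    # With ALPHA = 1.0 the update reduces to reward + GAMMA * max_q.
    values.append(MOVE_REWARD + GAMMA * values[-1])

print([int(v) for v in values])  # [99, 78, 61, 48, 37, 29, 22, 16, 12, 8]
```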
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1]
[ L L -2 -1] [-2 -2 L 83]
[ L -3 -3 74] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4]
[ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (40 moves)
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1]
[ L L -2 -1] [-2 45 L 94]
[ L -3 -3 93] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4]
[ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (60 moves)
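With ALPHA = 0.5 each update moves the old value only halfway toward its target, and with GAMMA = 1.0 the target for a move into a state valued 99 is 98. The sketch below repeats that single update a few times; the loop and its starting value of 0.0 are assumptions for illustration, not a replay of the actual run.

```python
ALPHA = 0.5  # step-size parameter
GAMMA = 1.0  # discount-rate parameter


def q_update(old_q, reward, max_q):
    """One Q-learning update with the ALPHA and GAMMA above."""
    return old_q + ALPHA * (reward + GAMMA * max_q - old_q)


q = 0.0
history = []
for _ in range(6):
    # Repeatedly take a move (reward -1) into a state whose best value is 99.
    q = q_update(q, reward=-1.0, max_q=99.0)
    history.append(q)

print(history)  # [49.0, 73.5, 85.75, 91.875, 94.9375, 96.46875]
```

The value approaches 98 but needs several visits to get there, which is why the tables for 40 and 60 moves still differ.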
Deep Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602)
https://youtu.be/TmPfTpjtdgg
12 Classes
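# Based on arXiv:1312.5602,
# "Playing Atari with Deep Reinforcement Learning" (page 6)
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten

# input_shape is not defined on the slide; (84, 84, 4) is an assumption
# taken from the stacked 84x84 frames described in the paper.
input_shape = (84, 84, 4)

model = Sequential()
model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(12, activation='sigmoid'))  # 12 classes / actions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])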
Model (Keras)
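The layer sizes in the model can be checked by hand. The sketch below assumes an 84x84 input (the frame size used in the cited paper; the slide does not state its `input_shape`) and Keras's default `'valid'` padding, and computes the feature-map size after each convolution without needing Keras itself.

```python
def conv_output_size(size, kernel, stride):
    """Output width/height of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1


size = 84  # assumed input width/height, not given on the slide
size = conv_output_size(size, kernel=8, stride=4)  # first Conv2D, 16 filters
first = (size, size, 16)
size = conv_output_size(size, kernel=4, stride=2)  # second Conv2D, 32 filters
second = (size, size, 32)
flattened = second[0] * second[1] * second[2]      # input width of Dense(256)

print(first, second, flattened)  # (20, 20, 16) (9, 9, 32) 2592
```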
Deep Reinforcement Learning Example
Take-away
Thank you.
Contact information:
Lars Gregori
@choas
Hi Lars …
Links
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence: http://www.ijcai.org/Proceedings/2016
Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
Project Malmo (GitHub): https://github.com/Microsoft/malmo
Reinforcement Learning: An Introduction, ISBN-13: 978-0262193986 (2nd edition online)
YouTube: RL Course by David Silver

 

Último (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

AI and Minecraft - Lars Gregori - Codemotion Amsterdam 2018

  • 1. AI and Minecraft Lars Gregori AMSTERDAM | MAY 8-9, 2018
  • 4. Minecraft: Markus "Notch" Persson, Mojang AB. Best-selling PC game of all time. Exploration, resource gathering, crafting, combat. Sandbox construction game, creative + building aspects, three-dimensional environment.
  • 6. Project Malmo: Open Source (Github). Microsoft Research Lab. Based on Minecraft / Minecraft Forge. Agents written in Python, Lua, C++, C#, Java, Torch, ALE*. Mission XML, WorldState, Send Command. *Arcade Learning Environment
  • 7. Project Malmo: “The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas” (The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence)
  • 10. Reinforcement Learning: Observation, Reward, Action, Environment, Agent
  • 11. Reinforcement Learning: “Reinforcement learning is like trial-and-error learning.” (David Silver)
  • 12. Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto (1998). Reinforcement Learning, Cliff Walking Example. Reward: -1 per move, 100 blue field, -100 lava field
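The agent/environment loop and the reward scheme from the slides above (-1 per move, 100 for the blue field, -100 for lava) can be sketched in a few lines of Python. This is only an illustration: the 3x4 `GRID` layout, the function names and the choice to add the field reward on top of the move cost are assumptions, not the grid or code used in the talk.

```python
import random

# Hypothetical 3x4 layout (NOT the grid from the slides):
# S = start, G = goal ("blue field"), L = lava, . = free field
GRID = [
    "G...",
    ".LL.",
    "...S",
]
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}


def find(symbol):
    """Return (row, col) of the first cell holding `symbol`."""
    for r, row in enumerate(GRID):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(symbol)


def step(state, action):
    """Environment: apply one action, return (next_state, reward, done)."""
    dr, dc = MOVES[action]
    r = min(max(state[0] + dr, 0), len(GRID) - 1)     # walls: stay in bounds
    c = min(max(state[1] + dc, 0), len(GRID[0]) - 1)
    cell = GRID[r][c]
    reward = -1.0                                     # every move costs 1
    if cell == "G":
        reward += 100.0                               # assumption: 100 - 1 = 99
    elif cell == "L":
        reward -= 100.0
    return (r, c), reward, cell in "GL"


def run_episode(policy, max_moves=50):
    """Agent loop: observe the state, act, collect the reward."""
    state, total = find("S"), 0.0
    for _ in range(max_moves):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


# A purely random agent, i.e. trial and error without any learning yet.
print(run_episode(lambda state: random.choice(list(MOVES))))
```

A random policy mostly wanders or falls into lava; the Q-Learning slides that follow replace it with a learned table.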
  • 16. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 18. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 19. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 20. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 21. Q-Learning (figure labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
  • 22. Q-Learning (figure labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 23. Q-Learning (figure labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 24. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 99.0
  • 26. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 27. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 28. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 29. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + GAMMA * max_q - old_q)
  • 30. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
  • 31. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
  • 32. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 - -1.0)
  • 33. Q-Learning (figure labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
  • 34. Q-Learning (figure labels: 100, -1, 99.0, 78.2)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = 78.2
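The step-by-step substitution on the preceding slides can be reproduced with a small helper. The function name `q_update` and its keyword arguments are made up for this sketch; the formula and the numbers (99.0 for the move onto the goal field, 78.2 for the move onto its neighbour) are the ones shown on the slides.

```python
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter


def q_update(old_q, reward, max_q, alpha=ALPHA, gamma=GAMMA):
    """One Q-learning update for a single (state, action) table entry."""
    return old_q + alpha * (reward + gamma * max_q - old_q)


# Move onto the goal field: reward = 100 - 1, nothing has been learned yet.
first = q_update(old_q=0.0, reward=99.0, max_q=0.0)

# Later move onto the neighbouring field, whose best entry is now 99.0.
second = q_update(old_q=-1.0, reward=-1.0, max_q=first)

print(first, round(second, 1))  # 99.0 78.2
```

With ALPHA = 1.0 the old value is fully replaced by the new estimate, so the result depends only on the reward and the discounted best value of the field just entered.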
  • 35. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 36. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 37. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 -2] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 38. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 48] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 39. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 -2] [-3 -3 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 40. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 41. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 42. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 43. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 44. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 45. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] ALPHA = 1.0 GAMMA = 0.8
  • 46. [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1] [ L L -2 -1] [-2 -2 L 83] [ L -3 -3 74] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4] [ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (40 moves)
  • 47. [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1] [ L L -2 -1] [-2 45 L 94] [ L -3 -3 93] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4] [ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (60 moves)
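To see how a table like the ones above fills in over time, the update rule can be wrapped in a small training loop. This is a minimal sketch under several assumptions that are not in the slides: the 3x4 `GRID` layout, epsilon-greedy exploration with `epsilon=0.1`, the episode and move limits, and the `train` and `step` names are all hypothetical. Each table row follows the slides' action order [ ← ↓ → ↑ ].

```python
import random

ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # [left, down, right, up]
GRID = ["G...", ".LL.", "...S"]               # hypothetical layout, L = lava
START = (2, 3)


def step(state, action):
    """Return (next_state, reward, done) for one move on the grid."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), len(GRID) - 1)
    c = min(max(state[1] + dc, 0), len(GRID[0]) - 1)
    cell = GRID[r][c]
    bonus = 100.0 if cell == "G" else -100.0 if cell == "L" else 0.0
    return (r, c), -1.0 + bonus, cell in "GL"


def train(alpha, gamma, episodes=500, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns {state: [q_left, q_down, q_right, q_up]}."""
    rng = random.Random(seed)
    q_table = {(r, c): [0.0] * 4 for r in range(3) for c in range(4)}
    for _ in range(episodes):
        state = START
        for _ in range(100):                      # cap the episode length
            if rng.random() < epsilon:            # explore
                action = rng.randrange(4)
            else:                                 # exploit, random tie-break
                best = max(q_table[state])
                action = rng.choice(
                    [a for a in range(4) if q_table[state][a] == best])
            nxt, reward, done = step(state, action)
            old_q = q_table[state][action]
            max_q = 0.0 if done else max(q_table[nxt])
            q_table[state][action] = old_q + alpha * (
                reward + gamma * max_q - old_q)
            state = nxt
            if done:
                break
    return q_table


for state, row in sorted(train(alpha=1.0, gamma=0.8).items()):
    print(state, [round(q) for q in row])
```

Calling `train(alpha=0.5, gamma=1.0)` gives a rough counterpart to the last two tables: without discounting, values along a good path drop by only 1 per move instead of shrinking by the factor 0.8.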
  • 49. ?
  • 52. Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602) https://youtu.be/TmPfTpjtdgg
  • 56. Model (Keras)
 ### based on arXiv:1312.5602
 ### Playing Atari with Deep Reinforcement Learning (page 6)
 from keras.models import Sequential
 from keras.layers import Conv2D, Activation, Flatten, Dense
 
 model = Sequential()
 model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
 model.add(Activation('relu'))
 model.add(Conv2D(32, (4, 4), strides=(2, 2)))
 model.add(Activation('relu'))
 model.add(Flatten())
 model.add(Dense(256))
 model.add(Activation('relu'))
 model.add(Dense(12, activation='sigmoid')) # 12 classes / actions
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
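The layer sizes in the model above can be checked by hand. The sketch below does the arithmetic without Keras, assuming the 84x84 input with 4 stacked frames described in the cited paper (the slide itself leaves `input_shape` open) and unpadded ("valid") convolutions, which is the Keras default for `Conv2D`.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


side = 84      # assumption: frame size from arXiv:1312.5602
channels = 4   # assumption: four stacked frames

side1 = conv_out(side, kernel=8, stride=4)    # after Conv2D(16, (8, 8), strides 4)
side2 = conv_out(side1, kernel=4, stride=2)   # after Conv2D(32, (4, 4), strides 2)
flat = side2 * side2 * 32                     # after Flatten()

# Trainable parameters per layer: weights + biases.
params = {
    "conv1": 8 * 8 * channels * 16 + 16,
    "conv2": 4 * 4 * 16 * 32 + 32,
    "dense1": flat * 256 + 256,
    "dense2": 256 * 12 + 12,                  # 12 actions
}

print(side1, side2, flat)          # 20 9 2592
print(sum(params.values()))        # 679228
```

Almost all parameters sit in the first dense layer, so a different input size mainly changes that layer.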
  • 59. Thank you. Contact information: Lars Gregori @choas Hi Lars …
  • 60. Links
 The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence: http://www.ijcai.org/Proceedings/2016
 Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
 Project Malmo (Github): https://github.com/Microsoft/malmo
 Reinforcement Learning: An Introduction - ISBN-13: 978-0262193986 (2nd Version online)
 YouTube: RL Course by David Silver