Aprendizaje por Refuerzo: Luchas de Robots

Aprendizaje por Refuerzo Framework Aplicado a robots en RealTime Battle José Luis Marina Máster Investigación Informática Universidad Complutense de Madrid Obra Creative Commons Febrero de 2009 Aprendizaje Automático

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Problema genérico a resolver: “ Cómo un agente autónomo que siente y reacciona con su entorno, puede aprender a elegir acciones óptimas para la consecución de sus objetivos .” Machine Learning - Tom Mitchell Aprendizaje por refuerzo

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Aprendizaje por refuerzo

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Elegir política de control o acciones que maximizen: Aprendizaje por refuerzo s 0 s 1 s 2 a 2 a 1 a 0 r 0 r 1 r 2 r 0 + Ɣ r 1 + Ɣ²r 2 + ... donde 0 ≤ Ɣ < 1 ... Agente Entorno

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Aprendizaje por refuerzo

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],Aprendizaje por refuerzo

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Aprendizaje por refuerzo ,[object Object]

Introducción Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],Aprendizaje por refuerzo ,[object Object],[object Object],[object Object],[object Object],[object Object]

Real Time Battle Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Real Time Battle Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Trabajo Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Trabajo Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Trabajo Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Sensores, estados y Acciones X SEE_COOKIE – SEE_WALL - SEE_ROBOT No Lejos Cerca NUM_ROBOTS ENERGY UNDER_FIRE Poco Medio Mucho -> S 1 S 2 . S n n=3 8 (6561) 3 8 PATROL FIRE_AROUND BE_QUIET GO_FOR_COOKIES FIRE_CRAZY RAMBOW STOP ACELERAR GIRAR ROBOT GIRAR RADAR Y CAÑÓN FRENAR DISPARAR .... Acciones Básicas Acciones Elaboradas 7 Q(s,a) 3 8 x 7 45927 elementos NUMROBOTS = 0, SEE_ROBOT = 1, SEE_WALL = 2, SEE_COOKIE = 3, SEE_MINE = 4, SEE_BULLET = 5, MY_ENERGY = 6, UNDER_FIRE = 7, LEVEL_1 = 0 LEVEL_2 = 1 LEVEL_3 = 2

Trabajo Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Clase Qtable y Rebote MAIN: RTB_Rebote Rebote("Nombre","Color"); Rebote. run() ; Rebote.run() RTB_QTable Qt; Qt.open (num_actions , num_states ,init_val “ qtable_file",explore_rate ,learning); While (!end_game) get_sensor_values(); calculate_state(); reward = energy + 400 / (robots_left * robots_left); action = Qt.next_action(state, reward); execute_action(action);

Resultados Introducción Entorno Estrategia Conclusiones Futuro José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Introducción José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Al Azar Gamma = 1.0 Explore = 10% Learning Gamma = 0.8 Explore = 10% Gamma = 1.0 No Learning Introducción Entorno Estrategia Conclusiones Futuro

Datos de Resultados José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro ,[object Object],[object Object],[object Object],[object Object],[object Object]

Datos de Resultados José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro

Conclusiones sobre los resultados José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Acciones Futuras José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Principales Referencias José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro Machine Learning: C13 – Reiforcement Learning Tom Mitchell - 1997 RETALIATE: Learning Winning Policies in First-Person Shooter Games Megan Smith, Stephen Lee-Urban, Héctor Muñoz-Avila – 2007 Learning to be a Bot: Reiforcement Learning in Shooter Games. Michelle McPartland and Marcus Gallagher – 2008 Sutton, R. S. & Barto, A. G. Reinforcement Learning:An Introduction, MIT Press, Cambridge, MA - 1998 . Recognizing the Enemy: Combining RL with Strategy Selection using CBR. B. auslander, S. Lee-Urban, Chad Hogg and H. Muñoz - 2008 Real Time Battle: Web page and documentation. http://realtimebattle.sourceforge.net/ Imágenes de: http://www.sxc.hu

Fin José Luis Marina – Aprendizaje Automático - UCM Febrero de 2009 Reinforcement Learning Bots Introducción Entorno Estrategia Conclusiones Futuro ¿Preguntas? [email_address]

Aprendizaje por Refuerzo: Luchas de Robots

Recomendados

Recomendados

Más contenido relacionado

Similar a Aprendizaje por Refuerzo: Luchas de Robots

Similar a Aprendizaje por Refuerzo: Luchas de Robots (20)

Más de Joselu Marina

Más de Joselu Marina (15)

Último

Último (20)

Aprendizaje por Refuerzo: Luchas de Robots