2. World models
Background: Reinforcement Learning
World Model Architecture
Vision (V)
Memory (M)
Controller (C)
Experiment 1 – OpenAI Gym CarRacing-v0
Ablation Study
Experiment 2 – VizDoom
Training A Policy Inside The “World Model Dream”
9. Interaction Between Planning, Acting, And Learning
[Figure: cycle linking Value/Policy, Experience, and Model through Acting, Direct RL, Model Learning, and Planning, with a “You Are Here” marker]
Left Brain
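This cycle appears to be the Dyna architecture from Sutton and Barto's *Reinforcement Learning: An Introduction*: real experience feeds both direct RL and model learning, and the learned model then generates simulated experience for planning. Below is a minimal tabular Dyna-Q sketch. The corridor environment (`step`, `N_STATES`, `GOAL`) and all hyperparameters are hypothetical choices made only to keep the example small.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# Action 0 moves left, action 1 moves right; reaching GOAL gives reward 1.
N_STATES, GOAL, ACTIONS = 6, 5, (0, 1)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

def dyna_q(episodes=30, n_planning=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> (s2, r); the toy world is deterministic

    def greedy(s):
        # break ties randomly so early episodes still explore
        best = max(Q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])

    def update(s, a, r, s2):
        # one-step Q-learning update; GOAL is terminal
        target = r if s2 == GOAL else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Acting: epsilon-greedy with respect to the current Q
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r = step(s, a)
            update(s, a, r, s2)          # Direct RL from real experience
            model[(s, a)] = (s2, r)      # Model learning
            for _ in range(n_planning):  # Planning from simulated experience
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q

Q = dyna_q()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right in every non-goal state. The planning loop is what lets each real step count many times over, which is the same motivation World Models has for training inside M.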
10. CONTRIBUTIONS
• Models the environment using an unsupervised, low-dimensional representation
• A recurrent mixture density model stochastically predicts how the environment responds to the agent's actions, which helps the controller anticipate what comes next
• The policy is trained with RL inside the model, and it transfers to the “real” environment
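The three components fit together in one loop: V compresses each frame into z, C maps [z, h] to an action, and M updates its hidden state h from z, the action, and the previous h. In the paper, V is a VAE, M is an MDN-RNN, and C is a single linear layer a = W_c[z, h] + b_c. The sketch below keeps only that data flow. V and M are hypothetical random linear stand-ins (not trained networks), the sizes are arbitrary, and the observations are pre-generated random frames, so this toy environment does not react to the actions.

```python
import math
import random

rng = random.Random(0)
OBS_DIM, Z_DIM, H_DIM, A_DIM = 16, 4, 8, 3  # arbitrary toy sizes

def rand_matrix(rows, cols):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical stand-ins: random linear maps instead of a trained VAE / MDN-RNN.
W_v = rand_matrix(Z_DIM, OBS_DIM)                # V: observation -> z
W_m = rand_matrix(H_DIM, Z_DIM + A_DIM + H_DIM)  # M: (z, a, h) -> next h
W_c = rand_matrix(A_DIM, Z_DIM + H_DIM)          # C: (z, h) -> action
b_c = [0.0] * A_DIM

def vision(obs):
    return matvec(W_v, obs)

def memory(z, a, h):
    # a plain tanh RNN cell; the paper uses an LSTM with a mixture-density head
    return [math.tanh(v) for v in matvec(W_m, z + a + h)]

def controller(z, h):
    # linear controller: a = W_c [z, h] + b_c
    return [v + b for v, b in zip(matvec(W_c, z + h), b_c)]

def rollout(observations):
    h = [0.0] * H_DIM
    actions = []
    for obs in observations:
        z = vision(obs)        # V: compress the frame
        a = controller(z, h)   # C: act on [z, h]
        actions.append(a)
        h = memory(z, a, h)    # M: update the memory with (z, a)
    return actions

frames = [[rng.random() for _ in range(OBS_DIM)] for _ in range(5)]
acts = rollout(frames)
print(len(acts), len(acts[0]))
```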
17. Z Latent Space
• If real images are only a small subset of the space of all possible images, then in theory we should be able to encode them with much less information than it takes to describe an arbitrary image in that larger space
• This smaller “space” of variables is called the latent space, written “z”
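A toy illustration of the subset argument, under a deliberately strong assumption: every “real” image in this made-up dataset is a brightness-scaled copy of one hypothetical 4×4 template, so real images occupy a one-dimensional line inside the 16-dimensional space of all images. A single number z per image is then enough to reconstruct it exactly. A VAE learns a nonlinear, approximate version of this kind of mapping from data. Here the encoder is just a projection onto a known template.

```python
import random

# Hypothetical 4x4 "template" image, flattened to 16 pixel values.
TEMPLATE = [0, 1, 1, 0,
            1, 0, 0, 1,
            1, 0, 0, 1,
            0, 1, 1, 0]
NORM2 = sum(t * t for t in TEMPLATE)

def encode(image):
    """Project a 16-pixel image onto the template: one latent number z."""
    return sum(p * t for p, t in zip(image, TEMPLATE)) / NORM2

def decode(z):
    """Rebuild all 16 pixels from the single latent value."""
    return [z * t for t in TEMPLATE]

# "Real" images in this toy world: the template at random brightness levels.
rng = random.Random(1)
brightness = [rng.uniform(0.2, 1.0) for _ in range(5)]
images = [[b * t for t in TEMPLATE] for b in brightness]

for img in images:
    z = encode(img)
    err = max(abs(p - q) for p, q in zip(img, decode(z)))
    print(f"16 pixels -> z = {z:.3f}, max reconstruction error = {err:.1e}")
```

An arbitrary 16-pixel image would not survive this round trip. The compression works only because the real images here live in a tiny subset of image space.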
19. Some Examples Of Latent Spaces
[Figure: z examples]
http://vecg.cs.ucl.ac.uk/projects/projects_fonts/projects_fonts.html
https://worldmodels.github.io/
20. So What Is The Use Of Z?
• Z is a smaller space, so it reduces the “state space” the model has to deal with
• Z-values should contain more meaningful information than raw pixels
• Z-values learned from enough data should generalize to novel, unseen, but similar environments
36. Experiment 1 – CAR RACE ABLATION
SO DOES THIS WORK?
TO TEST IT, THE OPENAI GYM CARRACING-V0 ENVIRONMENT WAS USED
THE MODEL TOPPED THE LEADERBOARD
THE MODEL WAS RUN BOTH WITH AND WITHOUT THE M COMPONENT
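The ablation changes only what the controller sees. Assuming the CarRacing sizes reported in the World Models paper (64×64 RGB frames, a 32-dimensional z, a 256-unit LSTM hidden state h, 3 continuous actions) and a linear controller a = W_c[z, h] + b_c, the controller sizes work out as below. The raw-pixel line is a hypothetical comparison, not an experiment from this deck.

```python
# Sizes assumed from the World Models CarRacing setup.
PIXELS = 64 * 64 * 3   # raw RGB frame
Z_DIM = 32             # V's latent vector
H_DIM = 256            # M's LSTM hidden state
ACTIONS = 3            # steering, gas, brake

def linear_params(n_in, n_out):
    # a weight per input-output pair plus one bias per output
    return n_in * n_out + n_out

print("C on z only (no M):", linear_params(Z_DIM, ACTIONS))            # 99
print("C on [z, h] (with M):", linear_params(Z_DIM + H_DIM, ACTIONS))  # 867
print("C on raw pixels:", linear_params(PIXELS, ACTIONS))              # 36867
```

Either way C stays tiny. What M adds is the 256 numbers in h, which summarize the history and carry M's expectation of what comes next.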
39. Experiment 2 – The Advantage Of Dreams
• NOTE THAT V AND M ARE TRAINED ENTIRELY BY UNSUPERVISED LEARNING ON DATA COLLECTED WITH A RANDOM POLICY
• AFTER TRAINING, M EFFECTIVELY BECOMES A “SIMULATION” OF THE REAL ENVIRONMENT
• IF WE TRAIN A POLICY USING M, WILL IT WORK ON THE REAL ENVIRONMENT?
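A toy version of this question, with everything hypothetical: a 1-D “real” environment s' = 0.9·s + 0.5·a, a world model fitted by least squares to transitions gathered with a random policy (no rewards or labels, mirroring the unsupervised training of V and M), and a linear policy a = −k·s tuned purely inside the fitted model and then evaluated in the real environment. The actual system uses a VAE and an MDN-RNN, and the paper tunes C with CMA-ES rather than the grid search used here.

```python
import random

def real_step(s, a):
    # the "real" environment, unknown to the agent
    return 0.9 * s + 0.5 * a

# 1) Collect transitions with a random policy.
rng = random.Random(0)
data, s = [], 1.0
for _ in range(200):
    a = rng.uniform(-1, 1)
    s2 = real_step(s, a)
    data.append((s, a, s2))
    s = s2

# 2) Fit the model s' = alpha*s + beta*a by least squares (2x2 normal equations).
sss = sum(x * x for x, _, _ in data)
saa = sum(u * u for _, u, _ in data)
ssa = sum(x * u for x, u, _ in data)
sy = sum(x * y for x, _, y in data)
ay = sum(u * y for _, u, y in data)
det = sss * saa - ssa * ssa
alpha = (sy * saa - ay * ssa) / det
beta = (ay * sss - sy * ssa) / det

def model_step(s, a):
    return alpha * s + beta * a

def cost(step_fn, k, s0=1.0, horizon=20):
    # sum of squared distance from 0 under the policy a = -k * s
    s, total = s0, 0.0
    for _ in range(horizon):
        s = step_fn(s, -k * s)
        total += s * s
    return total

# 3) Train the policy entirely inside the model ("in the dream").
k_best = min((i / 100 for i in range(301)), key=lambda k: cost(model_step, k))

# 4) Transfer it to the real environment.
print(f"fitted alpha={alpha:.3f}, beta={beta:.3f}, k={k_best:.2f}")
print("real cost, dream-trained policy:", cost(real_step, k_best))
print("real cost, do-nothing policy:", cost(real_step, 0.0))
```

Because this toy model is exact, transfer is easy. With an imperfect model, the policy can instead exploit the model's errors, which is the concern the following slides address.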
41. SO WHY USE M AND NOT THE REAL ENVIRONMENT?
• M RUNS FASTER BECAUSE:
  • THE DIMENSIONALITY IS REDUCED TO Z
  • IT IS VECTORIZED AND THEREFORE SUITED TO HARDWARE ACCELERATION
• M IS NOT DETERMINISTIC, AND HAS A “TEMPERATURE” PARAMETER
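M's output for each latent dimension is a mixture of Gaussians, and sampling from it is what makes M non-deterministic. One common way to apply a temperature τ, and to our understanding the convention in the authors' released code, is to divide the mixture logits by τ before the softmax and scale each Gaussian's standard deviation by √τ. The sketch below assumes that convention. The mixture parameters are made-up numbers for a single latent dimension.

```python
import math
import random

def sample_mdn(logits, means, stds, temperature, rng):
    """Draw one value from a 1-D Gaussian mixture, scaled by temperature (> 0)."""
    # divide logits by tau: tau < 1 sharpens, tau > 1 flattens the component choice
    scaled = [lg / temperature for lg in logits]
    top = max(scaled)
    weights = [math.exp(lg - top) for lg in scaled]  # unnormalized softmax
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # scale the chosen Gaussian's standard deviation by sqrt(tau)
    return rng.gauss(means[k], stds[k] * math.sqrt(temperature))

# Made-up mixture parameters for one latent dimension (3 components).
logits = [2.0, 0.0, -1.0]
means = [-1.0, 0.5, 2.0]
stds = [0.1, 0.3, 0.5]

rng = random.Random(0)
spreads = {}
for tau in (0.1, 1.0, 1.5):
    draws = [sample_mdn(logits, means, stds, tau, rng) for _ in range(2000)]
    mu = sum(draws) / len(draws)
    spreads[tau] = math.sqrt(sum((d - mu) ** 2 for d in draws) / len(draws))
    print(f"tau={tau}: sample std = {spreads[tau]:.2f}")
```

Low τ makes M nearly deterministic, while higher τ makes its “dreams” noisier and less predictable.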
43. USING RANDOMNESS TO REDUCE GAMING
• GAMING A SYSTEM GENERALLY RELIES UPON EXPLOITING “EDGE CASES”
• ADDING “RANDOMNESS” TO A SIMULATION REDUCES THE RELIABILITY OF EDGE CASES AND MAKES THE UNDERLYING SIMULATION HARDER TO GAME
• IT ALSO CAUSES THE POLICY TO BECOME MORE REDUNDANT AND ROBUST
• https://blog.openai.com/generalizing-from-simulation/
• M COMES WITH A “RANDOMNESS” SLIDER BUILT IN!
• https://worldmodels.github.io
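A toy illustration of why noise discourages exploits, with every number hypothetical. A learned reward model agrees with the true reward −(a − 0.5)² except for a spurious +1 spike in the narrow window 0.90 ≤ a ≤ 0.92, which stands in for a model error. Optimizing the deterministic model finds the spike. Optimizing expected model reward under Gaussian action noise of scale σ (a simple stand-in for the randomness slider, not M's actual temperature mechanism) makes the spike too unreliable to chase.

```python
import math

LO, HI, SPIKE = 0.90, 0.92, 1.0  # hypothetical model error: a narrow reward spike

def true_reward(a):
    return -(a - 0.5) ** 2

def phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def model_reward(a, sigma):
    """Expected model reward when the executed action is a + N(0, sigma^2)."""
    if sigma == 0.0:
        return true_reward(a) + (SPIKE if LO <= a <= HI else 0.0)
    # E[-(a' - 0.5)^2] = -(a - 0.5)^2 - sigma^2, plus spike * P(a' lands in window)
    p_spike = phi((HI - a) / sigma) - phi((LO - a) / sigma)
    return true_reward(a) - sigma ** 2 + SPIKE * p_spike

def best_action(sigma):
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda a: model_reward(a, sigma))

for sigma in (0.0, 0.2):
    a = best_action(sigma)
    print(f"sigma={sigma}: chosen a={a:.2f}, true reward={true_reward(a):.3f}")
```

With σ = 0 the optimizer lands inside the spike and does poorly on the true reward. With σ = 0.2 it settles near the genuinely good action.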
44. Further the discussion
Have A Longer Conversation On The State Of AI/ML: Join Us For A Free Demo
http://training.leftbrain.consulting/
Gain Hands-On Experience Programming From Publications