
Deep parking

Internship report at Preferred Networks, Inc.


  1. Deep parking: an implementation of automatic parking with deep reinforcement learning
     Shintaro Shiba, Feb. 2016 - Dec. 2016
     Engineer internship at Preferred Networks. Mentors: Abe-san, Fujita-san
  2. About me
     Shintaro Shiba
     • Graduate student at the University of Tokyo – major in neuroscience and animal behavior
     • Part-time engineer (intern) at Preferred Networks, Inc.
       – Blog post: https://research.preferred.jp/2017/03/deep-parking/
  3. Contents
     • Original idea
     • Background: DQN and Double DQN
     • Task definition
       – Environment: car simulator
       – Agents: 1. Coordinate 2. Bird's-eye view 3. Subjective view
     • Discussion
     • Summary
  4. Achievement
     [Figure: trajectory of the car agent, and the subjective-view input for the DQN at 0 deg, -120 deg, and +120 deg]
  5. Original idea: DQN for parking
     https://research.preferred.jp/2016/01/ces2016/
     https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/
     • A previous PFN project succeeded in driving smoothly with DQN
       – Input: 32 virtual sensors, 3 previous actions, plus current speed and steering
       – Output: 9 actions
     • Question: is it possible for a car agent to learn to park itself with camera images as input?
  6. Reinforcement learning
     [Diagram: the agent sends an action to the environment; the environment returns a state and a reward; a learning algorithm updates the agent]
  7. DQN: Deep Q-Network (Volodymyr Mnih et al., 2015)
     For each episode >> for each action >> update the Q function
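The Q-function update can be summarized in a short sketch. Below is a minimal, hedged illustration in numpy, not the project's actual code: `q_target_net` is a hypothetical callable returning Q-values of shape (batch, n_actions) from the frozen target network, and gamma = 0.97 matches the hyperparameter slide.

```python
import numpy as np

def dqn_targets(rewards, next_states, done, q_target_net, gamma=0.97):
    """Standard DQN target: y = r + gamma * max_a' Q_target(s', a').

    done is 1.0 for terminal transitions, which cuts off the bootstrap term.
    """
    q_next = q_target_net(next_states)                 # (batch, n_actions)
    return rewards + gamma * (1.0 - done) * q_next.max(axis=1)
```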
  8. Double DQN (Hado van Hasselt et al., 2015)
     Prevents overestimation of Q values
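Double DQN changes only the target computation: the online network selects the greedy action and the target network evaluates it, which curbs the overestimation that the plain max over Q-values introduces. A minimal sketch, using the same hypothetical conventions as the DQN sketch above:

```python
import numpy as np

def double_dqn_targets(rewards, next_states, done,
                       q_online_net, q_target_net, gamma=0.97):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    best_a = q_online_net(next_states).argmax(axis=1)                   # select...
    q_eval = q_target_net(next_states)[np.arange(len(best_a)), best_a]  # ...evaluate
    return rewards + gamma * (1.0 - done) * q_eval
```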
  9. Reinforcement learning in this project
     • Environment: the car simulator
     • Agent: a different sensor (state = sensor input) and a different neural network per agent type
  10. Environment: car simulator
      Forces acting on the car:
      • Traction • Air resistance • Rolling resistance • Centrifugal force • Brake • Cornering force
      F = F_traction + F_aero + F_rr + F_c + F_brake + F_cf
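To make the decomposition concrete, here is a hedged sketch of the longitudinal part of such a force sum; the coefficients and the drag/rolling-resistance forms are textbook assumptions, not values from the actual simulator, and the lateral terms (centrifugal, cornering force) are omitted:

```python
import numpy as np

# Illustrative constants only; the real simulator's parameters are not given.
C_AERO = 0.4257   # aerodynamic drag coefficient (assumed)
C_RR = 12.8       # rolling resistance coefficient (assumed)

def longitudinal_force(v, f_traction, f_brake_mag):
    """Net longitudinal force, following F = F_traction + F_aero + F_rr + F_brake.

    Drag is quadratic in speed, rolling resistance linear, and braking always
    opposes the current direction of motion.
    """
    f_aero = -C_AERO * v * abs(v)
    f_rr = -C_RR * v
    f_brake = -np.sign(v) * f_brake_mag
    return f_traction + f_aero + f_rr + f_brake
```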
  11. Common specifications: state, action, reward
      • Input (state)
        – Features specific to each agent, plus car speed and car steering
      • Output (actions)
        – 9 actions: accelerate, decelerate, steer right, steer left, throw (do nothing), accelerate + steer right, accelerate + steer left, decelerate + steer right, decelerate + steer left
      • Reward
        – +1 when the car is in the goal
        – -1 when the car is out of the field
        – 0.01 - 0.01 * distance_to_goal otherwise (changed afterward)
      • Goal
        – Car inside the goal region; no other conditions such as car direction
      • Termination
        – Time up: 500 actions (changed to 450 afterward)
        – Field out: the car leaves the field
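A hedged sketch of these specifications, with hypothetical helper names rather than the project's code: the 9 actions are exactly the pairings of a speed command and a steering command, and the reward below is the first version from this slide (the modified version appears on slide 29):

```python
from itertools import product

# 3 speed commands x 3 steering commands = 9 discrete actions;
# (None, None) is "throw" (do nothing).
SPEED_CMDS = ["accelerate", "decelerate", None]
STEER_CMDS = ["steer_right", "steer_left", None]
ACTIONS = list(product(SPEED_CMDS, STEER_CMDS))

def reward_v1(in_goal, out_of_field, distance_to_goal):
    """Initial reward design (later modified, see slide 29)."""
    if in_goal:
        return 1.0
    if out_of_field:
        return -1.0
    return 0.01 - 0.01 * distance_to_goal
```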
  12. Common specifications: hyperparameters
      • Maximum episodes: 50,000
      • Gamma: 0.97
      • Optimizer: RMSpropGraves
        – lr=0.00015, alpha=0.95, momentum=0.95, eps=0.01
        – changed afterward to: lr=0.00015, alpha=0.95, momentum=0, eps=0.01
      • Batch size: 50 or 64
      • Epsilon: linearly decreased from 1.0 to a final value of 0.1
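Since Preferred Networks develops Chainer, these settings map directly onto its optimizer API; a minimal sketch assuming Chainer, with a hypothetical epsilon-schedule helper:

```python
import chainer

# Optimizer settings as listed on the slide; momentum was later set to 0.
optimizer = chainer.optimizers.RMSpropGraves(
    lr=0.00015, alpha=0.95, momentum=0.95, eps=0.01)
# optimizer.setup(q_network)  # attach to the Q-network before training

def epsilon_at(step, anneal_steps, eps_start=1.0, eps_end=0.1):
    """Linear annealing of the exploration rate from 1.0 down to 0.1."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```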
  13. Agents
      1. Coordinate
      2. Bird's-eye view
      3. Subjective view
         – Three cameras
         – Four cameras
  14. Coordinate agent
      • Input features
        – The relative coordinate from the car to the goal, e.g. (80, 300)
        – Input shape: (2,), normalized
  15. Coordinate agent
      • Neural network: fully-connected layers only (3 of them)
        – Input: coordinates (2) + car parameters (2); hidden layers: 64, 64; output: number of actions (9)
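A hedged Chainer sketch of this network; the layer widths come from the slide, while the ReLU activations are an assumption:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class CoordinateQNet(chainer.Chain):
    """Fully-connected Q-network for the coordinate agent:
    2-d goal coordinate + 2 car parameters in, 9 Q-values out."""
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(2 + 2, 64)
            self.l2 = L.Linear(64, 64)
            self.l3 = L.Linear(64, n_actions)

    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)
```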
  16. Coordinate agent
      Result: [figure]
  17. Bird's-eye view agent
      • Input features
        – A bird's-eye image of the whole field
        – Input size: 80 x 80, normalized
  18. Bird's-eye view agent
      • Neural network (diagram): convolutional layers over the 80 x 80 input with feature widths 128 and 192, fully-connected layers of 400 and 64 units with the car parameters (2) joined in, output = number of actions
  19. Bird's-eye view agent
      (Same network diagram as slide 18.)
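The slide diagram leaves the convolution details open; below is a hedged Chainer sketch in which the kernel sizes, strides, and the placement of the 64-unit layer are guesses, while the 128/192/400 widths come from the slide:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class BirdsEyeQNet(chainer.Chain):
    """Convolutional Q-network for the bird's-eye agent (80x80 input)."""
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 128, ksize=8, stride=4)
            self.conv2 = L.Convolution2D(128, 192, ksize=4, stride=2)
            self.fc1 = L.Linear(None, 400)    # flattened size inferred
            self.fc2 = L.Linear(400 + 2, 64)  # image features + car params
            self.out = L.Linear(64, n_actions)

    def __call__(self, img, car_params):
        h = F.relu(self.conv1(img))
        h = F.relu(self.conv2(h))
        h = F.relu(self.fc1(h))
        h = F.relu(self.fc2(F.concat((h, car_params), axis=1)))
        return self.out(h)
```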
  20. Bird's-eye view agent
      Result: 18k episodes [figure]
  21. Bird's-eye view agent
      Result after 18k episodes: ? We had already spent about 6 months on this agent, so we moved on to the next one…
  22. Subjective view agent
      • Input features
        – One subjective-view image per camera mounted on the car
        – Number of cameras: three or four
        – FoV = 120 deg per camera
      • Example: input images for the four-camera agent come from front +0, right +90, back +180, left +270 deg
  23. Subjective view agent
      • Neural network (diagram): a conv tower per 80 x 80 camera image yielding 200 features each (x 3 cameras), fully-connected layers of 400, 256, and 64 units with the car parameters (2) joined in, output = number of actions
  24. Subjective view agent
      (Same network diagram as slide 23.)
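A hedged Chainer sketch of this multi-camera network; sharing one conv tower across cameras and its kernel shapes are assumptions, while the 200-per-camera, 400, and 256 widths come from the slide:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class SubjectiveQNet(chainer.Chain):
    """Q-network for the subjective-view agent: a shared conv tower maps each
    80x80 camera image to 200 features; features from all cameras plus the 2
    car parameters feed fully-connected layers of 400 and 256 units."""
    def __init__(self, n_cameras=3, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 32, ksize=8, stride=4)
            self.conv2 = L.Convolution2D(32, 64, ksize=4, stride=2)
            self.embed = L.Linear(None, 200)   # 200 features per camera
            self.fc1 = L.Linear(200 * n_cameras + 2, 400)
            self.fc2 = L.Linear(400, 256)
            self.out = L.Linear(256, n_actions)

    def __call__(self, camera_imgs, car_params):
        # camera_imgs: one (batch, channels, 80, 80) array per camera
        feats = [F.relu(self.embed(F.relu(self.conv2(F.relu(self.conv1(x))))))
                 for x in camera_imgs]
        h = F.relu(self.fc1(F.concat(feats + [car_params], axis=1)))
        h = F.relu(self.fc2(h))
        return self.out(h)
```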
  25. Subjective view agent
      • Problem: calculation time (GeForce GTX TITAN X)
        – At first: 3 [min/ep] x 50k [ep] = 100 days
        – After a review by Abe-san: 1.6 [min/ep] x 50k [ep] = 55 days
          • The cost came from copies and synchronization between GPU and CPU
        – Learning was interrupted as soon as the DNN output diverged
        – (Fortunately) the agent "learned" to reach the goal within ~10k episodes in some trials
      • Problem: memory usage
        – DQN needs to store 1M previous inputs: 1M x (80 x 80 x 3 ch x 4 cameras)
        – Solution: save the images to disk and read them back on every access
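The arithmetic explains the disk trick: at one byte per pixel, 80 x 80 x 3 channels x 4 cameras is 76,800 bytes per observation, so a 1M-transition replay buffer needs roughly 77 GB, far beyond RAM. A hedged sketch of a disk-backed buffer entry (hypothetical helper, not PFN's code):

```python
import os
import numpy as np

BYTES_PER_OBS = 80 * 80 * 3 * 4          # 76,800 bytes at 1 byte/pixel
print(BYTES_PER_OBS * 1_000_000 / 1e9)   # ~76.8 GB for a 1M replay buffer

def store_observation(buffer_dir, step, frames):
    """Save one stacked camera observation (uint8) to disk and return the
    path; the replay buffer then keeps paths, not pixels, in memory."""
    path = os.path.join(buffer_dir, "%d.npy" % (step % 1_000_000))
    np.save(path, frames.astype(np.uint8))   # old slots get overwritten
    return path
```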
  26. Subjective view agent
      Result: three cameras, 6k episodes
      [Figure: trajectory of the car agent, and the subjective-view input for the DQN at 0 deg, -120 deg, and +120 deg]
  27. Subjective view agent
      Result: three cameras, 50k episodes
      • The policy looks like "move anyway"? >> revisit the reward setting
      • It does not seem able to reach the goal every time; only "easy" goals are achieved (goals cluster in one region of the field) >> vary the task difficulty (curriculum learning)
  28. Subjective view agent
      Result: four cameras, 30k episodes [figure]
  29. Modified reward
      • Previous
        – +1 when the car is in the goal
        – -1 when the car is out of the field
        – 0.01 - 0.01 * distance_to_goal otherwise
      • New
        – +1 - speed when the car is in the goal (in order to stop the car)
        – -1 when the car is out of the field
        – -0.005 otherwise
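In the same hypothetical style as the reward_v1 sketch earlier, the new design reads:

```python
def reward_v2(in_goal, out_of_field, speed):
    """Modified reward: the speed penalty at the goal pushes the agent to
    actually stop, and a small constant step cost replaces the
    distance-based shaping of the first version."""
    if in_goal:
        return 1.0 - speed
    if out_of_field:
        return -1.0
    return -0.005
```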
  30. Modified difficulty
      Difficulty = initial car direction and position
      • Constraint
        – The car always starts near the middle of the field
        – The car always starts facing the center, within ±π/4
      • Curriculum
        – Car direction: within ±(π/12)·n, where n = curriculum level
        – Criteria for advancing: a mean reward of 0.6 over 100 episodes
      [Figure: goal and sampled start directions for n = 1 and n = 2]
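A hedged sketch of how such a curriculum could be wired up (hypothetical helpers; the slides do not show the actual code):

```python
import numpy as np

def sample_start_direction(n, toward_center):
    """Level-n curriculum: draw the initial heading within +/-(pi/12)*n of
    the direction facing the center (the constraint variant uses +/-(pi/4))."""
    half_width = (np.pi / 12) * n
    return toward_center + np.random.uniform(-half_width, half_width)

def should_advance(episode_rewards, threshold=0.6, window=100):
    """Advance to curriculum level n+1 once the mean reward over the last
    100 episodes reaches the threshold."""
    return (len(episode_rewards) >= window
            and np.mean(episode_rewards[-window:]) >= threshold)
```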
  31. Subjective view agent: modifications
      N cameras | Reward   | Difficulty | Learning result
      ----------|----------|------------|------------------------------------------
      3         | default  | default    | about 6k episodes: o; 50k: x
      3         | modified | default    | about 16k episodes: o
      3         | modified | constraint | ? (still learning)
      3         | modified | curriculum | o (though only at curriculum level 1 yet)
      4         | default  | default    | x
      4         | modified | curriculum | △ (not bad, but not yet successful at 6k)
  32. Subjective view agent: modifications
      Curriculum + three cameras, at curriculum level 1. The advancement criteria need to be modified.
      [Figure: mean reward (0.0-1.0) and reward sum (0-500) over episodes 0-20k]
  33. Discussion
      1. The initial settings included situations where the car cannot reach the goal
         – e.g. starting headed toward the edge of the field
         – This made learning unstable
      2. Why was the coordinate agent successful, even though such situations could occur for it as well?
  34. Discussion
      3. Comparison between three and four cameras
         – Considering the success rate and execution time, three cameras is better
         – Why was the four-camera agent not successful? Would it need several trials?
      4. DQN diverged often
         – Subjectively, about one run in three; slightly more often with four cameras
         – This underlines the importance of the dataset for learning (memory size, batch size)
  35. Discussion
      5. Curriculum
         – Ideally it would be better to quantify the "difficulty of the task"
         – Here it may be roughly captured by how biased the distribution of selected actions is: picking each of the nine actions (accelerate, decelerate, throw, steer right/left, and the accelerate/decelerate + steer combinations) equally often means going straight, while a biased distribution means turning right or left
  36. Summary
      • The car agent can park itself from subjective camera views, though learning is not always stable
      • There is a trade-off between reward design and learning difficulty
        – A simple reward is difficult to learn from: try other algorithms such as A3C
        – A complex reward is difficult to design: try other settings for distance_to_goal
