[GTC 2019] Bringing Personal Robots Home: Integrating Computer Vision and Human–Robot Interaction for Real-World Applications

In this talk, we’ll discuss our latest achievements and challenges in developing personal robot systems. The main focus of the talk is on an autonomous tidying-up robot system, which we have recently announced. We’ll describe how we integrated cutting-edge speech and natural language processing and computer vision technologies to build such an autonomous system that can work on complex real-world applications with high accuracy. The system also deploys our latest object detection model, which was trained using 512 NVIDIA Tesla V100 GPUs and won second prize in the Google AI Open Images - Object Detection Track in August 2018.

Interactive picking: https://pfnet.github.io/interactive-robot/
Tidying-up robot: https://projects.preferred.jp/tidying-up-robot/en/


  1. Bringing Personal Robots Home [S9360]: Integrating Computer Vision & Human–Robot Interaction for Real-World Applications. NVIDIA GTC 2019 (Mar 18, 2019). Jun Hatori, Preferred Networks
  2. Requirements for Robots (Industrial vs. Personal)
     ● Cost: high vs. low
     ● Environment: fixed, known, structured vs. dynamic, unstructured, unseen
     ● Users: experts vs. non-experts
     ● Goal: automation vs. intelligence, personalization
  3. Requirements for Robots (Industrial vs. Personal, with the key technology for personal robots)
     ● Cost: high vs. low (key technology: hardware)
     ● Environment: fixed, known, structured vs. dynamic, unstructured, unseen (key technology: computer vision)
     ● Users: experts vs. non-experts (key technology: human–robot interaction)
     ● Goal: automation vs. intelligence, personalization (key technology: task planning)
  4. Requirements for Robots (same table as slide 3)
  5. A variety of real-world environments
  6. PR1: Wyrobek et al. 2008
  7. Key Technologies
     ● Computer vision: generalization to different environments and tasks
       ○ Object detection across thousands of categories
       ○ Support for unseen environments and unseen objects
     ● Human–robot interaction
       ○ Intuitive interface with spoken and visual language interpretation
       ○ Spoken and visual feedback from robots
  8. Two Projects ● Interactive picking robot ● Autonomous tidying-up robot
  9. Interactively Picking Real-World Objects: https://projects.preferred.jp/interactive-robot/
  10. Challenges
      ● Variety of expressions: “a bear doll”, “the animal plushie”, “that fluffy thing”, “up-side-down grizzly”; “grab X”, “bring together X and Y”, “move X to a diagonal box”
      ● Ambiguity and errors: “that brown one”, “a dog doll?”
  11. Human: hey can you move that brown fluffy thing to the bottom right?
      Robot: which one do you mean?
      Human: the one next to the eraser box.
      Robot: I got it.
  12. Proposed Model (architecture diagram): the speech transcription (e.g. “pick the brown fluffy thing and put it in the lower bin”) goes through an embedding and an LSTM; vision (RGB) goes through SSD to obtain cropped images, which feed a CNN (+ features) and an MLP; MLP heads on the combined representation predict the target object and the destination.
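To make the data flow on slide 12 concrete, here is a minimal sketch of the target-object branch. It assumes PyTorch; the framework, layer sizes, and names are illustrative and not the authors' implementation. The instruction is embedded and encoded by an LSTM, each SSD-detected crop contributes CNN features, and an MLP scores every (instruction, candidate crop) pair; a sibling head of the same shape would score candidate destination boxes.

import torch
import torch.nn as nn

class TargetScorer(nn.Module):
    """Illustrative sketch only: scores candidate object crops against an instruction."""
    def __init__(self, vocab_size, embed_dim=128, lstm_dim=256, img_feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)             # word embeddings
        self.lstm = nn.LSTM(embed_dim, lstm_dim, batch_first=True)   # instruction encoder
        self.img_mlp = nn.Sequential(nn.Linear(img_feat_dim, lstm_dim), nn.ReLU())
        self.score_mlp = nn.Sequential(nn.Linear(2 * lstm_dim, 256), nn.ReLU(),
                                       nn.Linear(256, 1))            # joint scoring head

    def forward(self, token_ids, crop_features):
        # token_ids: (1, T) transcribed instruction; crop_features: (N, img_feat_dim),
        # e.g. CNN features of the N crops produced by the SSD detector.
        _, (hidden, _) = self.lstm(self.embed(token_ids))
        sentence = hidden[-1].expand(crop_features.size(0), -1)      # (N, lstm_dim)
        objects = self.img_mlp(crop_features)                        # (N, lstm_dim)
        return self.score_mlp(torch.cat([sentence, objects], 1)).squeeze(1)  # (N,) scores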
  13. Handling Ambiguous Commands
      ● Trained with a hinge loss for correct sentence–object pairs [Yu+ 2017]
      ● An instruction is considered ambiguous if the margin between the best and second-best candidate objects falls below a threshold
      (Diagram: candidate crops scored by CNN/MLP against the LSTM-encoded instruction “pick the brown fluffy thing and put it in the lower right bin”, with the margin between the 1st and 2nd candidates marked.)
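A hedged sketch of the two mechanisms named on this slide, reusing per-candidate scores such as those produced by the model above. The margin delta and the ambiguity threshold are placeholder values, not the ones used in the system.

import torch

def hinge_loss(scores: torch.Tensor, correct_idx: int, delta: float = 1.0) -> torch.Tensor:
    """Push the correct object's score above every other candidate by at least delta."""
    wrong = torch.cat([scores[:correct_idx], scores[correct_idx + 1:]])
    return torch.clamp(delta - (scores[correct_idx] - wrong), min=0).sum()

def is_ambiguous(scores: torch.Tensor, margin_threshold: float = 0.5) -> bool:
    """If the best and second-best candidates score too closely, ask for clarification."""
    top2 = torch.topk(scores, k=2).values
    return (top2[0] - top2[1]).item() < margin_threshold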
  14. Interactive Picking Dataset
      ● Example instructions: “grab the human face labeled object and …”, “move the pop red can from the top …”, “move the pink horse plushie …”, “put the box with a 50 written on it that is …”
      ● 1,200 scenes (26k objects in total), 100 types of commodities
      ● 73k unconstrained instructions (vocabulary size: 5,000)
      ● Publicly available as the PFN-PIC dataset: https://github.com/pfnet-research/picking-instruction
  15. Results: accuracy of target object matching with a single instruction: 88.0%
  16. Results: accuracy of target object matching: 88.0% with a single instruction vs. 92.7% with interactive clarification, a 4.7-point improvement (39% error reduction)
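The two figures on this slide are consistent: the absolute gain is 4.7 percentage points, and the error rate shrinks from 12.0% to 7.3%, roughly a 39% relative reduction. A one-line check:

single, interactive = 0.880, 0.927
absolute_gain = interactive - single                    # 0.047 -> 4.7 points
error_reduction = 1 - (1 - interactive) / (1 - single)  # 0.073 / 0.120 -> ~39%
print(f"+{absolute_gain:.3f} absolute, {error_reduction:.1%} relative error reduction")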
  17. Summary
      ● We proposed an interactive picking system that can be controlled by unconstrained spoken-language instructions.
      ● We achieved an object matching accuracy of 92.7%.
      ● Accuracy for unseen objects is not yet sufficient (~70%).
      * Hatori+ 2018. Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions. ICRA 2018 Best Paper on HRI.
  18. Tidying-up Robot: https://projects.preferred.jp/tidying-up-robot/
  19. CEATEC JAPAN 2018 (Oct 16–19, 2018)
  20. Environment
      ● Furnished living room: coffee table, couch, bookshelf, trash bins, laundry bag, toy box
      ● Two Toyota HSRs working in parallel
  21. Object Recognition
      ● Sensors: HSR’s head camera (RGBD) and 4 ceiling cameras (RGB)
      ● Supported objects: ~300
      ● PFDet as the CNN base model: 2nd-place accuracy at the Google AI Open Images Challenge – Object Detection track (Sep 2018)
  22. PFDet: Basic Architecture [1]
      ● Feature Pyramid Network (FPN) with SENet-154 and SE-ResNeXt-101 backbones
      ● Multi-node batch normalization
      ● Non-maximum weighted (NMW) suppression [2]
      ● Global context: additional FPN block, PSP (pyramid spatial pooling) module, and a context head [3]
      [1] Akiba+ 2018. PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track.
      [2] Zhou+ 2017. CAD: Scale Invariant Framework for Real-Time Object Detection. ICCVW 2017.
      [3] Zhu+ 2017. CoupleNet: Coupling Global Structure with Local Parts for Object Detection. ICCV 2017.
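Plain NMS keeps the locally best box and discards its overlapping neighbours; non-maximum weighted suppression instead merges each overlapping group into a single box. Below is a simplified sketch of that merging idea. Weighting each box by score times overlap is an assumption on my part; the exact formulation is in the cited CAD paper [2].

import numpy as np

def iou(box, boxes):
    """box: (4,), boxes: (N, 4) in (x1, y1, x2, y2); returns (N,) IoU values."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nmw(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4), scores: (N,). Returns merged boxes and their scores (sketch only)."""
    order = np.argsort(scores)[::-1]                 # process highest-scoring boxes first
    boxes, scores = boxes[order], scores[order]
    merged_boxes, merged_scores = [], []
    used = np.zeros(len(boxes), dtype=bool)
    for i in range(len(boxes)):
        if used[i]:
            continue
        overlaps = iou(boxes[i], boxes)
        group = (~used) & (overlaps >= iou_thresh)   # the box itself plus its overlaps
        used |= group
        w = scores[group] * overlaps[group]          # assumed weight: confidence * overlap
        merged_boxes.append((w[:, None] * boxes[group]).sum(0) / w.sum())
        merged_scores.append(scores[i])              # keep the top score for the group
    return np.stack(merged_boxes), np.array(merged_scores)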
  23. PFDet: High Scalability
      ● Hardware: in-house GPU cluster with 512 NVIDIA Tesla V100 (32 GB) GPUs, connected by InfiniBand
      ● Training of 16 epochs completed in 33 hours
      ● Scaling efficiency of 83% compared to 8 GPUs
      ● Software framework
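Scaling efficiency here is the measured speedup divided by the ideal linear speedup, so the slide's 83% figure can be read the other way around:

# Implication of the 83% scaling efficiency: relative to the 8-GPU baseline,
# 512 GPUs (64x the hardware) deliver roughly 0.83 * 64 ~= 53x the throughput.
base_gpus, large_gpus, efficiency = 8, 512, 0.83
ideal_speedup = large_gpus / base_gpus          # 64x under perfect linear scaling
effective_speedup = efficiency * ideal_speedup  # ~53x in practice
print(f"~{effective_speedup:.0f}x effective speedup over the {base_gpus}-GPU baseline")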
  24. Data Collection
  25. System Performance
      ● Object detection accuracy: 0.90 mIoU (segmentation mask)
      ● Robot system (measured at CEATEC): tidying-up speed of 1.9 objects/minute, grasp success rate ~90%
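For reference, a minimal sketch of how a segmentation-mask mIoU like the 0.90 figure is usually computed: per-object IoU of predicted vs. ground-truth binary masks, then averaged. Whether the slide averages per object or per class is not stated, so the averaging below is an assumption.

import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: boolean masks of identical shape."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0                                   # both masks empty
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(pred_masks, gt_masks) -> float:
    """Average mask IoU over matched (prediction, ground truth) pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]))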
  26. Robustness of Object Detection (example scenes: sparse vs. dense)
  27. Typical Errors: mango vs. lemon confusion, mis-recognition of humans, whiteout, false negatives in clutter
  29. Human–Robot Interaction (HRI)
      ● From user to robot: update where the current item should be stored; inquire about object locations
      ● From robot to user: spoken and audio feedback; a tablet app for monitoring (the user can also provide feedback; AR-based visualization)
      ● Technologies involved: speech recognition, NLP, gesture, AR
  30. Needs English subtitles
  31. Needs English subtitles
  32. Tablet UI
  33. Remaining Challenges with Tidying-up
      ● Standalone computation (no external sensors or computers)
      ● Recognition of unlimited items in domestic environments
      ● Generalization to unseen environments
      ● Easy setup
  34. Robots as an Interface to the Physical World
      ● Domestic robots can track household items while tidying up, connecting everything in the physical world to the virtual world.
      ● Potential applications: e-commerce; recommendations on which items to purchase or dispose of
  35. Key Takeaways
      ● Robust computer vision and an intuitive human–robot interface are prerequisites for successful personal-robot applications.
      ● Some simple domestic tasks, such as tidying up, are getting close to production level.
      ● Robots are an interface to the physical world, computerizing household items and connecting them to online services.
  36. Thank you!
      Interactive picking: https://pfnet.github.io/interactive-robot/
      Tidying-up robot: https://projects.preferred.jp/tidying-up-robot/en/
      Related talks:
      ● S9380 - The Frontier of Define-by-Run Deep Learning Frameworks. Wed, Mar 20, 11:00 AM - 11:50 AM, SJCC Room 210E
      ● S9738 - Using GPU Power for NumPy-syntax Calculations. Tue, Mar 19, 2:00 PM - 2:50 PM, SJCC Room 210F
