1. From Transformer to Detection:
End-to-End Object Detection with
Transformers
Presented by Frost
IVUL@KAUST
2. Content
• What is DETR?
• DETR pipeline
• Background: Transformer
• Transformer in DETR
• Set-based global loss
• Main Results
• Conclusion
• Ideas
• References
3. What is DETR?
• Approaches object detection as a direct set prediction problem.
• Transformer encoder-decoder architecture.
• Set-based global loss: forces unique predictions via bipartite matching.
• Because all predictions are decoded in parallel, DETR is efficient at inference time.
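The "unique predictions via bipartite matching" idea can be illustrated with a toy example. The sketch below is not DETR's actual implementation (which builds a class/box cost matrix and runs the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`); it is a minimal brute-force matcher over a hypothetical cost matrix, just to show how each ground-truth object is forced onto a distinct prediction slot:

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force bipartite matching: assign each ground-truth row to a
    distinct prediction column, minimizing the total matching cost.
    cost[i][j] = cost of matching ground truth i to prediction j.
    Only feasible for tiny problems; DETR uses the Hungarian algorithm."""
    n_gt, n_pred = len(cost), len(cost[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

# Hypothetical costs: 2 ground-truth boxes, 3 prediction slots.
cost = [
    [0.9, 0.1, 0.8],   # GT 0 is cheap to match to prediction 1
    [0.2, 0.7, 0.6],   # GT 1 is cheap to match to prediction 0
]
assignment, total = min_cost_matching(cost)
print(assignment)  # → [1, 0]: GT 0 ↔ pred 1, GT 1 ↔ pred 0
```

Unmatched prediction slots (here, slot 2) are supervised toward the "no object" class, which is what suppresses duplicate detections without non-maximum suppression.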
22. Conclusion
• A fresh design for object detection: a transformer with a bipartite matching loss for direct set prediction.
• Significantly better performance on large objects than Faster R-CNN, thanks to the global reasoning performed by self-attention.
• Underperforms on small objects compared with other detectors of similar capacity.
• Training is long and inference is not real-time: the transformer architecture adds significant overhead in training and inference.
Given a fixed small set of learned object queries,
DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions.
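How a fixed set of learned queries yields a fixed set of predictions can be sketched with plain dot-product attention. This is a stdlib toy, not DETR's decoder (which uses multi-head attention, residual connections, and feed-forward layers in PyTorch); the point is only that N query slots attend over the image features and produce N output slots in one parallel pass:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(queries, features):
    """Single-head dot-product attention: each object query attends over
    all image features and yields one output slot. All slots are
    independent, so DETR can decode the whole prediction set at once."""
    outputs = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        weights = softmax(scores)
        out = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(q))]
        outputs.append(out)
    return outputs

# Hypothetical numbers: 2 learned queries, 3 feature vectors, dim 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
slots = attend(queries, features)
print(len(slots))  # → 2, one prediction slot per query
```

In the real model, each output slot is further mapped by small feed-forward heads to a class label (possibly "no object") and a bounding box.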