1. From Transformer to Detection:
End-to-End Object Detection with
Transformers
Presented by Frost
IVUL@KAUST
2. Content
• What is DETR?
• DETR pipeline
• Background: Transformer
• Transformer in DETR
• Set-based global loss
• Main Results
• Conclusion
• Ideas
• References
3. What is DETR?
• Approaches object detection as a direct set prediction problem.
• Transformer encoder-decoder architecture.
• Set-based global loss: forces unique predictions via bipartite matching.
• Because all predictions are decoded in parallel, DETR is efficient at inference time.
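The "unique predictions via bipartite matching" idea can be illustrated with a toy example. The sketch below is not DETR's actual implementation (which builds a class/box cost matrix and runs the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`); it is a minimal brute-force matcher over a hypothetical cost matrix, just to show how each ground-truth object is forced onto a distinct prediction slot:

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force bipartite matching: assign each ground-truth row to a
    distinct prediction column, minimizing the total matching cost.
    cost[i][j] = cost of matching ground truth i to prediction j.
    Only feasible for tiny problems; DETR uses the Hungarian algorithm."""
    n_gt, n_pred = len(cost), len(cost[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

# Hypothetical costs: 2 ground-truth boxes, 3 prediction slots.
cost = [
    [0.9, 0.1, 0.8],   # GT 0 is cheap to match to prediction 1
    [0.2, 0.7, 0.6],   # GT 1 is cheap to match to prediction 0
]
assignment, total = min_cost_matching(cost)
print(assignment)  # → [1, 0]: GT 0 ↔ pred 1, GT 1 ↔ pred 0
```

Unmatched prediction slots (here, slot 2) are supervised toward the "no object" class, which is what suppresses duplicate detections without non-maximum suppression.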
22. Conclusion
• A fresh design for object detection: a transformer with a bipartite matching loss for direct set prediction.
• Significantly better performance on large objects than Faster R-CNN, thanks to the global reasoning performed by self-attention.
• Underperforms on small objects compared with other detectors of similar capacity.
• Training is long and inference is not real-time: the transformer architecture adds significant overhead in training and inference.
Given a fixed small set of learned object queries,
DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions.
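How a fixed set of learned queries yields a fixed set of predictions can be sketched with plain dot-product attention. This is a stdlib toy, not DETR's decoder (which uses multi-head attention, residual connections, and feed-forward layers in PyTorch); the point is only that N query slots attend over the image features and produce N output slots in one parallel pass:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(queries, features):
    """Single-head dot-product attention: each object query attends over
    all image features and yields one output slot. All slots are
    independent, so DETR can decode the whole prediction set at once."""
    outputs = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        weights = softmax(scores)
        out = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(q))]
        outputs.append(out)
    return outputs

# Hypothetical numbers: 2 learned queries, 3 feature vectors, dim 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
slots = attend(queries, features)
print(len(slots))  # → 2, one prediction slot per query
```

In the real model, each output slot is further mapped by small feed-forward heads to a class label (possibly "no object") and a bounding box.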