
RT-DETR, a Transformer Model, Outperforms YOLOv8 in Accuracy and Speed



The Real-Time Detection Transformer (RT-DETR), developed by Baidu, is an end-to-end object detector that delivers real-time speed with high accuracy. It applies Transformer-based attention to multiscale features efficiently by decoupling intra-scale interaction from cross-scale fusion. Inference speed can be tuned by changing how many decoder layers are used, without retraining. The model runs particularly well on accelerated backends such as CUDA with TensorRT, where it outperforms many other real-time object detectors.
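The speed/accuracy trade-off from decoder depth can be illustrated with a toy sketch. This is not RT-DETR's code: `make_refine_layer`, `run_decoder`, the six-layer stack, and the "move halfway to the target" update are all hypothetical stand-ins chosen for illustration. The only idea it borrows is that each decoder layer refines the previous layer's estimates, so stopping after the first few layers gives a coarser answer for less compute while the weights stay unchanged.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for one decoder layer: takes the current box
# estimates and returns refined ones.
Layer = Callable[[List[float]], List[float]]


def make_refine_layer(target: Sequence[float], step: float = 0.5) -> Layer:
    """Build a toy layer that moves each estimate part of the way to `target`."""
    def layer(estimates: List[float]) -> List[float]:
        return [e + step * (t - e) for e, t in zip(estimates, target)]
    return layer


def run_decoder(layers: Sequence[Layer], queries: List[float],
                num_layers: int) -> List[float]:
    """Apply only the first `num_layers` layers; the layers are not modified."""
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    out = list(queries)
    for layer in layers[:num_layers]:
        out = layer(out)
    return out


target = [10.0, 20.0]                                  # toy ground truth
decoder = [make_refine_layer(target) for _ in range(6)]  # assumed depth of 6

fast = run_decoder(decoder, [0.0, 0.0], num_layers=3)  # fewer layers, less work
full = run_decoder(decoder, [0.0, 0.0], num_layers=6)  # all layers, closest fit
print(fast, full)
```

In this toy setup the three-layer run ends further from the target than the six-layer run, which mirrors the trade-off: fewer layers means faster inference at some cost in accuracy.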

 

In a recent research paper titled "DETRs Beat YOLOs on Real-Time Object Detection," Baidu Inc. analyzes how non-maximum suppression (NMS) hurts the speed and accuracy of real-time detectors, and proposes an efficient hybrid encoder for multi-scale feature processing together with IoU-aware query selection, which improves the initialization of object queries. On COCO val2017, RT-DETR-L achieves 53.0% AP at 114 FPS and RT-DETR-X achieves 54.8% AP at 74 FPS (both measured on a T4 GPU), outperforming YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS.
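To make the NMS step concrete, here is a minimal sketch of greedy NMS in plain Python. It is a generic textbook version, not the implementation used by any particular YOLO release or by the paper's benchmark. The function names `iou` and `nms` and the default thresholds (0.5 IoU, 0.25 score) are assumptions for illustration. End-to-end detectors such as RT-DETR do not need this post-processing step.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5, score_threshold: float = 0.25) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = nms(boxes, scores)
print(kept)
```

Both thresholds are hand-tuned hyperparameters, and the inner loop's cost grows with the number of boxes that survive the score filter. That makes NMS a source of extra latency and of sensitivity to threshold choice.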