Evaluation of Cutting-Edge Object Detection Architectures on Multi-Object and Single-Object Datasets

Date

2026

Abstract

This study evaluates the performance of four cutting-edge object detection models, namely YOLO12X, Mask R-CNN, RT-DETR-X, and RF-DETR-Large, on the Open Images (multi-object) and LaSOT (single-object) datasets. Current state-of-the-art object detectors are either CNN-based or Transformer-based. CNN-based models use either one-pass (YOLO family) or two-pass (R-CNN family) implementations. One-pass models can be faster but tend to be less accurate than two-pass models. Transformer-based models can use Detection Transformers or Vision Transformers. They are gaining popularity, and their performance surpasses that of CNN-based models. This study compares YOLO12X and Mask R-CNN from the CNN-based family with the Transformer-based RT-DETR-X and RF-DETR-Large architectures in terms of accuracy and inference time on the Open Images and LaSOT datasets. All models are the largest available variants and are pretrained on the COCO dataset. Transformer-based models incorporate special types of self-attention and offer significant improvements in both accuracy and speed. The experimental results demonstrate that attention-based and Transformer-based models perform better than traditional CNN-based object detectors, and that YOLO12X is the fastest method by a wide margin. On the LaSOT dataset, RT-DETR-X achieves 0.8804 IoU, 0.7047 F1-score, 0.6597 mAP@0.5, and 28.64 fps, whereas YOLO12X achieves 0.8572 IoU, 0.6657 F1-score, 0.5357 mAP@0.5, and 49.78 fps.
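The abstract reports IoU and F1-score without describing how they were computed. As an illustration only, the following minimal Python sketch shows one common way to obtain these two quantities from bounding boxes. The box layout (x_min, y_min, x_max, y_max), the greedy one-to-one matching, the 0.5 threshold, and the function names `iou` and `f1_at_threshold` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_threshold(preds: List[Box], truths: List[Box], thr: float = 0.5) -> float:
    """F1-score where a prediction counts as a true positive if it
    overlaps a not-yet-matched ground-truth box with IoU >= thr.
    Greedy matching is an assumption of this sketch."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        # Best remaining ground-truth box for this prediction, if any
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            tp += 1
            unmatched.remove(best)
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(unmatched)    # ground-truth boxes that were never matched
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7, because the boxes share an area of 1 out of a combined area of 7. Published evaluations typically also sort predictions by confidence before matching, which this sketch omits for brevity.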

Keywords

Computer Science, Software Engineering, Computer Science, Artificial Intelligence

Source

Black Sea Journal of Engineering and Science

Volume

9

Issue

1

Start Page

287

End Page

294