Evaluation of Cutting-Edge Object Detection Architectures on Multi-Object and Single-Object Datasets

Parlak, Cevahir

doi:10.34248/bsengineering.1736319

dc.contributor.author	Parlak, Cevahir
dc.date.accessioned	2026-05-12T15:03:58Z
dc.date.available	2026-05-12T15:03:58Z
dc.date.issued	2026
dc.description.abstract	This study evaluates the performance of four cutting-edge object detection models, namely YOLO12X, Mask R-CNN, RT-DETR-X, and RF-DETR-Large, on the Open Images (multi-object) and LaSOT (single-object) datasets. Current state-of-the-art applications rely on CNN-based and Transformer-based object detection models. CNN-based models use either one-pass (YOLO family) or two-pass (R-CNN family) implementations. One-pass object detection models can be faster but tend to be less accurate than two-pass models. Transformer-based models can use Detection Transformers or Vision Transformers. Transformer-based models are gaining popularity, and their performance surpasses that of CNN-based models. This study evaluates YOLO12X and Mask R-CNN from the CNN-based family, and the transformer-based RT-DETR-X and RF-DETR-Large architectures, in terms of accuracy and speed on the Open Images and LaSOT datasets. All models are the largest available variants and are pretrained on the COCO dataset. Transformer-based models incorporate special types of self-attention and offer significant improvements in both accuracy and speed. The experimental results demonstrate that attention-based and transformer-based models perform better than traditional CNN-based object detectors, and that YOLO12X is the fastest method by a wide margin. On the LaSOT dataset, RT-DETR-X achieves 0.8804 IoU, 0.7047 F1-score, 0.6597 mAP@0.5, and 28.64 fps, whereas YOLO12X achieves 0.8572 IoU, 0.6657 F1-score, 0.5357 mAP@0.5, and 49.78 fps.	en_US
dc.identifier.doi	10.34248/bsengineering.1736319
dc.identifier.issn	2619-8991
dc.identifier.uri	https://hdl.handle.net/123456789/1508
dc.identifier.uri	https://search.trdizin.gov.tr/en/yayin/detay/1382587
dc.language.iso	en
dc.relation.ispartof	Black Sea Journal of Engineering and Science
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer Science, Software Engineering
dc.subject	Computer Science, Artificial Intelligence
dc.title	Evaluation of Cutting-Edge Object Detection Architectures on Multi-Object and Single-Object Datasets	en_US
dc.type	Article
dspace.entity.type	Publication
gdc.author.id	0000-0002-5500-7379
gdc.author.institutional	Parlak, Cevahir
gdc.coar.access	open access
gdc.coar.type	text::journal::journal article
gdc.description.department
gdc.description.departmenttemp	[(phd), Cevahir Parlak] Fenerbahçe University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Istanbul, Türkiye
gdc.description.endpage	294
gdc.description.issue	1
gdc.description.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member
gdc.description.startpage	287
gdc.description.volume	9
gdc.identifier.trdizinid	1382587
gdc.index.type	TR-Dizin
relation.isAuthorOfPublication.latestForDiscovery	d57697e4-e3d0-4c57-bb5c-4b92ceb92d1a
relation.isOrgUnitOfPublication.latestForDiscovery	ca7e1f00-cfa9-4a7f-928b-78cbb9b7575e
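
The abstract reports IoU and F1-score for each detector. As a reading aid, here is a minimal sketch of how these two metrics are commonly computed for axis-aligned bounding boxes. It is not taken from the paper: the function names `iou` and `f1_score`, the `(x1, y1, x2, y2)` box format, and the example values are assumptions made for illustration, and the study's exact evaluation protocol (matching thresholds, averaging) may differ.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner-based box format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = area A + area B - overlap.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical boxes and counts, not values from the study.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(f1_score(tp=8, fp=2, fn=2))
```

Under the mAP@0.5 convention named in the abstract, a predicted box is usually counted as a true positive when its IoU with a ground-truth box is at least 0.5.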

Files

Original bundle

Name: document (27).pdf
Size: 888.38 KB
Format: Adobe Portable Document Format

Collections

TR-Dizin Indexed Publications Collection