Comparative Evaluation of Vision Transformers and Convolutional Networks for Breast Ultrasound Image Classification

Date
2026
Publisher
Open Exploration Publishing Inc
Abstract
Aim: Interobserver variability continues to limit the consistency of breast ultrasound interpretation. This study compares two Vision Transformer (ViT) models and two Convolutional Neural Network (CNN) models for automated three-class breast ultrasound classification, with a specific focus on the tradeoff between predictive performance and computational efficiency.
Methods: Swin Transformer Base and DeiT Base were evaluated alongside InceptionV3 and MobileNetV3 Large using the public Breast Ultrasound Images (BUSI) dataset, which contains 780 images labeled as benign, malignant, and normal. A consistent on-the-fly augmentation pipeline was applied during training to promote robustness and reduce sensitivity to incidental image variations.
Results: Swin Transformer Base achieved the highest test accuracy (0.9167) and F1 score (0.8981). MobileNetV3 Large reached an accuracy of 0.8583 with substantially lower computational demand. The efficiency contrast was pronounced, with Swin requiring 30.33 GFLOPs versus 0.43 GFLOPs for MobileNetV3 Large.
Conclusions: On this benchmark, ViT models can yield higher classification performance, while lightweight CNNs offer a strong efficiency profile that may better match deployment-constrained settings. These results suggest that model selection should be guided by both predictive accuracy and operational feasibility within the target clinical workflow.
© The Author(s) 2026.
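The Results report a test accuracy and an F1 score for a three-class (benign, malignant, normal) problem. The abstract does not say how F1 was averaged across classes; the sketch below assumes macro averaging (the unweighted mean of one-vs-rest per-class F1), which is a common choice but is not confirmed by the source. The function names are hypothetical, and the labels in the usage example are toy values, not BUSI data.

```python
from typing import Sequence

# Class labels as named in the BUSI dataset description.
CLASSES = ("benign", "malignant", "normal")


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    classes: Sequence[str] = CLASSES,
) -> float:
    """Unweighted mean of one-vs-rest F1 scores (macro averaging is an assumption)."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class absent from both lists scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only.
    y_true = ["benign", "benign", "malignant", "normal", "malignant", "benign"]
    y_pred = ["benign", "malignant", "malignant", "normal", "benign", "benign"]
    print(round(accuracy(y_true, y_pred), 4))  # 0.6667
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7222
```

Macro averaging weights each class equally regardless of how many images it contains, so on an imbalanced dataset it can fall below accuracy, as the reported F1 (0.8981) does relative to accuracy (0.9167); whether that is the reason for the gap in this study is not stated in the abstract.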
Keywords
Breast Cancer, Computer-Aided Diagnosis, Deep Learning, Ultrasound Images
WoS Q
N/A
Scopus Q
Q4
Source
Exploration of Medicine
Volume
7
Citations
Scopus : 0

