Title: Comparative Evaluation of Vision Transformers and Convolutional Networks for Breast Ultrasound Image Classification
Authors: Naral, S.; Cakmak, Y.; Pacal, I.
Type: Article
Date: 2026-03-12
Year: 2026
ISSN: 2692-3106
DOI: 10.37349/emed.2026.1001382 (https://doi.org/10.37349/emed.2026.1001382)
Scopus ID: 2-s2.0-105031616628
Handle: https://hdl.handle.net/20.500.14627/1463
Language: en
Rights: info:eu-repo/semantics/openAccess
Keywords: Breast Cancer; Computer-Aided Diagnosis; Deep Learning; Ultrasound Images

Abstract:
Aim: Interobserver variability continues to limit the consistency of breast ultrasound interpretation. This study compares two Vision Transformer (ViT) models and two Convolutional Neural Network (CNN) models for automated three-class breast ultrasound classification, with a specific focus on the tradeoff between predictive performance and computational efficiency.
Methods: Swin Transformer Base and DeiT Base were evaluated alongside InceptionV3 and MobileNetV3 Large using the public Breast Ultrasound Images (BUSI) dataset, which contains 780 images labeled as benign, malignant, and normal. A consistent on-the-fly augmentation pipeline was applied during training to promote robustness and reduce sensitivity to incidental image variations.
Results: Swin Transformer Base achieved the highest test accuracy (0.9167) and F1 score (0.8981). MobileNetV3 Large reached an accuracy of 0.8583 with substantially lower computational demand. The efficiency contrast was pronounced, with Swin requiring 30.33 GFLOPs versus 0.43 GFLOPs for MobileNetV3 Large.
Conclusions: On this benchmark, ViT models can yield higher classification performance, while lightweight CNNs offer a strong efficiency profile that may better match deployment-constrained settings. These results suggest that model selection should be guided by both predictive accuracy and operational feasibility within the target clinical workflow.
© The Author(s) 2026.
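The abstract contrasts per-inference cost (30.33 GFLOPs for Swin Transformer Base vs. 0.43 GFLOPs for MobileNetV3 Large). As a minimal sketch of how such figures are conventionally built up, the function below estimates FLOPs for a single 2D convolution layer from its shape; the layer dimensions used in the example are hypothetical and are not taken from either model in the paper.

```python
# Rough per-layer FLOP estimate for a standard 2D convolution.
# Illustrative only: the shapes below are hypothetical and do not
# correspond to Swin Transformer Base or MobileNetV3 Large layers.

def conv2d_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Count multiply-adds (x2) for a k x k convolution that maps
    c_in input channels to a c_out x h_out x w_out feature map."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# Example: an early 3x3 stem convolution on a 224x224 RGB input, stride 2.
flops = conv2d_flops(c_in=3, c_out=16, k=3, h_out=112, w_out=112)
print(f"{flops / 1e9:.4f} GFLOPs")  # → 0.0108 GFLOPs
```

Summing such per-layer estimates (plus the attention and MLP terms for transformer blocks) over a whole network yields the GFLOPs totals the study reports; in practice tools such as profiler-based FLOP counters automate this.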