A Data-Efficient Machine Learning Approach for Breast Ultrasound Lesion Classification Integrating Image-Derived Features and Sonographic Descriptors

dc.contributor.author Karacor, Adil Gursel
dc.contributor.author Sahin, Sevim
dc.date.accessioned 2026-05-12T14:56:18Z
dc.date.available 2026-05-12T14:56:18Z
dc.date.issued 2026
dc.description.abstract Background/Objectives: Breast ultrasound is widely used for the diagnostic evaluation of breast lesions; however, reliable lesion characterization remains challenging due to substantial image heterogeneity and the limited size of most clinically available datasets. These constraints reduce the generalizability of end-to-end deep learning approaches in routine practice. The objective of this study was to evaluate a data-efficient diagnostic framework that integrates image-derived features with clinical sonographic descriptors to improve breast ultrasound lesion classification in small cohorts. Methods: Ultrasound images from the publicly available BrEaST-Lesions dataset were processed using a pretrained convolutional neural network to extract compact image feature representations from full images, lesion masks, and cropped tumor regions. These features were combined with manually recorded sonographic descriptors after label encoding to form a unified tabular dataset. Gradient-boosted tree models were trained using descriptor-only and fused feature sets with fivefold stratified cross-validation and evaluated on an independent external hold-out test set. Results: Using sonographic descriptors alone, the best-performing model (LightGBM) achieved an external validation accuracy of 0.88, with an area under the receiver operating characteristic curve (AUC) of 0.95. Incorporation of image-derived features improved diagnostic performance on the external test set, yielding an accuracy of 0.88, an AUC of 0.96, and a sensitivity of 1.00 for malignant lesion detection. The fused framework demonstrated more stable generalization than descriptor-only models, particularly for malignant cases. Conclusions: Combining image-derived features with clinical sonographic descriptors within a tabular learning framework provides a robust and data-efficient approach for breast ultrasound-based lesion classification. This strategy supports diagnostic decision-making in small ultrasound datasets and represents a clinically realistic alternative when large-scale deep learning models are impractical.
dc.identifier.doi 10.3390/diagnostics16050664
dc.identifier.issn 2075-4418
dc.identifier.scopus 2-s2.0-105032560379
dc.identifier.uri https://hdl.handle.net/123456789/1483
dc.identifier.uri https://doi.org/10.3390/diagnostics16050664
dc.language.iso en
dc.publisher MDPI
dc.relation.ispartof Diagnostics
dc.rights info:eu-repo/semantics/openAccess
dc.subject Breast Ultrasound
dc.subject Feature Fusion
dc.subject Sonographic Descriptors
dc.subject Lesion Classification
dc.subject Small Datasets
dc.subject Diagnostic Decision Support
dc.title A Data-Efficient Machine Learning Approach for Breast Ultrasound Lesion Classification Integrating Image-Derived Features and Sonographic Descriptors en_US
dc.type Article
dspace.entity.type Publication
gdc.author.scopusid 16417519900
gdc.author.scopusid 60173985900
gdc.description.department
gdc.description.departmenttemp [Karacor, Adil Gursel] Fenerbahce Univ, Fac Engn & Nat Sci, Dept Ind Engn, TR-34758 Istanbul, Turkiye; [Sahin, Sevim] Fenerbahce Univ, Fac Engn & Nat Sci, Dept Elect & Elect Engn, TR-34758 Istanbul, Turkiye
gdc.description.issue 5
gdc.description.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı
gdc.description.volume 16
gdc.description.woscitationindex Science Citation Index Expanded
gdc.identifier.pmid 41827939
gdc.identifier.wos WOS:001713918500001
gdc.index.type PubMed
gdc.index.type Scopus
gdc.index.type WoS
gdc.virtual.author Şahin, Sevim
gdc.virtual.author Karaçor, Adil Gürsel
relation.isAuthorOfPublication 137b9c99-3632-425b-a3e8-9dace6596145
relation.isAuthorOfPublication 1dca77e3-d77c-4f1c-b940-947f84ac7f05
relation.isAuthorOfPublication.latestForDiscovery 137b9c99-3632-425b-a3e8-9dace6596145
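
The tabular fusion step described in the abstract (label-encoding the sonographic descriptors, concatenating them with image-derived feature vectors, and splitting the cohort into five stratified folds) can be illustrated with a minimal standard-library sketch. Everything below is an assumption made for illustration: the descriptor names `shape` and `margin`, the four-dimensional random vectors standing in for CNN features, the toy class counts, and the helper functions are hypothetical and are not taken from the article. Model training is omitted; the study reports gradient-boosted trees (LightGBM) on features from a pretrained CNN, neither of which is reproduced here.

```python
import random
from collections import defaultdict


def label_encode(column):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {value: i for i, value in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes


def fuse(image_features, descriptor_columns):
    """Concatenate each image feature vector with its label-encoded descriptors."""
    encoded = [label_encode(col)[0] for col in descriptor_columns]
    return [list(vec) + [col[i] for col in encoded]
            for i, vec in enumerate(image_features)]


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample to one of k folds, class by class, so that each
    fold keeps roughly the same class proportions as the full cohort."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % k  # round-robin within each class
    return fold_of


# Toy cohort (hypothetical): 12 benign and 8 malignant lesions.
labels = ["benign"] * 12 + ["malignant"] * 8
rng = random.Random(42)

# Stand-in for CNN-derived image features: 4 random numbers per lesion.
image_features = [[rng.random() for _ in range(4)] for _ in labels]

# Hypothetical categorical sonographic descriptors.
shape = [rng.choice(["oval", "round", "irregular"]) for _ in labels]
margin = [rng.choice(["circumscribed", "not circumscribed"]) for _ in labels]

X = fuse(image_features, [shape, margin])   # unified tabular rows
folds = stratified_folds(labels, k=5)       # fold index per lesion

for f in range(5):
    held_out = [labels[i] for i, fold in enumerate(folds) if fold == f]
    print(f, held_out.count("benign"), held_out.count("malignant"))
```

Each row of `X` holds the image-derived values followed by the integer-coded descriptors, and each fold index marks the samples held out in one cross-validation round. In practice a gradient-boosted classifier would be fitted on the remaining four folds per round.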