Evaluating and Comparing Student Responses in Examinations from the Perspectives of Human and Artificial Intelligence (GPT-4 and Gemini)

dc.contributor.author Domanic, Kubra Yildiz
dc.contributor.author Baycan, Sukran
dc.date.accessioned 2025-11-10T17:13:53Z
dc.date.available 2025-11-10T17:13:53Z
dc.date.issued 2025
dc.description.abstract Background: Generative Artificial Intelligence (AI) models, such as ChatGPT (GPT-4) and Gemini, offer potential benefits in educational settings, including dental education. These tools have shown promise in enhancing learning and assessment processes, particularly in dental prosthetic technology (DPT) and oral health (OH) programs. Objective: This study aimed to evaluate the accuracy, reliability, and consistency of the GPT-4 and Gemini AI models in answering examination questions in dental education. The study focused on multiple-choice questions (MCQs), true/false (T/F) questions, and short-answer questions (SAQs). Methods: An exploratory study design was used with 30 questions (10 MCQs, 10 T/F, and 10 SAQs) covering key topics in DPT and OH education. ChatGPT and Gemini were tested with the same set of questions on two separate occasions to assess consistency. Responses were evaluated by two independent researchers using a predefined answer key. Data were analyzed using descriptive statistics, the Kappa coefficient for agreement, and the Chi-square test for categorical variables. Results: ChatGPT demonstrated high accuracy on MCQs (90%) and T/F questions (85%) but showed reduced performance on SAQs (60%). Gemini's accuracy ranged between 60% and 70%, with its highest accuracy on SAQs (70%). ChatGPT showed significant consistency across testing dates (Kappa = 0.754; p = 0.001), whereas Gemini's responses were less consistent (Kappa = 0.634; p = 0.001). Conclusion: While both AI models offer valuable support in dental education, ChatGPT exhibited greater accuracy and consistency in structured assessments. The findings suggest that AI tools can enhance teaching and assessment methods if integrated thoughtfully, supporting personalized learning while maintaining academic integrity. en_US
dc.identifier.doi 10.1186/s12909-025-07835-y
dc.identifier.issn 1472-6920
dc.identifier.scopus 2-s2.0-105017572176
dc.identifier.uri https://doi.org/10.1186/s12909-025-07835-y
dc.identifier.uri https://hdl.handle.net/20.500.14627/1301
dc.language.iso en en_US
dc.publisher BMC en_US
dc.relation.ispartof BMC Medical Education en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject ChatGPT en_US
dc.subject GPT-4 en_US
dc.subject Gemini en_US
dc.subject Artificial Intelligence en_US
dc.subject Dental Education en_US
dc.subject Assessment Methods en_US
dc.subject Dental Prosthetics en_US
dc.subject Oral Health en_US
dc.title Evaluating and Comparing Student Responses in Examinations from the Perspectives of Human and Artificial Intelligence (GPT-4 and Gemini) en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 57217250690
gdc.author.scopusid 60125736300
gdc.description.department Fenerbahçe University en_US
gdc.description.departmenttemp [Domanic, Kubra Yildiz] Atlas Univ, Istanbul, Turkiye; [Baycan, Sukran] Fenerbahce Univ, Istanbul, Turkiye en_US
gdc.description.issue 1 en_US
gdc.description.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality Q1
gdc.description.volume 25 en_US
gdc.description.woscitationindex Science Citation Index Expanded - Social Science Citation Index
gdc.description.wosquality Q1
gdc.identifier.pmid 41039436
gdc.identifier.wos WOS:001586830100037