8. ULUSLARARASI SAĞLIK BİLİMLERİ VE YAŞAM KONGRESİ, Burdur, Turkey, 16 - 19 April 2025, pp.272, (Summary Text)
Background: The integration of Artificial Intelligence (AI) into clinical decision support
systems in dentistry has gained increasing attention in recent years. Aim: This study aimed to
evaluate the consistency and accuracy of three AI applications—ChatGPT 3.5, ChatGPT 4.0,
and Google Bard (Gemini)—in responding to questions related to vertical tooth fractures and
cracks. Materials and Methods: Based on the Position Statement of the European Society of
Endodontology (ESE) regarding longitudinal cracks and fractures of teeth, a total of 60 binary
(yes/no) questions (20 easy, 20 medium, and 20 hard) were posed to ChatGPT 3.5,
ChatGPT 4.0, and Google Bard. Each model was asked the same questions three times daily
(morning, noon, and evening) over 10 consecutive days. In total, 5400 responses (60 questions
× 3 sessions × 10 days × 3 models) were recorded and compared with the reference (correct)
answers. Statistical analyses, including Pearson’s chi-square test, Bonferroni correction, and
Cohen’s kappa coefficient, were conducted to evaluate
model accuracy and inter-platform agreement. Results: ChatGPT 4.0 yielded the highest overall
accuracy (86.1%), followed by ChatGPT 3.5 (85.6%) and Gemini (81.8%). Statistically
significant differences in accuracy were found between the models (p < 0.001). Pairwise
comparisons showed significant differences between ChatGPT 3.5 and Gemini (p = 0.002), and
between ChatGPT 4.0 and Gemini (p < 0.001), but not between ChatGPT 3.5 and ChatGPT 4.0
(p = 0.667). A significant time-of-day effect was found only for evening responses (p = 0.011).
No significant differences were observed for easy or hard questions, but medium-difficulty
questions revealed variability. Inter-model agreement was poor (κ = 0.019). Conclusion: Despite their
high individual accuracy, AI models demonstrated low consistency and standardization across
platforms in assessing vertical tooth fractures and cracks. Further refinement is necessary before
these tools can be reliably integrated into routine clinical practice.
Keywords: Artificial intelligence, ChatGPT, Google Bard Gemini, Cracked tooth, Vertical root
fracture
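
Illustrative sketch: the abstract names Pearson’s chi-square test and Cohen’s kappa but does not describe how they were computed or which software was used. The Python below is a minimal, hedged illustration of the standard textbook formulas, not the authors' analysis. The function names (design_size, pearson_chi_square, cohen_kappa) are hypothetical, and the inputs in the usage lines are toy values rather than study data; only the design figures (60 questions, 3 sessions, 10 days, 3 models) come from the abstract.

```python
from collections import Counter


def design_size(questions, sessions_per_day, days, models):
    """Total number of responses in a fully crossed design."""
    return questions * sessions_per_day * days * models


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts,
    e.g. one row per model with columns [correct, incorrect]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def cohen_kappa(answers_a, answers_b):
    """Cohen's kappa for two equally long sequences of categorical
    answers (for example, 'yes'/'no' from two models)."""
    n = len(answers_a)
    # Observed proportion of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    counts_a, counts_b = Counter(answers_a), Counter(answers_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    p_expected = sum(
        counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)
    ) / n ** 2
    if p_expected == 1.0:
        # Both raters gave one identical answer throughout; kappa is
        # undefined, so this sketch returns 1.0 by convention (assumption).
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Design figures taken from the abstract.
print(design_size(questions=60, sessions_per_day=3, days=10, models=3))  # 5400

# Toy inputs only (not study data).
print(pearson_chi_square([[10, 0], [0, 10]]))  # (20.0, 1)
print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```

A kappa near zero, such as the reported κ = 0.019, means the observed agreement is barely above what the marginal answer frequencies alone would produce by chance. That can happen even when each rater is individually accurate.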