Temporal trends in large language model (LLM) accuracy: A meta-analysis of multiple-choice question performance in dentistry and dental education
Document Type
Article
Publication Date
4-29-2026
Publication Title
Journal of Dentistry
Abstract
OBJECTIVES: Studies have investigated the potential applications of Large Language Models (LLMs) in dental education. As LLMs evolve, it is necessary to determine the extent to which their accuracy improves before they can be integrated into dental curricula and clinical practice. This meta-analysis investigates the performance of LLM versions on text-based single-best-answer multiple-choice questions (MCQs) in dentistry over time and assesses the impact of question type and source on LLM accuracy.
MATERIALS AND METHODS: Typical guidelines for reporting systematic reviews and meta-analyses were followed. Study eligibility followed the PICO (Population, Intervention, Comparison, Outcomes) model. Included records were published between January 2022 and August 2025. Screenings and data extractions were performed by two reviewers, with disagreements resolved by a third reviewer. Proportional meta-analysis was used to determine the pooled accuracy of LLM versions across studies. Meta-regression explored sources of heterogeneity.
RESULTS: Twenty-seven articles met the inclusion criteria. Over time, LLM accuracy in answering dentistry MCQs has significantly improved (p < 0.001). Heterogeneity across all LLMs was high (I² ≥ 90%). OpenAI's models (ChatGPT-3.5, 4, and 4o or newer) significantly outperformed competitors (p < 0.001). Question type (stem-only vs. clinical vignette + stem) and question source (locally developed, for-profit commercial product, or free resources) did not significantly affect accuracy.
CONCLUSIONS: As LLMs evolve, their accuracy on MCQs is improving, increasing their potential as supplemental tools in dental education.
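The abstract describes pooling per-study accuracy with a proportional meta-analysis and reports heterogeneity as a percentage, which is conventionally the I² statistic. It does not state which transformation or between-study variance estimator was used. The sketch below is therefore only an illustration under stated assumptions: it assumes a logit transform of each study's proportion correct and a DerSimonian-Laird random-effects model. The function name `pool_logit_proportions` and any counts passed to it are hypothetical and are not taken from the article.

```python
import math


def _inv_logit(t):
    """Back-transform a logit value to a proportion."""
    return 1.0 / (1.0 + math.exp(-t))


def pool_logit_proportions(events, totals):
    """Hypothetical sketch: DerSimonian-Laird random-effects pooling of proportions.

    Assumption (not from the article): accuracy is pooled on the logit scale.
    events[i] = MCQs answered correctly in study i; totals[i] = MCQs attempted.
    Returns (pooled proportion, 95% CI lower, 95% CI upper, tau^2, I^2 in %).
    """
    y, v = [], []
    for x, n in zip(events, totals):
        # Add 0.5 to both correct and incorrect counts when either is zero,
        # so the logit and its variance stay finite.
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        p = x / n
        y.append(math.log(p / (1 - p)))   # logit of the study's accuracy
        v.append(1 / x + 1 / (n - x))     # approximate variance of that logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird between-study variance and the I^2 heterogeneity index.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    se = math.sqrt(1 / sw_re)

    return (_inv_logit(y_re), _inv_logit(y_re - 1.96 * se),
            _inv_logit(y_re + 1.96 * se), tau2, i2)
```

For example, `pool_logit_proportions([60, 75, 90], [100, 100, 100])` pools three invented studies whose accuracies differ widely, and it returns an I² of roughly 90%, the same order of heterogeneity the abstract reports. Working on the logit scale keeps the back-transformed pooled estimate and its confidence interval inside the 0 to 1 range, which is one common reason for choosing it in proportional meta-analysis.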
First Page
106724
PubMed ID
42066887
Volume
171
Rights
© 2026 Elsevier Ltd.
Recommended Citation
Doubleday, Alison F.; Cheverko, Colleen M.; Bolgova, Olena; Mavrych, Volodymyr; Mohamed, Fathima Raahima; Westrick, Jennifer; Juarez, Lorena; Rush, Emily; Solka, Kathryn A.; Byram, Jessica N.; Beacker, Robert; Gomez, Victoria; Ganeng, Brenda K.; Hoffman, Leslie A.; Roach, Victoria A.; Brown, Kirsten M.; DeVaul, Nicole; Garnett, Colleen N.; Herriott, Hannah L.; Lufler, Rebecca S.; Mussell, Jason C.; Balta, Joy Y.; Pascoe, Michael A.; Middleton, Jason W.; Duffy, Sharon; Stephens, Georgina C.; and Wilson, Adam B., "Temporal trends in large language model (LLM) accuracy: A meta-analysis of multiple-choice question performance in dentistry and dental education" (2026). School of Graduate Studies Faculty Publications. 594.
https://digitalscholar.lsuhsc.edu/sogs_facpubs/594
10.1016/j.jdent.2026.106724