Temporal trends in large language model (LLM) accuracy: A meta-analysis of multiple-choice question performance in dentistry and dental education

Document Type

Article

Publication Date

4-29-2026

Publication Title

Journal of Dentistry

Abstract

OBJECTIVES: Studies have investigated the potential applications of Large Language Models (LLMs) in dental education. As LLMs evolve, it is necessary to determine the extent to which their accuracy improves before they can be integrated into dental curricula and clinical practice. This meta-analysis investigates the performance of LLM versions on text-based single-best-answer multiple-choice questions (MCQs) in dentistry over time and assesses the impact of question type and source on LLM accuracy.

MATERIALS AND METHODS: Typical guidelines for reporting systematic reviews and meta-analyses were followed. Study eligibility followed the PICO (Population, Intervention, Comparison, Outcomes) model. Included records were published between January 2022 and August 2025. Screenings and data extractions were performed by two reviewers, with disagreements resolved by a third reviewer. Proportional meta-analysis was used to determine the pooled accuracy of LLM versions across studies. Meta-regression explored sources of heterogeneity.

RESULTS: Twenty-seven articles met the inclusion criteria. Over time, LLM accuracy in answering dentistry MCQs has significantly improved (p < 0.001). Heterogeneity across all LLMs was high (≥ 90%). OpenAI's models (ChatGPT-3.5, 4, and 4o or newer) significantly outperformed competitors (p < 0.001). Question type (stem-only vs. clinical vignette + stem) and question source (locally developed, for-profit commercial product, or free resources) did not significantly affect accuracy.

CONCLUSIONS: As LLMs evolve, their accuracy on MCQs is improving, increasing their potential as supplemental tools in dental education.

First Page

106724

PubMed ID

42066887

Volume

171

Recommended Citation

Doubleday, Alison F.; Cheverko, Colleen M.; Bolgova, Olena; Mavrych, Volodymyr; Mohamed, Fathima Raahima; Westrick, Jennifer; Juarez, Lorena; Rush, Emily; Solka, Kathryn A.; Byram, Jessica N.; Beacker, Robert; Gomez, Victoria; Ganeng, Brenda K.; Hoffman, Leslie A.; Roach, Victoria A.; Brown, Kirsten M.; DeVaul, Nicole; Garnett, Colleen N.; Herriott, Hannah L.; Lufler, Rebecca S.; Mussell, Jason C.; Balta, Joy Y.; Pascoe, Michael A.; Middleton, Jason W.; Duffy, Sharon; Stephens, Georgina C.; and Wilson, Adam B., "Temporal trends in large language model (LLM) accuracy: A meta-analysis of multiple-choice question performance in dentistry and dental education" (2026). School of Graduate Studies Faculty Publications. 594.
https://digitalscholar.lsuhsc.edu/sogs_facpubs/594

DOI

10.1016/j.jdent.2026.106724
