Development of a Human Immunodeficiency Virus Risk Prediction Model Using Electronic Health Record Data From an Academic Health System in the Southern United States
Document Type
Article
Publication Date
9-19-2022
Publication Title
Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America
Abstract
BACKGROUND: Human immunodeficiency virus (HIV) pre-exposure prophylaxis (PrEP) is underutilized in the southern United States. Rapid identification of individuals vulnerable to an HIV diagnosis using electronic health record (EHR)-based tools may augment PrEP uptake in the region.

METHODS: Using machine learning, we developed EHR-based models to predict incident HIV diagnosis as a surrogate for PrEP candidacy. We included patients from a southern medical system with encounters between October 2014 and August 2016, training the models to predict incident HIV diagnosis between September 2016 and August 2018. We obtained 74 EHR variables as potential predictors. We compared Extreme Gradient Boosting (XGBoost) with least absolute shrinkage and selection operator (LASSO) logistic regression models, and assessed performance, overall and among women, using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).

RESULTS: Of 998 787 eligible patients, 162 had an incident HIV diagnosis, of whom 49 were women. The XGBoost model outperformed the LASSO model for the total cohort, achieving an AUROC of 0.89 and an AUPRC of 0.01. The XGBoost model for the female-only cohort achieved an AUROC of 0.78 and an AUPRC of 0.00025. The most predictive variables for the overall cohort were race, sex, and male partner. The strongest positive predictors for the female-only cohort were history of pelvic inflammatory disease, drug use, and tobacco use.

CONCLUSIONS: Our machine-learning models effectively predicted incident HIV diagnoses, including among women. This study establishes the feasibility of using these models to identify persons most suitable for PrEP in the South.
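The abstract reports AUROC and AUPRC side by side, and the small AUPRC values only make sense against the rarity of the outcome: a model that assigns every patient the same risk has an AUROC of 0.5 but an AUPRC equal to the outcome's prevalence. With 162 incident diagnoses among 998 787 patients, that baseline is about 0.00016, so an AUPRC of 0.01 is roughly 60 times the no-skill level, assuming the evaluation data have approximately the cohort-wide prevalence. The Python sketch below is an illustration, not the study's code. It assumes binary labels and real-valued risk scores, computes AUROC with the rank-based (Mann-Whitney) formula, and estimates AUPRC as step-wise average precision, which is one common estimator; the abstract does not state which estimator the authors used. The function names `auroc` and `average_precision` are hypothetical.

```python
def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive outscores a
    random negative, with tied scores counted as half a win."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Sort indices by score and give each tie group its average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Step-wise average precision: sum over thresholds of
    (recall gain) * (precision at that threshold), no interpolation."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # All items tied at this score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example: two positives among five patients.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(auroc(labels, scores))              # ≈ 0.833
print(average_precision(labels, scores))  # ≈ 0.833

# No-skill model: every patient receives the same score.
flat_labels = [1, 1] + [0] * 8
flat_scores = [0.5] * 10
print(auroc(flat_labels, flat_scores))              # 0.5
print(average_precision(flat_labels, flat_scores))  # 0.2, the prevalence

# No-skill AUPRC baseline implied by the cohort counts in the abstract.
cohort_baseline = 162 / 998787
print(f"{cohort_baseline:.6f}")       # 0.000162
print(round(0.01 / cohort_baseline))  # 62
```

Because AUROC depends only on how positives' scores compare with negatives' scores, it does not move with prevalence, while the floor of AUPRC is the prevalence itself; reading the two together is what makes AUPRC values such as 0.01 and 0.00025 interpretable for a rare outcome like incident HIV diagnosis.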
First Page
299
Last Page
306
PubMed ID
36125084
Volume
76
Issue
2
Recommended Citation
Burns, Charles M.; Pung, Leland; Witt, Daniel; Gao, Michael; Sendak, Mark; Balu, Suresh; Krakower, Douglas; Marcus, Julia L.; Okeke, Nwora Lance; and Clement, Meredith E., "Development of a Human Immunodeficiency Virus Risk Prediction Model Using Electronic Health Record Data From an Academic Health System in the Southern United States" (2022). School of Medicine Faculty Publications. 507.
https://digitalscholar.lsuhsc.edu/som_facpubs/507
DOI
10.1093/cid/ciac775