Examination Date
5-8-2024
Degree
Dissertation
Degree Program
Biostatistics
Examination Committee
Hui-Yi Lin, PhD; Zhide Fang, PhD; Evrim Oral, PhD; Andrew G. Chapple, PhD; Jian Li, PhD
Abstract
Understanding the relationship between an individual’s genetic variants (genotypes) and observable traits (phenotypes) is essential for explaining the mechanisms of complex diseases and for devising effective therapeutic interventions. This relationship is shaped both by individual genetic variants and by gene-gene interactions. Although Genome-Wide Association Studies (GWAS) have identified numerous disease-associated Single Nucleotide Polymorphisms (SNPs), the individual effects of these SNPs on disease susceptibility are often modest. Polygenic Risk Scores (PRS) have been proposed to address this limitation by combining information from multiple SNPs. However, incorporating SNP-SNP interactions into PRS construction remains underexplored, even though such interactions are acknowledged to be important in complex diseases and incorporating them is pivotal for enhancing prediction accuracy.
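For readers unfamiliar with how a conventional PRS combines information from multiple SNPs, the following is a minimal sketch of the standard additive form: a weighted sum of risk-allele counts. It is an illustration only, not code from the dissertation; the function name `additive_prs`, the genotype matrix, and the weights are hypothetical.

```python
import numpy as np

def additive_prs(genotypes, weights):
    """Return one score per individual: the weighted sum of risk-allele counts.

    genotypes: (n_individuals, n_snps) array of allele counts coded 0, 1, or 2.
    weights:   (n_snps,) array of per-SNP effect sizes (e.g. log odds ratios).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if genotypes.ndim != 2 or genotypes.shape[1] != weights.shape[0]:
        raise ValueError("genotypes must be 2-D with one column per weight")
    return genotypes @ weights

# Hypothetical data: 2 individuals, 3 SNPs (values invented for illustration).
genotypes = [[0, 1, 2],
             [2, 0, 1]]
weights = [0.10, 0.25, 0.05]
scores = additive_prs(genotypes, weights)
```

A score of this form uses SNP main effects only; it has no term for the joint genotype of a SNP pair.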
To address this gap, this study introduces nine Polygenic Risk Scores incorporating SNP Interaction (PRS-int): RR, RP, RRp4, M9, AAF, SRA, WSRA, CA, and LSM9. Each PRS-int score is built with a novel technique that considers nine pair-genotype interactions and reflects the estimated risk associated with a phenotype. Simulation evaluations across diverse scenarios illustrate the superior risk classification performance of PRS-int. Among these scores, RR, RRp4, and M9 generally exhibit better performance. However, careful selection of SNP pairs remains essential for improving PRS prediction accuracy, because it minimizes noise and variability in the model.
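The abstract does not define how the nine PRS-int scores are computed, so the sketch below only illustrates the general idea of working with nine pair-genotype combinations. It assumes two biallelic SNPs, each coded 0, 1, or 2, so that a SNP pair has 3 × 3 = 9 joint genotypes, and it tabulates the observed proportion of cases in each cell. The helper names `pair_genotype_cell` and `cell_case_proportions` and the data are hypothetical, and this is not the RR, M9, or any other scoring rule from the dissertation.

```python
import numpy as np

def pair_genotype_cell(g1, g2):
    """Map two genotype vectors (coded 0/1/2) to a joint cell index 0..8."""
    g1 = np.asarray(g1, dtype=int)
    g2 = np.asarray(g2, dtype=int)
    return 3 * g1 + g2

def cell_case_proportions(g1, g2, is_case):
    """Return a 3x3 table of the case proportion in each pair-genotype cell.

    Rows index the genotype of SNP 1, columns the genotype of SNP 2.
    Cells with no observations are reported as NaN.
    """
    cells = pair_genotype_cell(g1, g2)
    is_case = np.asarray(is_case, dtype=float)
    counts = np.bincount(cells, minlength=9)
    cases = np.bincount(cells, weights=is_case, minlength=9)
    props = np.full(9, np.nan)
    # Divide only where a cell has at least one individual.
    np.divide(cases, counts, out=props, where=counts > 0)
    return props.reshape(3, 3)

# Hypothetical data: 5 individuals, 1 = case, 0 = control.
snp1 = [0, 0, 1, 2, 2]
snp2 = [0, 0, 1, 2, 2]
status = [1, 0, 1, 1, 1]
table = cell_case_proportions(snp1, snp2, status)
```

A table like this makes sparse cells visible, which is one reason the choice of SNP pairs matters: rare joint genotypes yield unstable risk estimates.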
In a real-world application, PRS-int scores developed using the PRACTICAL UK dataset to predict prostate cancer aggressiveness were validated on the independent US and EU datasets. Among the nine PRS-int scores, RR, RRp4, and M9 consistently demonstrated high performance. The RR score performed well, with p-values of 6.72E-17, 7.83E-7, and 4.94E-13 for the UK, US, and EU datasets, respectively. Validation assessments consistently indicated that PRS-int outperforms two published PRS models based solely on SNP main effects, with p-values of 7.45E-4, 4.39E-4, and 4.76E-8 (first model) and 8.23E-7, 3.26E-5, and 9.36E-11 (second model) for the UK, US, and EU datasets, respectively.
This study compares nine PRS-int scores and suggests that RR and RRp4 perform best. Furthermore, PRS-int outperforms published PRS models that use SNP main effects only. Consequently, the proposed RR and RRp4 scores offer a promising way to integrate SNP-SNP interactions into polygenic risk scores, enhancing predictive performance and advancing the understanding of complex disease mechanisms.
Recommended Citation
Sarkar, Indrani, "Developing SNP Interaction Polygenic Risk Scores (PRS-int Scores)" (2024). School of Public Health. 9.
https://digitalscholar.lsuhsc.edu/etd_sph/9
Dissertation Report (I. Sarkar) April 2024