Examination Date

Summer 7-11-2025

Degree

Thesis

Degree Program

Biostatistics

Examination Committee

Chair: Qingzhao Yu (qyu@lsuhsc.edu); Members: Xiao-Cheng Wu (XWu@lsuhsc.edu), Donald E. Mercante (DMerca@lsuhsc.edu)

Abstract

Outlier detection is a crucial component of data quality control across various fields, including engineering, biomedical sciences, and finance. Accurately identifying outliers is an essential step in data cleaning and analysis and a prerequisite for drawing valid conclusions. Traditional methods such as the Z-score method and the interquartile range (IQR) rule rest primarily on the assumption that the data follow a normal distribution. However, real-world data often exhibit skewness.
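For reference, the two traditional rules named above can be sketched as follows. This is a minimal illustration using the textbook definitions (a cutoff of 3 standard deviations and Tukey's 1.5 x IQR fences); the function names and default cutoffs are assumptions made for this example and are not taken from the thesis.

```python
import math
import statistics

def zscore_outliers(values, cutoff=3.0):
    """Flag points whose absolute z-score exceeds `cutoff` (mean/SD rule)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > cutoff for v in values]

def iqr_fences(values, k=1.5):
    """Return Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Flag points that fall outside Tukey's fences."""
    lo, hi = iqr_fences(values, k)
    return [v < lo or v > hi for v in values]

# A right-skewed sample with no contamination: evenly spaced quantiles
# of a unit exponential distribution.
n = 200
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
flags = iqr_outliers(skewed)
```

Both rules place their cutoffs at equal distances on either side of the center, so on a right-skewed sample like `skewed` the IQR rule flags ordinary upper-tail points while flagging nothing in the lower tail.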

To address these limitations, we propose a new method for outlier detection that helps fill the methodological gap in identifying outliers under skewed distributions. We use robust parameter estimates, which are less sensitive to outliers, to build the underlying distribution. Moreover, we propose adjusting the thresholds for detecting outliers at both ends of the distribution according to the observed skewness of the data. By selecting appropriate adjustment factors based on the skewness level of the observed dataset, the method enables more accurate identification of data points that truly deviate from the main distribution pattern.
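The abstract does not give the estimators or adjustment factors used, so the following is only a hypothetical sketch of the general idea: robust location and scale combined with thresholds that shift with the observed skewness. The median, the scaled MAD, Bowley's quartile skewness, and the constants `k` and `c` are all assumptions chosen for illustration, not the method proposed in the thesis.

```python
import statistics

def skew_adjusted_fences(values, k=3.0, c=0.5):
    """Return (lower, upper) thresholds that widen on the side of the longer tail.

    Hypothetical illustration: `k` is the base cutoff in robust-scale units
    and `c` (0 <= c < 1) controls how strongly skewness shifts the thresholds.
    """
    med = statistics.median(values)
    # MAD scaled by 1.4826 so it estimates the SD under normality.
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Bowley's quartile skewness lies in [-1, 1]; it is 0 for symmetric data.
    skew = (q3 + q1 - 2 * q2) / iqr if iqr > 0 else 0.0
    lower = med - k * mad * (1 - c * skew)
    upper = med + k * mad * (1 + c * skew)
    return lower, upper

def skew_adjusted_outliers(values, k=3.0, c=0.5):
    """Flag points that fall outside the skewness-adjusted thresholds."""
    lower, upper = skew_adjusted_fences(values, k, c)
    return [v < lower or v > upper for v in values]
```

With symmetric data the two thresholds sit at equal distances from the median; with right-skewed data the upper threshold moves outward and the lower one inward, which is the qualitative behavior the paragraph above describes.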

We conducted simulation studies at different levels of skewness, including light, moderate, and high skewness structures, with varying outlier severities. Results show that in lightly skewed settings, our method significantly improves sensitivity while maintaining high specificity. In highly skewed settings, the method demonstrates strong robustness by achieving the lowest misclassification rates among all tested methods.
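The three performance measures mentioned above have standard definitions when the true outlier labels are known, as they are in a simulation. A minimal sketch follows; the function name and the input format (two parallel lists of booleans) are assumptions made for this example.

```python
def detection_metrics(is_outlier, flagged):
    """Compare true outlier labels with a detector's flags.

    Returns sensitivity, specificity, and misclassification rate.
    A rate is None when its denominator is zero.
    """
    pairs = list(zip(is_outlier, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # outliers correctly flagged
    fn = sum(1 for t, f in pairs if t and not f)      # outliers missed
    fp = sum(1 for t, f in pairs if not t and f)      # regular points wrongly flagged
    tn = sum(1 for t, f in pairs if not t and not f)  # regular points left alone
    n = len(pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "misclassification": (fp + fn) / n if n else None,
    }

truth = [True, True, False, False, False]
flags = [True, False, False, False, True]
metrics = detection_metrics(truth, flags)
```

Here one of two true outliers is caught, two of three regular points are left alone, and two of five points are misclassified.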

In real-data validation, we applied the method to the SEER cancer registry database, focusing on the tumor size of colorectal and lung cancers from 57 U.S. registries between 2016 and 2020. Specifically, we analyzed the rate of unknown tumor size across regions and years. The method successfully identified multiple outliers, confirming its practicality and stability in complex, skewed real-world datasets.

Thesis Defense Final Examination Report.pdf (98 kB)