Automatic Information Extraction From Childhood Cancer Pathology Reports

Hong Jun Yoon, Oak Ridge National Laboratory
Alina Peluso, Oak Ridge National Laboratory
Eric B. Durbin, University of Kentucky College of Medicine
Xiao Cheng Wu, LSU Health Sciences Center - New Orleans
Antoinette Stroup, Rutgers Cancer Institute of New Jersey
Jennifer Doherty, Huntsman Cancer Institute
Stephen Schwartz, Fred Hutchinson Cancer Research Center
Charles Wiggins, The University of New Mexico
Linda Coyle, Information Management Services, Inc.
Lynne Penberthy, National Cancer Institute (NCI)

Document Type

Article

Publication Date

7-1-2022

Publication Title

JAMIA Open

Abstract

Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending the models to perform ICCC classification.

Materials and Methods: We developed 2 models, ICD-O-3 classification and ICCC recoding (Model 1) and direct ICCC classification (Model 2), and 4 scenarios subject to the training sample size. We evaluated these models with a corpus consisting of 29 206 reports with age at diagnosis between 0 and 19 from 6 state cancer registries.

Results: Our findings suggest that the direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). Applying the uncertainty quantification mechanism to assess the confidence of the algorithm in assigning a code demonstrated that the model achieved a micro-F1 score of 0.987 while abstaining (not sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.

Conclusions: Our experimental results suggest that the machine learning-based automatic information extraction from childhood cancer pathology reports in the ICCC is a reliable means of supplementing human annotators at state cancer registries by reading and abstracting the majority of the childhood cancer pathology reports accurately and reliably.
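The abstention idea reported in the Results can be illustrated with a minimal sketch. The abstract does not specify the uncertainty quantification mechanism, so the rule below (abstain when the highest class probability falls under a threshold) is an assumption made for illustration only, and the names `predict_or_abstain`, `micro_f1_with_abstention`, and `threshold` are hypothetical. The sketch scores only the reports that received a code and reports the abstention rate separately.

```python
from typing import List, Optional, Sequence, Tuple


def predict_or_abstain(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable class, or None to abstain.

    Assumption: confidence is the maximum class probability. The paper's
    actual uncertainty quantification mechanism may differ.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def micro_f1_with_abstention(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (micro-F1 on retained reports, abstention rate)."""
    decisions: List[Optional[int]] = [
        predict_or_abstain(p, threshold) for p in prob_rows
    ]
    kept = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    abstention_rate = 1 - len(kept) / len(labels)
    if not kept:
        return 0.0, abstention_rate
    # With exactly one true and one predicted label per retained report,
    # micro-precision and micro-recall both equal accuracy, so micro-F1 does too.
    correct = sum(1 for d, y in kept if d == y)
    return correct / len(kept), abstention_rate


# Toy example with made-up probabilities over three hypothetical classes.
example_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],  # below the threshold, so the model abstains
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
]
example_labels = [0, 1, 1, 0]
f1, abstained = micro_f1_with_abstention(example_probs, example_labels, threshold=0.5)
```

Raising the threshold in this sketch trades coverage for accuracy: more reports are deferred to human annotators, and the retained predictions tend to be more reliable.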

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Yoon, Hong Jun; Peluso, Alina; Durbin, Eric B.; Wu, Xiao Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Coyle, Linda; and Penberthy, Lynne, "Automatic Information Extraction From Childhood Cancer Pathology Reports" (2022). School of Medicine Faculty Publications. 710.
https://digitalscholar.lsuhsc.edu/som_facpubs/710
10.1093/jamiaopen/ooac049

Included in

Oncology Commons

DOI

10.1093/jamiaopen/ooac049
