School of Public Health Faculty Publications

Privacy-Preserving Deep Learning NLP Models for Cancer Registries

Mohammed Alawad, Oak Ridge National Laboratory
Hong-Jun Yoon, Oak Ridge National Laboratory
Shang Gao, Oak Ridge National Laboratory
Brent Mumphrey, LSU Health Sciences Center - New Orleans
Xiao-Cheng Wu, LSU Health Sciences Center - New Orleans
Eric B. Durbin, University of Kentucky
Jong C. Jeong, University of Kentucky
Isaac Hands, University of Kentucky
David Rust, University of Kentucky
Linda Coyle, Information Management Services Inc.
Lynne Penberthy, National Cancer Institute
Georgia Tourassi, Oak Ridge National Laboratory

Document Type

Article

Publication Date

4-16-2020

Publication Title

IEEE Transactions on Emerging Topics in Computing

Abstract

Population cancer registries can benefit from Deep Learning (DL) to automatically extract cancer characteristics from the high volume of unstructured pathology text reports they process annually. The success of DL in tackling this and other real-world problems depends on the availability of large labeled datasets for model training. Although collaboration among cancer registries is essential to fully exploit the promise of DL, privacy and confidentiality concerns are the main obstacles to data sharing across cancer registries. Moreover, DL for natural language processing (NLP) requires sharing the vocabulary dictionary for the embedding layer, which may contain patient identifiers. Thus, even distributing the trained models across cancer registries can violate privacy. In this article, we propose DL NLP model distribution via privacy-preserving transfer learning approaches without sharing sensitive data. These approaches are used to distribute a multitask convolutional neural network (MT-CNN) NLP model among cancer registries. The model is trained to extract six key cancer characteristics - tumor site, subsite, laterality, behavior, histology, and grade - from cancer pathology reports. Using 410,064 pathology documents from two cancer registries, we compare our proposed approach to conventional transfer learning without privacy preservation, single-registry models, and a model trained on centrally hosted data. The results show that transfer learning approaches, including data sharing and model distribution, significantly outperform the single-registry model. In addition, the best-performing privacy-preserving model distribution approach achieves average micro- and macro-F1 scores across all extraction tasks (0.823, 0.580) that are statistically indistinguishable from those of the centralized model (0.827, 0.585).
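The abstract reports both micro- and macro-F1, and the two can differ widely when classes are imbalanced. The following is a minimal sketch of the standard definitions for a single-label classification task, not the authors' evaluation code. The function name `f1_scores` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))

    # Micro-F1 pools the counts over all classes before computing F1,
    # so frequent classes dominate the score.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / denom if denom else 0.0

    # Macro-F1 computes F1 per class and averages with equal weight,
    # so rare classes count as much as frequent ones.
    per_class = []
    for c in labels:
        d = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / d if d else 0.0)
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    return micro, macro


# Hypothetical toy labels: one frequent class and one rare class.
y_true = ["C50"] * 8 + ["C34"] * 2
y_pred = ["C50"] * 8 + ["C50", "C34"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case a single error on the rare class lowers macro-F1 much more than micro-F1, which is consistent with macro scores sitting well below micro scores on tasks with many infrequent labels.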

First Page

1219

Last Page

1230

Volume

Issue

Publisher

Institute of Electrical and Electronics Engineers [Society Publisher]

Recommended Citation

Alawad, Mohammed; Yoon, Hong-Jun; Gao, Shang; Mumphrey, Brent; Wu, Xiao-Cheng; Durbin, Eric B.; Jeong, Jong C.; Hands, Isaac; Rust, David; Coyle, Linda; Penberthy, Lynne; and Tourassi, Georgia, "Privacy-Preserving Deep Learning NLP Models for Cancer Registries" (2020). School of Public Health Faculty Publications. 61.
https://digitalscholar.lsuhsc.edu/soph_facpubs/61
10.1109/TETC.2020.2983404

DOI

10.1109/TETC.2020.2983404
