Computational discovery of interpretable histopathologic prognostic biomarkers in invasive carcinomas of the breast

Open Access

Tageldin, Mohamed (Fall 2021)


While microscopic examination of tumor resections and biopsies has been a cornerstone of breast cancer grading for decades, it suffers from considerable inter-rater variability due to perceptual limitations and high clinical caseloads. Computational analysis of whole-slide image scans using convolutional neural networks (CNNs) can help address this challenge. However, CNNs can be difficult to interpret, which motivates our adoption of an approach called concept bottlenecking, in which models first detect various tissue structures and then use them to make their predictions. Concept bottleneck models require a large set of manual annotations to train, yet manual delineation of histopathologic structures is very demanding and impractical given pathologists’ time constraints. This dissertation describes contributions that fall under three themes: scalable data collection, deep learning-based tissue detection, and the discovery of novel histopathologic biomarkers and associations.

First, we examine crowdsourcing approaches that engage medical students to collect manual annotation data. Our results show that a structured, collaborative approach with pathologist supervision is scalable; the resulting publicly released BCSS and NuCLS datasets contain 20,000 annotations of tissue regions and 200,000 annotations of nuclei, respectively. We show that medical students produce accurate annotations for predominant, visually distinctive structures and that algorithmic suggestions help scale annotation and improve its accuracy.

Second, we describe a set of CNN modeling approaches for the accurate delineation of histopathologic structures. We describe various improvements that enhance the performance of nucleus detection CNN models and introduce a technique called Decision Tree Approximation of Learned Embeddings, which helps explain CNN nucleus classifications without compromising prediction accuracy. Additionally, we offer consensus recommendations from the International Immuno-Oncology Working Group on the computational detection of tumor-infiltrating lymphocytes, a critical emerging biomarker. Following these recommendations, we develop and validate a multi-scale CNN model that jointly detects tissue regions and nuclei, employing pre-defined biological constraints to improve accuracy.

Finally, we describe the development of a morphologic signature based on quantitative features extracted from computationally delineated histopathologic regions and cells. This morphologic signature relies partly on a set of stromal features not captured by clinical guidelines for breast cancer grading, and has stronger prognostic value.

Table of Contents

1 Background and significance

1.1 Histopathology and Cancer Biology

1.2 Computational Pathology

1.3 Machine learning in pathology

1.4 Integrative computational analysis

1.5 Organization and summary of contributions

1.6 List of publications

2 Crowdsourcing strategies for scalable curation of annotation datasets

2.1 Structured crowdsourcing enables convolutional segmentation of histology images

2.2 NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer

2.3 A pathologist-annotated dataset for validating artificial intelligence: a project description and pilot study

3 Deep-learning methods for automatic detection of histopathology structures

3.1 Report on computational assessment of tumor infiltrating lymphocytes from the International Immuno-Oncology Biomarker Working Group

3.2 Explainable nucleus classification using Decision Tree Approximation of Learned Embeddings

3.3 Joint region and nucleus segmentation for characterization of tumor infiltrating lymphocytes in breast cancer

3.4 MuTILs: explainable, multiresolution computational scoring of Tumor-Infiltrating Lymphocytes in breast carcinomas using clinical guidelines

4 Histopathologic correlates of clinical and genomic phenotypes

4.1 Histomic Prognostic Score: a computational morphologic signature with independent prognostic value in invasive carcinomas of the breast

4.2 High expression of MKK3 is associated with worse clinical outcomes in African American breast cancer patients

5 HistomicsTK: an open-source software for computational pathology

5.1 Creation, parsing and expert review of WSI annotations

5.2 Image processing operations for computational pathology

5.3 Simple workflows for detection of salient tissue

6 Summary of conclusions and future directions

7 List of recurrent abbreviations

Appendix A: Supplementary methods and results

Supplement for Section 2.1

Supplement for Section 2.2

Supplement for Section 3.2

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English