Development and Validation of a Computer-Aided Multidimensional Flow Cytometry Analysis Pipeline

Open Access

Pura, John Andrew (2012)

Flow cytometry (FCM) is a widely used technique in basic and clinical research for the high-throughput characterization of cellular properties. Modern FCM instruments can measure up to 20 characteristics of an individual cell, yielding multidimensional data on hundreds of thousands of cells. Data analysis, however, remains a challenging aspect of FCM research: the ability to acquire large multidimensional datasets has outpaced the ability to detect expected and novel cell populations accurately and reproducibly and to test biological hypotheses efficiently.

The main objective of this work is to develop and validate a mechanistic pipeline that addresses the challenges found in high-throughput, multidimensional FCM data analysis. My pipeline development employs state-of-the-art biomedical informatics software that can be integrated with clinical outcomes.

The pipeline consists of five key steps: 1) data preprocessing, 2) automated gating, 3) automated labeling, 4) feature extraction, and 5) feature selection. A robust quality assessment check was implemented at every step of the pipeline to account for potential sources of systematic error before the data enter the subsequent step. Validation against human expert analyses showed good detection of expected cell populations. The integration of state-of-the-art informatics techniques into a single pipeline shows great potential for scalability to hundreds of thousands of events and multiple dimensions.
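
As a rough illustration only, the staged design described above, in which a quality check gates entry to each subsequent step, might be sketched as follows. This is not the thesis implementation: every function name, the toy threshold gate, the 0.25 selection cutoff, and the list-of-lists data layout are assumptions made purely for the example.

```python
def run_pipeline(data, stages):
    """Run stages in order; stop as soon as a stage's quality check fails."""
    completed = []
    for name, step, quality_ok in stages:
        data = step(data)
        if not quality_ok(data):
            raise ValueError(f"quality assessment failed after: {name}")
        completed.append(name)
    return data, completed


# Hypothetical toy stand-ins for the five steps (not the actual methods).
def preprocess(events):
    # Drop events that contain any negative measurement.
    return [e for e in events if all(v >= 0 for v in e)]


def gate(events):
    # Split events into two populations by thresholding the first channel.
    return {0: [e for e in events if e[0] < 0.5],
            1: [e for e in events if e[0] >= 0.5]}


def label(populations):
    # Attach readable names to the numeric population ids.
    names = {0: "low", 1: "high"}
    return {names[k]: v for k, v in populations.items()}


def extract_features(labeled):
    # Use each population's share of all events as its feature.
    total = sum(len(v) for v in labeled.values())
    return {name: len(v) / total for name, v in labeled.items()}


def select_features(features):
    # Keep populations that account for at least 25% of events.
    return {k: v for k, v in features.items() if v >= 0.25}


# Each entry: (step name, step function, quality check on that step's output).
STAGES = [
    ("data preprocessing", preprocess, lambda d: len(d) > 0),
    ("automated gating", gate, lambda d: any(d.values())),
    ("automated labeling", label, lambda d: all(isinstance(k, str) for k in d)),
    ("feature extraction", extract_features,
     lambda d: abs(sum(d.values()) - 1.0) < 1e-9),
    ("feature selection", select_features, lambda d: len(d) > 0),
]

if __name__ == "__main__":
    events = [[0.1, 1.0], [0.9, 2.0], [0.8, 0.5], [-1.0, 0.3], [0.7, 0.1]]
    features, completed = run_pipeline(events, STAGES)
    print(features, completed)
```

Pairing each step with its own check keeps a systematic error from propagating silently: a failed check halts the run and names the step that produced the suspect output.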

In particular, this pipeline can be used to assess a transplant recipient's immune repertoire over time in single patients and in a cross-sectional patient population with diverse diagnostic and demographic characteristics. The main assumption in the proposed pipeline is that the mechanisms driving complications in organ transplantation intersect in a way that is both anticipatable and measurable.

Table of Contents

Chapter 1
Introduction
1.1 FCM Technology and Data Analysis
1.1.1 Overview
1.1.2 Sample Preparation and Data Acquisition
1.1.3 Data Analysis
1.2 Challenges in Data Analysis
1.2.1 Quality Assessment
1.2.2 Population Gating
1.2.3 Population Labeling
1.2.4 Feature Selection
1.3 Proposed Solution - A Computer-Aided Analysis Pipeline
1.4 Organization of Thesis

Chapter 2
Introduction
2.1 Overview of Proposed Pipeline
2.2 Pipeline Components
2.2.1 Quality Assessment
2.2.2 Data Preprocessing
2.2.3 Cell Population Identification and Labeling
2.2.4 Feature Extraction
2.2.5 Feature Selection

Chapter 3
3.1 Validation of the Analysis Pipeline
3.1.1 Dataset
3.1.2 Methods
3.1.3 Statistical Analyses
3.1.4 Results
3.1.5 Discussion
3.2 Ongoing and Future Work
3.3 Biological Motivation - Organ Transplantation
3.3.1 Significance
3.3.2 Intrinsic Risk Profile
3.4 Role of Multidimensional FCM Analysis
3.5 Study Descriptions
3.5.1 Patient Cohort
3.5.2 Dataset
3.5.3 BK Virus Study
3.5.4 T-cell Repopulation Study

Concluding Remarks

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English