Deciphering the Cell Type Specific Activities from High-throughput Omics Data

Chen, Luxiao (Spring 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/bc386k711?locale=it
Published

Abstract

There are hundreds of cell types in the human body, each carrying out different functions. Understanding cell type specific (CTS) activities will greatly enhance our knowledge of biological and clinical mechanisms. Advances in bulk and single cell high-throughput omics technologies enable us to study CTS effects from a genomics perspective.

Bulk high-throughput omics data contain signals from a mixture of cell types. Recent developments in deconvolution methods facilitate CTS inference from bulk data. Our exploration of real data suggests that differential expression (DE) or differential methylation (DM) status is often correlated among cell types. Based on this observation, we developed a novel statistical method named CeDAR that incorporates the cell type hierarchy in CTS differential analyses of bulk data. Extensive simulations and real data analyses demonstrate that this approach significantly improves accuracy and power in detecting CTS differential signals compared with existing methods, especially in low-abundance cell types.

Single cell RNA-seq (scRNA-seq) allows scientists to study the gene expression profiles of individual cells in a sample. Growing interest in applying this technique at the population level has led to many datasets in which multiple subjects are measured by scRNA-seq. In real scRNA-seq data, we observed that CTS genes do not always appear consistently across subjects, although they are expected to. Motivated by this observation, we first designed a statistical model to identify CTS genes that appear consistently in population-level scRNA-seq data. We then designed a strategy to incorporate the consistent CTS genes identified from historical data into analyses such as cell typing. Data analyses demonstrate that the proposed method and strategy identify consistent CTS genes well and improve downstream analysis performance.

In scRNA-seq data, cells from extremely low-abundance cell types are called rare cell populations (RCPs), which play important roles in biological activities. Because of their low abundance, traditional clustering methods can hardly identify them. To correctly identify RCPs in scRNA-seq data, methods with different focuses have been developed. This provides great opportunities for RCP studies, but it also makes it difficult for users to choose among the methods. We therefore summarized these methods and benchmarked them on simulated data to provide a comprehensive evaluation with different metrics.

Table of Contents

1 Introduction

1.1 Cell type specificity in biological activities

1.2 Cell type specificity analysis with omics data

1.2.1 Brief introduction to some types of omics data

1.2.2 Cell type specificity analysis with bulk omics data

1.2.3 Cell type specificity analysis with scRNA-seq data

1.3 Overview

2 Incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data

2.1 Introduction

2.2 Methods

2.2.1 Methods overview

2.2.2 The CeDAR method

2.2.3 Parameter estimation

2.2.4 Simulation

2.2.5 Real data analysis

2.3 Results

2.3.1 Strong correlations of DE/DM states among cell types are observed in real data

2.3.2 Simulation results

2.3.3 Real data analysis

2.4 Discussion

3 Investigating the cell type specific genes from population-level single-cell RNA-seq

3.1 Introduction

3.2 Methods

3.2.1 Subject-level summary statistics representing cell type specificity of genes

3.2.2 A hierarchical model for CTS genes

3.2.3 Identification of CTS genes

3.2.4 Parameters estimation with EM algorithm

3.2.5 CTS gene selection for new subject based on historical data

3.3 Results

3.3.1 CTS genes do not consistently appear across samples

3.3.2 CTS genes with different characteristics

3.3.3 Comparison between Wilcoxon rank-sum test method and the proposed method

3.3.4 Consistent CTS genes can improve performance of downstream analysis

3.4 Discussion

4 Benchmark of Methods Designed for Rare Cell Population Identification in Single Cell RNA Sequencing Data

4.1 Introduction

4.2 Methods

4.2.1 Data simulation

4.2.2 Benchmark and evaluation

4.3 Results

4.3.1 Synthetic data can well capture differential signal pattern in real data

4.3.2 Performance of methods when only one RCP exists

4.3.3 Performance of methods when multiple RCPs exist

4.3.4 Computation efficiency

4.4 Discussion

5 Summary and future research plan

5.1 Summary

5.2 Future research plan

Appendix A Appendix for Chapter 2

A.1 Evaluation of CeDAR method

A.2 Cell-type-specific differential methylation in brain

A.3 Cell-type-specific differential methylation in whole blood

A.4 Cell-type-specific differential methylation in RA EWAS study

A.5 Additional real data analysis showing DE/DM state correlations among cell types

A.6 Additional simulation analysis evaluating impact of data noise on observed FDR for CeDAR method

A.7 Additional simulation analysis evaluating impact of mis-specified tree structures as input of CeDAR-M

A.8 Additional real data analyses

A.8.1 Cell-type-specific differential methylation in Down syndrome study

A.8.2 Cell-type-specific differential methylation in Systemic Lupus Erythematosus study

A.8.3 Cell-type-specific differential methylation analysis for smoking associated DNA methylation sites

Appendix B Appendix for Chapter 3

B.1 A more general framework for different types marker identification

B.2 Standard error calculation for estimated log2 fold change in one sample

B.3 EM algorithm details

B.3.1 Details in step1

B.3.2 Details in step3

Appendix C Appendix for Chapter 4

C.1 Details of benchmark pipeline for each method

C.1.1 Seurat

C.1.2 RaceID

C.1.3 CellSIUS

C.1.4 EDGE

C.1.5 GapClust

C.1.6 FiRE

C.1.7 CIARA

C.1.8 MicroCellClust (MCC1)

C.1.9 SCISSORS

C.1.10 SCMER

C.1.11 scAIDE

C.1.12 GiniClust3

C.1.13 SCA

C.1.14 DoRC

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English