Deciphering the Cell Type Specific Activities from High-throughput Omics Data
Chen, Luxiao (Spring 2023)
Abstract
There are hundreds of cell types in the human body, each carrying out distinct functions. Understanding cell type specific (CTS) activities will greatly enhance our knowledge of the underlying biological and clinical mechanisms. Advances in bulk and single cell high-throughput omics technologies enable us to study CTS effects from a genomic perspective.
Bulk high-throughput omics data contain signals from a mixture of cell types. Recently developed deconvolution methods facilitate CTS inference from bulk data. Our exploration of real data suggests that differential expression (DE) and differential methylation (DM) states are often correlated among cell types. Based on this observation, we developed a novel statistical method, CeDAR, that incorporates the cell type hierarchy into CTS differential analyses of bulk data. Extensive simulations and real data analyses demonstrate that this approach significantly improves the accuracy and power of detecting CTS differential signals compared with existing methods, especially in low-abundance cell types.
Single cell RNA-seq (scRNA-seq) allows scientists to study the gene expression profiles of individual cells within a sample. Growing interest in applying this technique at the population level has produced many datasets in which multiple subjects are profiled by scRNA-seq. In real scRNA-seq data, we observed that CTS genes do not always appear consistently across subjects, even though they are expected to. Motivated by this observation, we first designed a statistical model to identify CTS genes that appear consistently in population-level scRNA-seq data. We then designed a strategy to incorporate the consistent CTS genes identified from historical data into downstream analyses such as cell typing. Data analyses demonstrate that the proposed method and strategy can effectively identify consistent CTS genes and improve the performance of downstream analyses.
In scRNA-seq data, cells from extremely low-abundance cell types form rare cell populations (RCPs), which play important roles in biological activities. Because of their low abundance, RCPs can hardly be identified by traditional clustering methods. Methods with different focuses have been developed to identify RCPs correctly in scRNA-seq data. This provides great opportunities for RCP studies, but it also makes it difficult for users to choose among the methods. We therefore summarized these methods and benchmarked them on simulated data, providing a comprehensive evaluation with multiple metrics.
Table of Contents
1 Introduction
1.1 Cell type specificity in biological activities
1.2 Cell type specificity analysis with omics data
1.2.1 Brief introduction to some types of omics data
1.2.2 Cell type specificity analysis with bulk omics data
1.2.3 Cell type specificity analysis with scRNA-seq data
1.3 Overview
2 Incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data
2.1 Introduction
2.2 Methods
2.2.1 Methods overview
2.2.2 The CeDAR method
2.2.3 Parameter estimation
2.2.4 Simulation
2.2.5 Real data analysis
2.3 Results
2.3.1 Strong correlations of DE/DM states among cell types are observed in real data
2.3.2 Simulation results
2.3.3 Real data analysis
2.4 Discussion
3 Investigating the cell type specific genes from population-level single-cell RNA-seq
3.1 Introduction
3.2 Methods
3.2.1 Subject-level summary statistics representing cell type specificity of genes
3.2.2 A hierarchical model for CTS genes
3.2.3 Identification of CTS genes
3.2.4 Parameters estimation with EM algorithm
3.2.5 CTS gene selection for new subject based on historical data
3.3 Results
3.3.1 CTS genes do not consistently appear across samples
3.3.2 CTS genes with different characteristics
3.3.3 Comparison between Wilcoxon rank-sum test method and the proposed method
3.3.4 Consistent CTS genes can improve performance of downstream analysis
3.4 Discussion
4 Benchmark of Methods Designed for Rare Cell Population Identification in Single Cell RNA Sequencing Data
4.1 Introduction
4.2 Methods
4.2.1 Data simulation
4.2.2 Benchmark and evaluation
4.3 Results
4.3.1 Synthetic data can well capture differential signal pattern in real data
4.3.2 Performance of methods when only one RCP exists
4.3.3 Performance of methods when multiple RCPs exist
4.3.4 Computation efficiency
4.4 Discussion
5 Summary and future research plan
5.1 Summary
5.2 Future research plan
Appendix A Appendix for Chapter 2
A.1 Evaluation of CeDAR method
A.2 Cell-type-specific differential methylation in brain
A.3 Cell-type-specific differential methylation in whole blood
A.4 Cell-type-specific differential methylation in RA EWAS study
A.5 Additional real data analysis showing DE/DM state correlations among cell types
A.6 Additional simulation analysis evaluating impact of data noise on observed FDR for CeDAR method
A.7 Additional simulation analysis evaluating impact of mis-specified tree structures as input of CeDAR-M
A.8 Additional real data analyses
A.8.1 Cell-type-specific differential methylation in Down syndrome study
A.8.2 Cell-type-specific differential methylation in Systemic Lupus Erythematosus study
A.8.3 Cell-type-specific differential methylation analysis for smoking associated DNA methylation sites
Appendix B Appendix for Chapter 3
B.1 A more general framework for different types marker identification
B.2 Standard error calculation for estimated log2 fold change in one sample
B.3 EM algorithm details
B.3.1 Details in step1
B.3.2 Details in step3
Appendix C Appendix for Chapter 4
C.1 Details of benchmark pipeline for each method
C.1.1 Seurat
C.1.2 RaceID
C.1.3 CellSIUS
C.1.4 EDGE
C.1.5 GapClust
C.1.6 FiRE
C.1.7 CIARA
C.1.8 MicroCellClust (MCC1)
C.1.9 SCISSORS
C.1.10 SCMER
C.1.11 scAIDE
C.1.12 GiniClust3
C.1.13 SCA
C.1.14 DoRC
Bibliography
Primary PDF: file download under embargo until 22 May 2025 (uploaded 2023-04-27 23:24:08 -0400).