A benchmark of rare cell type detection methods for single-cell RNA sequencing data Open Access
Liu, Jiahui (Spring 2023)
Abstract
Background: A key task in single-cell RNA-seq (scRNA-seq) data analysis is detecting the rare cell types in a sample, which can be critical for downstream analyses such as differential gene expression analysis. Various algorithms have been designed specifically to detect rare cell types in scRNA-seq data automatically, either by defining a rareness score or by optimizing the clustering method. The lack of benchmark studies, however, complicates the choice among these methods.
Results: We conducted a comprehensive evaluation of several widely used algorithms for detecting rare cell types. To assess their accuracy and consistency, we sampled data from the European Genome-Phenome Archive (EGA) and evaluated the algorithms on a range of scRNA-seq datasets drawn from different samples. Additionally, we integrated multiple samples to test the algorithms' population-level performance. Using a set of criteria, including clustering improvement methods and customization of the rareness score, we evaluated the algorithms from several angles and drew our conclusions from this benchmarking work. Because the evaluation covered a large number of datasets, it offers insight into how suitable each algorithm is for identifying rare cell types.
Conclusion: We identified the strengths and weaknesses of each method using a variety of criteria: detection accuracy, precision, Cohen's kappa, sensitivity, and specificity at the individual and population levels, all computed against predefined rare cell types, as well as runtime and peak memory. We then aggregated these results into multifaceted recommendations for users.
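As a rough illustration of the evaluation criteria named above, the sketch below computes accuracy, precision, sensitivity, specificity, and Cohen's kappa from binary rare / not-rare calls compared against predefined labels. It uses the standard textbook definitions; the function name `rare_cell_metrics` and the boolean encoding are assumptions made for this example and are not taken from the thesis, whose exact implementation is not shown here.

```python
from typing import Dict, Sequence


def rare_cell_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Binary metrics for rare-cell calls (True = rare, False = not rare).

    Hypothetical helper for illustration; standard definitions only.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    n = len(truth)
    if n == 0:
        raise ValueError("at least one cell is required")

    # Confusion-matrix counts, treating "rare" as the positive class.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return num / den if den else float("nan")

    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the marginal frequencies.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = ratio(accuracy - expected, 1 - expected)

    return {
        "accuracy": accuracy,
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "kappa": kappa,
    }
```

Because rare cells make up a small fraction of a sample, accuracy alone can look high even when few rare cells are recovered, which is one reason to report kappa, sensitivity, and precision alongside it.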
Table of Contents
1. Introduction
2. Methods
2.1 Data collection and preprocessing
2.2 Batch Effect Correction
2.3 Rare Cell Types Detecting
2.3.1 Detecting rare cell type based on Gini and Fano index (GiniClust3)
2.3.2 Detecting rare cell types based on correlated gene with MCL (CellSIUS)
2.3.3 Detecting rare cell type based on calculating rareness score (FiRE)
2.3.4 Detecting rare cell type based on embedding and RPH-kmeans (scAIDE)
2.3.5 Detecting rare cell types based on screened for outliers (RaceID)
2.4 Evaluation metrics
2.5 Uniform manifold approximation and projection (UMAP) visualization
2.6 Computation evaluation of runtime
3. Results
3.1 Data cleaning and batch effect correction result
3.2 Detecting rare cell types using GiniClust3
3.3 Detecting rare cell types in CellSIUS
3.4 Detecting rare cell types in scAIDE
3.5 Detecting rare cell types in FiRE
3.6 Detecting rare cell types in RaceID
3.7 Computing time benchmarks
4. Discussion
4.1 Rare Cell type detecting methods
4.2 Runtime and memory evaluation
5. Conclusion
Reference
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
A benchmark of rare cell type detection methods for single-cell RNA sequencing data | 2023-04-13 13:52:07 -0400 |