Exploration of Normalization Methods on Bulk RNA-seq Data and Single-cell RNA-seq Data Open Access
Wang, Yawei (Spring 2021)
Abstract
Background: RNA-seq and single-cell RNA-seq are powerful new technologies in biomedical research. To eliminate the inherent technical errors associated with factors such as sequencing depth and gene length, RNA-seq data from different samples need to be normalized so that they are comparable. However, the abundance of zeros in the data, especially in single-cell RNA-seq data, makes normalization extremely challenging.
Method and Materials: For bulk RNA-seq normalization, I applied a novel normalization method, named the Group method, and compared its performance with that of other bulk RNA-seq normalization methods (Upper Quantile, Quantile, Median, TMM, and DESeq) by calculating the Spearman correlation between normalized RNA-seq data and TaqMan qRT-PCR data. I also compared their effectiveness on simulated data and in differential expression analysis. For single-cell RNA-seq, I merged genes according to KEGG pathways and used the Quantile method to normalize the resulting pathway-by-cell data; this approach is named the Pathway-Quantile method. I compared it with log normalization, scran, and Linnorm on 3k PBMC data (without spike-in genes) and human pancreas data (with spike-in genes) by visualizing the results after UMAP dimension reduction with the Seurat package (version 4.0.1).
Results: For simulated and real bulk RNA-seq data, all normalization methods performed similarly in terms of the Spearman correlation between normalized real RNA-seq data and MAQC TaqMan qRT-PCR data, and the Group method did not perform better than the other methods. In differential expression analysis, all methods showed similar performance. For single-cell RNA-seq data, Pathway-Quantile performed better than pathway-level data, but it was inferior to the other methods when tested on 3k PBMC data.
Conclusion: I found that the Group method is competitive for normalizing bulk RNA-seq data. However, more studies are needed on normalizing single-cell RNA-seq data with the Group-Quantile method.
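The pathway-level quantile normalization described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: this page does not state how genes are merged within a pathway, so summing counts is an assumption, and the gene names, pathway names, and function names below are hypothetical. Ties are broken by position, which is a simplification.

```python
from statistics import mean


def merge_genes_by_pathway(counts, pathways):
    """Sum gene counts within each pathway (summing is an assumption).

    counts:   {gene: [count per cell]}
    pathways: {pathway: [member genes]}
    Returns   {pathway: [summed count per cell]}.
    """
    n_cells = len(next(iter(counts.values())))
    merged = {}
    for name, genes in pathways.items():
        totals = [0.0] * n_cells
        for gene in genes:
            # Genes missing from the count table contribute zero.
            for j, c in enumerate(counts.get(gene, [0] * n_cells)):
                totals[j] += c
        merged[name] = totals
    return merged


def quantile_normalize(columns):
    """Give every column (cell) the same distribution.

    Each value is replaced by the mean, across columns, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(sc[i] for sc in sorted_cols) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def pathway_quantile(counts, pathways):
    """Merge genes into pathways, then quantile-normalize across cells."""
    merged = merge_genes_by_pathway(counts, pathways)
    names = list(merged)
    cells = [list(col) for col in zip(*(merged[n] for n in names))]
    norm_cells = quantile_normalize(cells)
    # Transpose back to {pathway: [normalized value per cell]}.
    return {n: [cell[i] for cell in norm_cells] for i, n in enumerate(names)}


# Toy example with three hypothetical genes, pathways, and cells.
toy_counts = {"g1": [1, 4, 0], "g2": [2, 0, 3], "g3": [0, 5, 1]}
toy_pathways = {"P1": ["g1", "g2"], "P2": ["g2", "g3"], "P3": ["g1", "g3"]}
toy_result = pathway_quantile(toy_counts, toy_pathways)
```

After normalization, every cell has the same set of pathway-level values; only their assignment to pathways differs between cells, which is the defining property of quantile normalization.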
Table of Contents
1. Introduction
2. Data Source
2.1 Bulk RNA-seq data
2.1.1 Real data
2.1.2 MAQC TaqMan qRT-PCR data
2.1.3 Simulated data
2.2 Single-cell RNA-seq data
2.2.1 Peripheral Blood Mononuclear Cell (PBMC) data
2.2.2 Single-cell RNA-seq data with spike-ins
2.2.3 KEGG pathway gene sets
3. Methods
3.1 Bulk RNA-seq data normalization methods
3.1.1 Traditional normalization methods with real RNA-Seq data
3.1.2 Group method
3.1.3 Normalization with simulated data
3.1.4 Differential expression analysis
3.2 Single-cell RNA-seq normalization methods
3.2.1 Log Normalization
3.2.2 Linear Model and Normality Based Normalizing Transformation Method (Linnorm)
3.2.3 Scran method
3.2.4 Group-Quantile method
4. Results
4.1 Results of bulk RNA-seq data
4.2 Results of single-cell RNA-seq data
4.2.1 PBMC data visualization
4.2.2 Spike-in single-cell RNA-seq data visualization
5. Conclusion and Discussion
Reference
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Exploration of Normalization Methods on Bulk RNA-seq Data and Single-cell RNA-seq Data | 2021-04-25 23:19:33 -0400 |