EDClust: An EM-MM hybrid method for cell clustering in population-level single cell RNA sequencing
Wei, Xin (Spring 2021)
Abstract
Single-cell RNA sequencing (scRNA-seq) technology has revolutionized genomics research by enabling measurement of transcriptomic profiles at the level of single cells. One of the most fundamental problems in scRNA-seq data analysis is cell clustering, for which a large number of methods have been developed. As scRNA-seq is increasingly applied in larger-scale studies, researchers face the problem of clustering cells when the data come from more than one subject. One challenge in analyzing such data is subject-specific systematic variation: heterogeneity across subjects may have a significant impact on clustering accuracy. Existing methods that address this effect, however, suffer from several limitations. In this work, we develop a novel statistical method named ‘EDClust’ for scRNA-seq cell clustering when data are from multiple subjects. EDClust models the sequence read counts with a mixture of Dirichlet-Multinomial distributions and explicitly accounts for cell type heterogeneity, subject heterogeneity, and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets spanning various tissue types and species demonstrates a substantial accuracy improvement of EDClust over existing methods.
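To make the modeling idea in the abstract concrete, the following is a minimal sketch of a Dirichlet-Multinomial log-likelihood and of the posterior cluster probabilities that an EM-style E-step would compute for one cell. It is an illustration under simplifying assumptions, not the EDClust implementation: the function names `dm_logpmf` and `responsibilities` are hypothetical, the concentration parameters are treated as known, and the subject-specific effects that EDClust models are omitted.

```python
import math

def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-Multinomial.

    counts: read counts per gene for one cell.
    alpha:  positive concentration parameters, one per gene (assumed known here).
    """
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def responsibilities(counts, weights, alphas):
    """Posterior probability of each cluster for one cell (E-step style).

    weights: mixing proportions, one per cluster.
    alphas:  one concentration vector per cluster.
    """
    logs = [math.log(w) + dm_logpmf(counts, al) for w, al in zip(weights, alphas)]
    m = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy example: two genes, two clusters with opposite expression preferences.
r = responsibilities([9, 1], [0.5, 0.5], [[5.0, 1.0], [1.0, 5.0]])
```

In this toy example the cell's reads fall mostly on the first gene, so the first cluster receives the larger posterior probability. Keeping soft probabilities rather than hard labels is one way to carry clustering uncertainty through the fit.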
Table of Contents
1. Introduction
2. Methods
2.1 Data model
2.2 The EM-MM hybrid algorithm for maximum likelihood
2.3 Feature selection
2.4 Determine the initial values
2.5 Software implementation
3. Simulation studies
4. Real data analyses
4.1 Mouse Retina dataset
4.2 Baron Pancreas dataset
4.3 Human Skin dataset
4.4 Mouse Lung dataset
4.5 Computational performance
5. Discussion
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
EDClust: An EM-MM hybrid method for cell clustering in population-level single cell RNA sequencing | 2021-04-26 09:07:06 -0400 |