Consensus clustering of subclone structure for multi-sample sequencing data Open Access
Zheng, Hanyi (Spring 2020)
Background: Tumor heterogeneity refers to the variation in morphology and phenotype among tumor cells and has implications for cancer therapy. Accurately assessing tumor heterogeneity is an essential step toward understanding how a tumor evolves, yet identifying tumor subpopulations remains a challenge. In this work, we present a combinatorial algorithm that uses samples taken from a single patient at multiple time points during tumor development to determine the subclone clusters.
Methods: We first estimated the cancer cell fraction (CCF) and cluster assignments at each time point with PyClone, which fits a hierarchical Bayesian model by MCMC. After imputing the co-clustering matrix, we used non-negative sparse coding to determine consensus clusters across all time points while avoiding trivial clusters. Finally, we adjusted the covariance matrix and used the Bayesian information criterion (BIC) to choose the optimal number of clusters.
Results: We used the weighted CCF as each cluster's CCF and tracked its trend over time. For PR42, the optimal number of clusters was k=5, and each cluster showed a distinct trend. For PR44, the overall mutation trend decreased and then increased, which suggests that the therapy was effective at first but later lost its effect. For PR240, the CCF of every cluster increased over time for all values of K, which suggests that the therapy was ineffective for this patient.
Conclusions: This study presents a combinatorial algorithm for determining the subclone clusters of multi-time-point tumor sequencing data. The model works well when the data do not contain a high percentage of missing mutations. In addition, sample purity and the trivial clusters generated by PyClone can affect the results. We also found that missing mutations directly affect the co-clustering matrix and the covariance matrix used in the BIC step.
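To illustrate why missing mutations matter for the co-clustering matrix, the following is a minimal sketch and not the thesis implementation. It assumes that per-time-point cluster labels are available as an integer array, that a label of -1 marks a mutation missing at that time point, and that each matrix entry is the fraction of jointly observed time points at which two mutations share a cluster. The function name `co_clustering_matrix` and the toy labels are hypothetical; the imputation step itself is not shown.

```python
import numpy as np


def co_clustering_matrix(labels):
    """Build a co-clustering matrix from per-time-point cluster labels.

    labels: array of shape (n_timepoints, n_mutations); -1 marks a mutation
    that is missing at that time point (an assumed encoding).
    Returns an (n_mutations, n_mutations) matrix whose (i, j) entry is the
    fraction of time points, among those where both i and j are observed,
    at which they fall in the same cluster. Pairs that are never observed
    together are left as NaN, to be filled in by a later imputation step.
    """
    labels = np.asarray(labels)
    n_mut = labels.shape[1]
    together = np.zeros((n_mut, n_mut))
    observed = np.zeros((n_mut, n_mut))
    for row in labels:
        present = row >= 0
        both = np.outer(present, present)  # pairs observed at this time point
        same = (row[:, None] == row[None, :]) & both
        together += same
        observed += both
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(observed > 0, together / observed, np.nan)


# Toy example: 3 time points, 4 mutations; mutation 3 is always missing.
toy_labels = [
    [0, 0, 1, -1],
    [0, 1, 1, -1],
    [0, 0, 1, -1],
]
M = co_clustering_matrix(toy_labels)
```

In this toy case the row and column for the always-missing mutation contain only NaN, so any downstream step that factorizes the matrix has nothing to work with for that mutation until the entries are imputed.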
Table of Contents
2 Methods
2.1 Data Collection and Cleaning
2.2 Pyclone
2.3 Co-Clustering matrix imputation
2.4 Non-negative sparse coding
2.5 Deciding optimal number of clusters
2.5.1 Covariance matrix adjustment
2.5.2 likelihood function
3 Result
3.1 Data summary
3.2 Result of Pyclone
3.3 Consensus cluster
3.3.1 weighted CCF
3.3.2 PR42 result
3.3.3 PR44 result
3.3.4 PR240 result
3.3.5 PR246
4 Discussion
About this Master's Thesis
|Consensus clustering of subclone structure for multi-sample sequencing data ()||2020-04-20 00:43:03 -0400||