A new method of network-guided dimension reduction Open Access

Hu, Jiani (2016)

Permanent URL: https://etd.library.emory.edu/concern/etds/bg257f58h?locale=en
Published

Abstract

High-throughput technologies have shifted the focus of research from single-gene and single-protein studies to the genome scale. Given the size and complexity of high-throughput data, dimension reduction is often used to simplify and visualize the data. However, one obstacle to effective dimension reduction of a complex gene expression matrix is the loss of true biological information caused by pervasive correlation and by interference from high measurement noise. To address these issues, we incorporated existing knowledge, as represented by known biological networks, by developing a new network-guided dimension reduction method. The effectiveness of this method was tested on both simulated and real gene expression data. The simulation results show that the power to detect the major signals in a large-scale network is high. The results from the real data analysis show that the first few dimensions found by the method are dominated by meaningful biological signals. Network-guided dimension reduction is an effective method for capturing the main signals contained in a large data matrix.

Table of Contents

1 Introduction

2 Methods

2.1 Scale-free gene network simulation

2.2 Gene expression simulation

Table 1 Parameters in simulations

2.3 Network Guided Dimension Reduction

2.4 Sparse Principal component analysis (SPCA) and canonical correlation calculation

2.5 Test on yeast cycle data

3 Results

3.1 Simulation results

3.2 Testing the method on yeast cycle data

4 Discussion

References

5 Appendices

A Figures & Tables

Figure 1 Detected Correlation of hub one

Figure 2 Detected Correlation of hub two

Figure 3 Factor score of real yeast cell cycle data captured by the first PC

Figure 4 Factor score of real yeast cell cycle data captured by the second PC

Table 2 Gene Ontology Classification result of Gene signal captured by the second PC

Figure 5 Factor score of real yeast cell cycle data captured by the third PC

Table 3 Gene Ontology Classification result of Gene signal captured by the third PC

B R Code

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English