Representation Learning on Physical and Information Networks

Zhang, Zheng (Fall 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/9k41zg186?locale=pt-BR
Published

Abstract

Networks, encompassing both physical and information networks, are fundamental graph structures for modeling relationships among entities across diverse real-world applications. This thesis aims to advance general representation learning on network data by addressing several key challenges. Traditional graph representation learning methods primarily focus on topological structures, often neglecting the rich data modalities inherent in these networks and lacking theoretical guarantees on expressive power. Additionally, data quality challenges such as label scarcity, noisy data, and incompatibility between graph topology and other modalities hinder the development of robust and generalizable models.

For physical networks, we propose the spatial graph message passing neural network, a novel framework that seamlessly integrates spatial and topological information with theoretical guarantees on discriminative power. We enhance computational efficiency through an accelerated spanning tree sampling algorithm, reducing complexity from O(N^3) to O(N) while maintaining expressive capabilities. Furthermore, we extend the framework to accommodate networks embedded in irregular manifold spaces and generalize it to handle geometric trees, addressing the unique hierarchical structures in such data.

For information networks, we introduce a self-supervised learning framework called text-and-graph multi-view alignment. The framework unifies diverse data domains by leveraging text-attributed graphs, which augment traditional graph structures with natural language descriptions. It incorporates a multi-view alignment module that preserves rich semantic information, topology, and their interplay. An accelerated algorithm reduces training time complexity from quadratic to linear, facilitating scalability to large datasets. We evaluate the framework's performance under label-scarce and transfer learning settings, demonstrating its effectiveness without reliance on extensive labeled data.

To enhance generalizability and robustness, we propose the relational curriculum learning method. This method improves representation learning on network data by addressing incompatibilities between graph topology and other data modalities. It introduces a novel edge selection criterion that quantifies the difficulty of understanding each graph edge and incorporates edges into the training process at appropriate times. Through extensive experiments on synthetic and real-world datasets, the proposed method demonstrates significant improvements in generalization ability and robustness.

In summary, this thesis presents novel frameworks and algorithms that advance representation learning on both physical and information networks. By providing theoretical guarantees, addressing data quality issues, and enhancing efficiency and scalability, these contributions hold significant implications for various downstream applications in chemistry, biomedicine, social sciences, and beyond.

Table of Contents

1 Introduction

1.1 Research Issues

1.1.1 Representation Learning on Physical Networks

1.1.2 Representation Learning on Information Networks

1.1.3 Enhancing Generalizability and Robustness of Learning Network Representations

1.2 Contribution

1.2.1 Representation Learning on Physical Networks

1.2.2 Representation Learning on Information Networks

1.2.3 Enhancing Generalizability and Robustness of Learning Network Representations

1.3 Organization of Thesis

2 Representation Learning on Physical Networks

2.1 Introduction on Physical Networks

2.2 Related Works

2.3 Representation Learning on Euclidean Spatial Networks

2.3.1 Node Spatial Information Representation

2.3.2 Spatial Graph Message Passing Neural Network

2.3.3 Accelerate Training through Sampling Random Spanning Trees

2.4 Experiments on Euclidean Spatial Networks

2.4.1 Experiment Setup

2.4.2 Experimental Performance

2.5 Representation Learning on Non-Euclidean Spatial Networks

2.5.1 Background on Non-Euclidean Spatial Networks

2.5.2 Generalized Framework for Non-Euclidean Spatial Networks

2.6 Experimental Results on Non-Euclidean Spatial Networks

2.6.1 Experimental Settings

2.6.2 Effectiveness Results

2.6.3 Effect of Manifold Irregularity Analysis

2.6.4 Sensitivity Analysis

2.7 Representation Learning on Spatial Trees

2.7.1 Background on Spatial Trees

2.7.2 Self-Supervised Geometric Tree Representation Learning

2.7.3 Tree Branch Geometric-Topology Information Representation Learning

2.7.4 Hierarchical Relationship Modeling through Partial Ordering Objective Function

2.7.5 Self-Supervised Learning via Subtree Growth Learning

2.8 Experiments on Spatial Trees

2.8.1 Experimental Settings

2.8.2 Effectiveness Results

2.8.3 Transfer Ability Analysis

2.8.4 Ablation Studies

2.9 Conclusion

3 Representation Learning on Information Networks

3.1 Introduction on Information Networks

3.2 Related Works

3.2.1 Text-Attributed Graphs Representation Learning

3.2.2 Unsupervised Graph Pre-Train Methods

3.2.3 Graph2Text Encoding Methods

3.2.4 Efficient and Scalable Methods for Large-Size Graph Neighborhoods

3.3 Self-Supervised Learning Framework on TAGs

3.3.1 Text-and-Graph Multi-View Construction

3.3.2 Represent Text Neighborhood Information via Hierarchical Document Layout

3.3.3 Multi-View Alignment via TAG Hierarchical Self-Supervised Learning

3.3.4 Accelerating Training on Large TAGs with Structure-Preserving Random Walk

3.4 Experiments

3.4.1 Experimental Settings

3.4.2 Effectiveness Results

3.4.3 Transfer Ability Analysis

3.4.4 Ablation Study

3.4.5 Sensitivity Analysis

3.4.6 Efficiency Analysis

3.5 Conclusions

4 Enhancing Generalizability and Robustness of Learning Network Representations

4.1 Introduction

4.2 Related Works

4.3 Relational Curriculum Learning

4.3.1 Preliminaries

4.3.2 Incremental Edge Selection by Quantifying Difficulties of Sample Dependencies

4.3.3 Automatically Control the Pace of Increasing Edges

4.3.4 Smooth Structure Transition by Edge Reweighting

4.4 Experimental Results of RCL

4.4.1 Experimental Settings

4.4.2 Effectiveness Results

4.4.3 Robustness Analysis Against Topological Noise

4.4.4 Ablation Study

4.4.5 Visualization of Learned Edge Selection Curriculum

4.4.6 Effectiveness Experiments on Heterophilic Datasets

4.4.7 Time Complexity Analysis

4.4.8 Parameter Sensitivity Analysis

4.4.9 Visualization of Importance on Smoothing Component

4.4.10 Effectiveness Experiments on PNA Backbone Model

4.4.11 Robustness Experiments on PNA Backbone Model

4.5 Conclusion

5 Conclusions

5.1 Research Contributions

5.1.1 Representation Learning on Physical Networks

5.1.2 Representation Learning on Information Networks

5.1.3 Enhancing Generalizability and Robustness of Learning Network Representations

5.2 Publications and In-Preparation Submissions

5.2.1 Published Works

5.2.2 Submitted and In-preparation Papers

Appendix A Representation Learning on Physical Networks

A.1 Representation Learning on Euclidean Spatial Networks

A.1.1 Proof of Theorem 1

A.1.2 Proof of Lemma 1

A.1.3 Proof of Theorem 3

A.1.4 Proof of Proposition 1

A.1.5 Proof of Proposition 2

A.2 Representation Learning on Non-Euclidean Spatial Networks

A.2.1 Proof of Theorem 4

A.2.2 Proof of Theorem 5

Appendix B Representation Learning on Information Networks

B.1 Additional Experimental Results and Settings

B.1.1 Additional Implementation Settings

B.1.2 Additional Link Prediction Experiments

B.1.3 Additional Node Classification Analysis

B.1.4 Additional Ablation Studies

B.2 Additional Technical Details

B.3 Limitations

Appendix C Enhancing Generalizability and Robustness of Learning Network Representations

C.1 Mathematical Proof for Theorem 6

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified