Federated Learning on Graphs: New Scenarios, Challenges, and Methods Restricted; Files Only
Xie, Han (Spring 2024)
Abstract
Federated Learning (FL) has emerged as a distributed learning paradigm that facilitates collaborative model training across distributed data sources in consideration of data privacy, regularization, commercial competition, etc. While most existing research predominantly delves into FL within the Euclidean domain, such as text and images, recent advancements in graph representation learning (e.g., Graph Neural Networks) have converged to foster the evolution of Federated Graph Learning (FGL). The innovative intersection has garnered steadily increasing attention as it aligns with numerous real-world scenarios wherein participating clients preserve their private graph-structured information and seek collaborative model training. In contrast to prior FL endeavors that primarily focus on Euclidean data, the extension of FL to graph-structured data introduces distinct challenges, which necessitate specialized treatments.
To systematically study the novel paradigm of FGL, my dissertation categorizes new scenarios of FGL into three schemas following the natural graph distribution among participating clients: 1) Graph-level FGL, 2) Subgraph-level FGL (with Node-level as a special case), and 3) Link-level FGL. It delves into the unique challenges inherent in these schemas, develops tailored methodologies to address them, and further explores practical applications. In essence, my dissertation presents a comprehensive and in-depth exploration of FGL from diverse perspectives, offering valuable insights and innovative contributions to advance this emerging field.
Table of Contents
1 Introduction 1
1.1 Motivation................................. 1
1.2 Research Contributions.......................... 4
1.2.1 Graph-level Federated Graph Learning . . . . . . . . . . . . . 5
1.2.2 Subgraph-level Federated Graph Learning . . . . . . . . . . . 6
1.2.3 Link-level Federated Graph Learning . . . . . . . . . . . . . . 8
1.3 Organization ............................... 11
2 Background 12
2.1 Basic FL algorithm– FedAvg....................... 12
2.2 Graph Neural Networks ......................... 13
3 Graph-level Federated Graph Learning 15
3.1 GCFL: Federated Graph Classification over Non- IID Graphs . . . . . 15
3.1.1 Overview ............................. 15
3.1.2 Motivating Study: Non-IID structures and features across clients 16
3.1.3 Problem Formulation....................... 17
3.1.4 Technical Design ......................... 18
3.1.5 GCFL+: improved GCFL based on observation sequences of gradients 20
3.1.6 Evaluation and Analysis ..................... 21
3.1.7 Theoretical Analysis ....................... 27
3.1.8 Related Works .......................... 29
3.2 FedBrain: An Application of GCFL for Cross-Institution Brain Network Analysis............................... 32
3.2.1 Overview ............................. 32
3.2.2 Proposed Methods ........................ 33
4 Subgraph-level Federated Graph Learning 41
4.1 FedSCem: Federated Node Classification over Distributed Ego-Networks with Secure Contrastive Embedding Sharing . . . . . . . . . . . . . . 41
4.1.1 Overview ............................. 41
4.1.2 Motivating Study......................... 42
4.1.3 Problem Formulation....................... 44
4.1.4 Proposed Methods: Node embedding sharing . . . . . . . . . . 47
4.1.5 Proposed Methods: Secure embedding sharing among clients and the server........................... 51
4.1.6 Discussions on Efficiency and Privacy . . . . . . . . . . . . . . 55
4.1.7 Evaluation and Analysis ..................... 57
4.1.8 Related Works .......................... 63
5 Link-level Federated Graph Learning 66
5.1 FedLit: Federated Node Classification over Graphs with Latent Link-type Heterogeneity .................. 66
5.1.1 Overview ............................. 66
5.1.2 Motivating Data Analysis .................... 67
5.1.3 Problem Formulation....................... 70
5.1.4 Proposed Methods ........................ 71
5.1.5 Evaluation and Analysis ..................... 79
5.1.6 Related Works .......................... 85
5.2 GaLM: An motivating application study................ 88
5.2.1 Overview ............................. 88
5.2.2 Problem Formulation....................... 88
5.2.3 LM+GNN: The Backbone of Our Proposed Framework . . . . 92
5.2.4 Proposed Methods: Graph-aware LM pre-training on a large graph corpus ........................... 93
5.2.5 Proposed Methods: GaLM fine-tuning on multiple graph applications............................... 98
6 Conclusion and Future Works 105
6.1 Conclusion................................. 105
6.2 FutureDirections............................. 107
6.2.1 Furthering Federated Graph Learning. . . . . . . . . . . . . . 107
6.2.2 Federated Graph Learning in the Era of Generative AI . . . . 109
Appendix A GCFL 113
A.1 More Motivating Examples and Analysis . . . . . . . . . . . . . . . . 113
A.2 Missing Proofs in Section 3.1.7...................... 116
A.2.1 Proof of Proposition 3.1.1 .................... 116
A.2.2 Proof of Proposition 3.1.2 .................... 117
A.2.3 Proof of Proposition 3.1.3 .................... 118
A.3 Dataset Details .............................. 118
A.4 More Detailed Experiment Results ................... 119
Appendix B FedBrain 122
B.1 Datasets.................................. 122
B.2 Clustering Analysis of Guided Clustering . . . . . . . . . . . . . . . . 123
Appendix C FedSCem 125
C.1 DatasetDetails .............................. 125
C.1.1 DatasetStatistics ....................... 125
C.1.2 Distributions of Ego-networks w.r.t. Graph Size and the number of Labels ..................... 125
Appendix D FedLit 127
D.1 Data Details................................ 127
D.1.1 The Oracle Link-types ...................... 127
D.1.2 Data Statistics .......................... 128
D.2 More In-depth Analysis.......................... 128
Appendix E GaLM 131
E.1 Data Statistics .............................. 131
E.1.1 Data Statistics of Amazon-PQ .................. 131
E.1.2 DataStatistics of Product-Reviews . . . . . . . . . . . . . . 131
E.2 Elapsed Time of Pre-training Strategies. . . . . . . . . . . . . . . . . 132
Bibliography 133
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language |
|
Research Field | |
Mot-clé | |
Committee Chair / Thesis Advisor | |
Committee Members |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
File download under embargo until 22 May 2026 | 2024-05-03 23:30:51 -0400 | File download under embargo until 22 May 2026 |
Supplemental Files
Thumbnail | Title | Date Uploaded | Actions |
---|