Federated Learning on Graphs: New Scenarios, Challenges, and Methods Restricted; Files Only

Xie, Han (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/vd66w133c?locale=fr
Published

Abstract

Federated Learning (FL) has emerged as a distributed learning paradigm that facilitates collaborative model training across distributed data sources in consideration of data privacy, regularization, commercial competition, etc. While most existing research predominantly delves into FL within the Euclidean domain, such as text and images, recent advancements in graph representation learning (e.g., Graph Neural Networks) have converged to foster the evolution of Federated Graph Learning (FGL). The innovative intersection has garnered steadily increasing attention as it aligns with numerous real-world scenarios wherein participating clients preserve their private graph-structured information and seek collaborative model training. In contrast to prior FL endeavors that primarily focus on Euclidean data, the extension of FL to graph-structured data introduces distinct challenges, which necessitate specialized treatments.

To systematically study the novel paradigm of FGL, my dissertation categorizes new scenarios of FGL into three schemas following the natural graph distribution among participating clients: 1) Graph-level FGL, 2) Subgraph-level FGL (with Node-level as a special case), and 3) Link-level FGL. It delves into the unique challenges inherent in these schemas, develops tailored methodologies to address them, and further explores practical applications. In essence, my dissertation presents a comprehensive and in-depth exploration of FGL from diverse perspectives, offering valuable insights and innovative contributions to advance this emerging field. 

Table of Contents

1 Introduction 1

1.1 Motivation................................. 1

1.2 Research Contributions.......................... 4

1.2.1 Graph-level Federated Graph Learning . . . . . . . . . . . . . 5

1.2.2 Subgraph-level Federated Graph Learning . . . . . . . . . . . 6

1.2.3 Link-level Federated Graph Learning . . . . . . . . . . . . . . 8

1.3 Organization ............................... 11

2 Background 12

2.1 Basic FL algorithm– FedAvg....................... 12

2.2 Graph Neural Networks ......................... 13

3 Graph-level Federated Graph Learning 15

3.1 GCFL: Federated Graph Classification over Non- IID Graphs . . . . . 15

3.1.1 Overview ............................. 15

3.1.2 Motivating Study: Non-IID structures and features across clients 16

3.1.3 Problem Formulation....................... 17

3.1.4 Technical Design ......................... 18

3.1.5 GCFL+: improved GCFL based on observation sequences of gradients 20

3.1.6 Evaluation and Analysis ..................... 21

3.1.7 Theoretical Analysis ....................... 27

3.1.8 Related Works .......................... 29

3.2 FedBrain: An Application of GCFL for Cross-Institution Brain Network Analysis............................... 32

3.2.1 Overview ............................. 32

3.2.2 Proposed Methods ........................ 33

4 Subgraph-level Federated Graph Learning 41

4.1 FedSCem: Federated Node Classification over Distributed Ego-Networks with Secure Contrastive Embedding Sharing . . . . . . . . . . . . . . 41

4.1.1 Overview ............................. 41

4.1.2 Motivating Study......................... 42

4.1.3 Problem Formulation....................... 44

4.1.4 Proposed Methods: Node embedding sharing . . . . . . . . . . 47

4.1.5 Proposed Methods: Secure embedding sharing among clients and the server........................... 51

4.1.6 Discussions on Efficiency and Privacy . . . . . . . . . . . . . . 55

4.1.7 Evaluation and Analysis ..................... 57

4.1.8 Related Works .......................... 63

5 Link-level Federated Graph Learning 66

5.1 FedLit: Federated Node Classification over Graphs with Latent Link-type Heterogeneity .................. 66

5.1.1 Overview ............................. 66

5.1.2 Motivating Data Analysis .................... 67

5.1.3 Problem Formulation....................... 70

5.1.4 Proposed Methods ........................ 71

5.1.5 Evaluation and Analysis ..................... 79

5.1.6 Related Works .......................... 85

5.2 GaLM: An motivating application study................ 88

5.2.1 Overview ............................. 88

5.2.2 Problem Formulation....................... 88

5.2.3 LM+GNN: The Backbone of Our Proposed Framework . . . . 92

5.2.4 Proposed Methods: Graph-aware LM pre-training on a large graph corpus ........................... 93

5.2.5 Proposed Methods: GaLM fine-tuning on multiple graph applications............................... 98

6 Conclusion and Future Works 105

6.1 Conclusion................................. 105

6.2 FutureDirections............................. 107

6.2.1 Furthering Federated Graph Learning. . . . . . . . . . . . . . 107

6.2.2 Federated Graph Learning in the Era of Generative AI . . . . 109

Appendix A GCFL 113

A.1 More Motivating Examples and Analysis . . . . . . . . . . . . . . . . 113

A.2 Missing Proofs in Section 3.1.7...................... 116

A.2.1 Proof of Proposition 3.1.1 .................... 116

A.2.2 Proof of Proposition 3.1.2 .................... 117

A.2.3 Proof of Proposition 3.1.3 .................... 118

A.3 Dataset Details .............................. 118

A.4 More Detailed Experiment Results ................... 119

Appendix B FedBrain 122

B.1 Datasets.................................. 122

B.2 Clustering Analysis of Guided Clustering . . . . . . . . . . . . . . . . 123

Appendix C FedSCem 125

C.1 DatasetDetails .............................. 125

C.1.1 DatasetStatistics ....................... 125

C.1.2 Distributions of Ego-networks w.r.t. Graph Size and the number of Labels ..................... 125

Appendix D FedLit 127

D.1 Data Details................................ 127

D.1.1 The Oracle Link-types ...................... 127

D.1.2 Data Statistics .......................... 128

D.2 More In-depth Analysis.......................... 128

Appendix E GaLM 131

E.1 Data Statistics .............................. 131

E.1.1 Data Statistics of Amazon-PQ .................. 131

E.1.2 DataStatistics of Product-Reviews . . . . . . . . . . . . . . 131

E.2 Elapsed Time of Pre-training Strategies. . . . . . . . . . . . . . . . . 132

Bibliography 133 

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Mot-clé
Committee Chair / Thesis Advisor
Committee Members
Dernière modification Preview image embargoed

Primary PDF

Supplemental Files