Federated Tensor Factorization for Collaborative Health Data Analytics 公开

Ma, Jing (Summer 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/3j333365w?locale=zh
Published

Abstract

Modern healthcare systems are collecting a huge volume of healthcare data from a large number of individuals with various medical procedures, medications, diagnosis, lab tests and so on. Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. However, the EHR data is also fragmented and is always distributed among independent medical institutions, and they are prohibited from being shared and exchanged. Recently, federated learning offers a paradigm for collaborative learning among different entities, which seemingly provides an ideal potential to further enhance the tensor factorization-based collaborative phenotyping to handle sensitive personal health data. This poses challenges to preserving the privacy of the exchanged intermediary results in order to protect the sensitive patient information. Meanwhile, efforts still need to be made to overcome the limitations of the federated tensor factorization, including the restrictions to the classic tensor factorization, high communication cost and reduced accuracy. Furthermore, it is essential to develop the decentralization techniques for federated tensor factorization to deal with the vulnerability of the central server to malfunction and external attacks. To deal with these challenging problems, we propose 1) a privacy-preserving collaborative tensor factorization method for computational phenotyping which is able to deal with heterogeneous data with rigorous privacy guarantee and achieves less communication cost and comparable accuracy; 2) a communication efficient federated generalized tensor factorization, which is flexible enough to choose from a variate of losses to best suit different types of data in practice; 3) a communication efficient decentralized generalized tensor factorization method which enables the absence of the central server and further reduces the communication cost.

Table of Contents

1 Introduction 1

1.1 Motivations ................................ 1

1.2 Research Contributions.......................... 4

1.2.1 Privacy-preserving Efficient Federated Tensor Factorization . . 5

1.2.2 Federated Generalized Tensor Factorization with Improved Communication Efficiency....................... 6

1.2.3 Decentralized Communication-efficient Generalized Tensor Factorization ............................. 7

1.3 Organization ............................... 7

2 Privacy-preserving Efficient Federated Tensor Factorization 8

2.1 Introduction................................ 8

2.2 Preliminaries and Backgrounds ..................... 10

2.2.1 Tensor Factorization ....................... 10

2.2.2 Differential Privacy........................ 11

2.2.3 Concentrated Differential Privacy ................ 12

2.2.4 Related Works .......................... 14

2.3 DPFact .................................. 15

2.3.1 Overview ............................. 15

2.3.2 Formulation............................ 17

2.3.3 Heterogeneous Patient Populations . . . . . . . . . . . . . . . 18

2.4 DPFact Optimization........................... 19

2.4.1 Local Factors Update....................... 21

2.4.2 Privacy Analysis ......................... 24

2.4.3 Global Variables Update..................... 25

2.5 Experimental Evaluation......................... 26

2.5.1 Experimental Settings ...................... 26

2.5.2 Result Analysis .......................... 30

3 Federated Generalized Tensor Factorization with Improved Communication Efficiency 35

3.1 Introduction................................ 35

3.2 Preliminaries and Background...................... 38

3.2.1 Notation.............................. 38

3.2.2 Generalized Tensor Factorization ................ 38

3.2.3 SGD with Gradient Compression, Error-Feedback and Periodic Communication.......................... 40

3.3 Proposed Methods ............................ 41

3.3.1 FedGTF-EF: Communication Efficient GTF with Block Randomization, Gradient Compression and Error-Feedback . . . . 41

3.3.2 FedGTF-EF-PC: Further Communication Reduction by Periodic Communication.......................... 43

3.3.3 Efficient Partial Stochastic Gradient Computation for FedGTF 45

3.4 Algorithm Analysis............................ 46

3.4.1 Convergence Analysis....................... 46

3.4.2 Complexity Analysis ....................... 51

3.4.3 Privacy Analysis ......................... 52

3.5 Experiment ................................ 52

3.5.1 Experimental Setup........................ 52

3.5.2 Experiment results ........................ 54

4 Decentralized Communication-efficient Generalized Tensor Factorization 58

4.1 Introduction................................ 58

4.2 Preliminaries and Background...................... 60

4.2.1 Notations and Operators..................... 61

4.2.2 Communication Reduction.................... 61

4.2.3 Decentralized Optimization ................... 62

4.3 Proposed Method............................. 63

4.3.1 Problem Formulation....................... 63

4.3.2 CiderTF: Decentralized Generalized Tensor Factorization with

Compressed, Block-randomized, Periodic, and Event-triggered

Communication.......................... 66

4.3.3 CiderTF_m: CiderTF with Nesterov’s momentum . . . . . . . . 69

4.3.4 Complexity Analysis ....................... 69

4.4 Experiment ................................ 71

4.4.1 Experimental Settings ...................... 71

4.4.2 Result Analysis .......................... 73

4.4.3 Case Study on MIMIC-III .................... 78

5 Conclusion and Future Work 83

5.1 Conclusion................................. 83

5.2 Future Work................................ 84

Appendix A DPFact 86

A.1 Lipschitz Constant............................ 86

A.2 L2,1RegularizationParameter...................... 87

A.3 Phenotype Selection ........................... 88

Appendix B FedGTF-EF/FedGTF-EF-PC 90

B.1 Additional Materials for Experiments.................. 90

B.1.1 Parameter Settings ........................ 90

B.1.2 Additional Experiments ..................... 91

B.2 Additional Materials for Convergence Analysis of Algorithm 1 . . . . 92

B.2.1 Proof of Theorem4.1....................... 92

B.2.2 Proof of Theorem4.2....................... 99

B.3 Additional Materials for Convergence Analysis of Algorithm 2 . . . . 106

B.3.1 Proof of Theorem4.3....................... 106

B.4 Additional Materials for Complexity Analysis of Algorithm 1 & 2 . . 112

B.4.1 Proof of Theorem 4.4 (Computational Complexity) . . . . . . 112

B.4.2 Proof of Theorem 4.5 (Uplink Communication Complexity) . . 112

Bibliography 113

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
关键词
Committee Chair / Thesis Advisor
Committee Members
最新修改

Primary PDF

Supplemental Files