Communication Efficient Distributed Tensor Factorization based on Local SGD for Collaborative Health Data Analysis
Pan, Zhangyi (Spring 2020)
Abstract
Tensor factorization is a useful technique for phenotyping and has proven effective for analyzing massive medical data. Electronic health records (EHRs) can be factorized to discover latent clinical concepts that capture interactions among multiple attributes, such as medications and diagnoses. One challenge is how to perform high-throughput tensor factorization on EHRs distributed among multiple sites while preserving patient privacy. Federated tensor factorization has recently been proposed, in which local sites communicate intermediate factors with differential privacy to a global server without sharing the original data. Existing methods based on Elastic Averaging Stochastic Gradient Descent (EASGD) lower the communication cost through infrequent communication, but they rely on an auxiliary penalty that leads to inferior converged results. In this thesis, we propose a communication-efficient approach based on local Stochastic Gradient Descent, in which the local sites communicate with the global server only after several iterations of local updates and do not require the auxiliary penalty. Our experiments on a real medical dataset show that the proposed approach can simultaneously achieve better accuracy and lower communication cost than the state-of-the-art approaches.
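The communication pattern described in the abstract (several local updates per site, then one exchange with the server) can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the thesis's algorithm: it fits a single scalar weight by least squares instead of factorizing a tensor, it omits differential privacy and regularization, and the names `local_sgd`, `make_sites`, `rounds`, `b`, and `lr` are hypothetical.

```python
import random


def make_sites(num_sites=4, per_site=20, true_w=3.0, seed=1):
    """Build toy, noise-free data y = true_w * x, split across sites (assumed setup)."""
    rng = random.Random(seed)
    sites = []
    for _ in range(num_sites):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(per_site)]
        sites.append([(x, true_w * x) for x in xs])
    return sites


def local_sgd(site_data, rounds=50, b=5, lr=0.2, seed=0):
    """Each site runs b local SGD steps; the server then averages the local models."""
    rng = random.Random(seed)
    w_global = 0.0
    communications = 0
    for _ in range(rounds):
        local_models = []
        for data in site_data:
            w = w_global  # every site starts the round from the latest global model
            for _ in range(b):
                x, y = rng.choice(data)       # one stochastic sample from local data
                grad = 2.0 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
                w -= lr * grad
            local_models.append(w)
        # Only the model is exchanged, once per round; raw data never leaves a site.
        w_global = sum(local_models) / len(local_models)
        communications += 1
    return w_global, communications


w, comms = local_sgd(make_sites(), rounds=50, b=5)
print(f"estimated weight: {w:.4f}, communication rounds: {comms}")
```

With `b = 5`, each site takes 250 gradient steps but exchanges its model only 50 times; synchronizing after every step would need 250 exchanges for the same number of updates.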
Table of Contents
1 Introduction
2 Background and Related Work
2.1 Notations
2.2 Tensor Factorization
2.2.1 Tensor
2.2.2 Matrix Products
2.2.3 CP Decomposition
2.3 Stochastic Gradient Descent
2.4 Concentrated Differential Privacy
2.5 Distributed Tensor Factorization
2.5.1 Vanilla Distributed SGD
2.5.2 ADMM Based Approaches
2.5.3 EA-SGD Based Approaches
2.6 Local Stochastic Gradient Descent
3 Proposed Model
3.1 Overview
3.2 Algorithm
3.3 Worker Side Update
3.3.1 Patient Factor Matrix
3.3.2 Feature Factor Matrix
3.4 Server Side Update
3.5 Privacy Analysis
4 Experiments
4.1 Dataset
4.1.1 Data Description
4.1.2 Pre-processing
4.2 Implementation Details
4.3 Baselines
4.4 Parameters
4.4.1 l2,1 regularization term μ
4.4.2 Learning rate η
4.4.3 Number of Sites T
4.4.4 Number of Local Updates per Communication b
4.5 Results
4.5.1 Accuracy
4.5.2 Communication Costs
4.5.3 Utility
4.5.4 Convergence
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members |
Primary PDF
Title | Date Uploaded |
---|---|
Communication Efficient Distributed Tensor Factorization based on Local SGD for Collaborative Health Data Analysis | 2020-04-24 19:03:19 -0400 |