Communication Efficient Distributed Tensor Factorization based on Local SGD for Collaborative Health Data Analysis

Pan, Zhangyi (Spring 2020)

Permanent URL: https://etd.library.emory.edu/concern/etds/8049g613n?locale=fr
Published

Abstract

Tensor factorization is a useful technique for phenotyping and has proven to be an effective way to analyze massive medical data. We can factorize electronic health records (EHRs) to discover latent clinical concepts that capture interactions among multiple attributes such as medication and diagnosis. One challenge is how to perform high-throughput tensor factorization on EHRs distributed among multiple sites while preserving patient privacy. Federated tensor factorization has recently been proposed, in which local sites communicate intermediate factors with differential privacy to a global server without sharing the original data. Existing methods based on Elastic Averaging Stochastic Gradient Descent (EASGD) lower the communication cost by communicating infrequently, but they rely on an auxiliary penalty that leads to inferior converged results. In this thesis, we propose a communication-efficient approach based on local Stochastic Gradient Descent, in which the local sites communicate with the global server only after several iterations of local updates and do not require the auxiliary penalty. Our experiments on a real medical dataset show that the proposed approach simultaneously achieves better accuracy and lower communication cost than state-of-the-art approaches.
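The communication pattern described in the abstract (several local updates per site, followed by one exchange with the global server) can be illustrated with a minimal sketch. This is not the thesis algorithm: it assumes a toy matrix model in place of a tensor, omits differential privacy and any regularization, and uses plain averaging on the server. The function name `local_sgd_factorize` and all parameter names are hypothetical; `b` and `eta` are borrowed from the symbols listed in the table of contents.

```python
import numpy as np

def local_sgd_factorize(site_data, rank=2, rounds=400, b=5, eta=0.02, batch=8, seed=0):
    """Toy local SGD with periodic averaging (an illustrative assumption,
    not the thesis method).

    site_data: list of (n_t, d) arrays, one per site, each modeled as A_t @ B.T.
    Only the shared feature factor B is communicated; each patient factor A_t
    stays on its own site.
    """
    rng = np.random.default_rng(seed)
    d = site_data[0].shape[1]
    B_global = 0.1 * rng.standard_normal((d, rank))
    A = [0.1 * rng.standard_normal((X.shape[0], rank)) for X in site_data]

    for _round in range(rounds):
        local_Bs = []
        for A_t, X in zip(A, site_data):
            B = B_global.copy()              # site starts from the server's copy
            for _step in range(b):           # b local updates, no communication
                idx = rng.choice(X.shape[0], size=min(batch, X.shape[0]), replace=False)
                resid = A_t[idx] @ B.T - X[idx]            # mini-batch residual
                grad_A = resid @ B                         # per-row gradient
                grad_B = resid.T @ A_t[idx] / len(idx)     # mini-batch average
                A_t[idx] -= eta * grad_A
                B -= eta * grad_B
            local_Bs.append(B)
        # One communication per round: the server averages the local copies.
        B_global = np.mean(local_Bs, axis=0)
    return A, B_global
```

Calling `local_sgd_factorize(sites)` on a list of per-site arrays returns the local factors and the shared factor. In this sketch each site sends its copy of `B` once per round, so a larger `b` means fewer exchanges for the same number of gradient steps.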

Table of Contents

1 Introduction

2 Background and Related Work

2.1 Notations

2.2 Tensor Factorization

2.2.1 Tensor

2.2.2 Matrix Products

2.2.3 CP Decomposition

2.3 Stochastic Gradient Descent

2.4 Concentrated Differential Privacy

2.5 Distributed Tensor Factorization

2.5.1 Vanilla Distributed SGD

2.5.2 ADMM Based Approaches

2.5.3 EA-SGD Based Approaches

2.6 Local Stochastic Gradient Descent

3 Proposed Model

3.1 Overview

3.2 Algorithm

3.3 Worker Side Update

3.3.1 Patient Factor Matrix

3.3.2 Feature Factor Matrix

3.4 Server Side Update

3.5 Privacy Analysis

4 Experiments

4.1 Dataset

4.1.1 Data Description

4.1.2 Pre-processing

4.2 Implementation Details

4.3 Baselines

4.4 Parameters

4.4.1 l2,1 Regularization Term μ

4.4.2 Learning Rate η

4.4.3 Number of Sites T

4.4.4 Number of Local Updates per Communication b

4.5 Results

4.5.1 Accuracy

4.5.2 Communication Costs

4.5.3 Utility

4.5.4 Convergence

5 Conclusion and Future Work

5.1 Conclusion

5.2 Future Work

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
