Statistical Methods for Disease Surveillance Based on Multiple Data Streams
Zhang, Yuzi (Summer 2023)
Abstract
Disease surveillance systems are widely implemented to monitor disease distribution and detect outbreaks. An important task of disease surveillance is to infer the number of prevalent or cumulative incident cases. When multiple surveillance systems monitor the same disease in essentially closed populations, the capture-recapture (CRC) approach is an appealing tool for integrating information across the systems to estimate the total number of diseased cases. We first develop a hierarchical modeling framework for analyzing individual-level surveillance data collected from multiple surveillance systems at multiple surveillance sites. The framework allows for individual-level heterogeneity in capture probabilities and borrows information across sites to improve the estimation of disease case counts. Second, we propose an accessible sensitivity and uncertainty analysis for two-catch CRC experiments, using a multinomial distribution-based maximum likelihood estimation (MLE) procedure that hinges on a key inestimable parameter. Under this multinomial model, we also derive bias-corrected estimators that allow for any user-specified level of dependency between the two systems. We next clarify some crucial pitfalls of the popular log-linear model-based approach to CRC estimation. Finally, motivated by those pitfalls, we develop an alternative framework, again under the multinomial distribution-based model, that hinges on the choice of a key parameter reflecting dependencies among surveillance systems. This alternative framework leverages generalizations of the closed-form estimator derived in the sensitivity and uncertainty analysis framework, and extends the associated bias correction procedures to CRC studies involving an arbitrary number of systems.
Under the alternative framework, we show how expert opinion can be incorporated in the spirit of prior information to guide estimation in an appealing and transparent way, and how an adapted credible interval approach can be used to facilitate inference exhibiting favorable frequentist properties. By generalizing the uncertainty analysis proposed for the two-catch case, the framework permits principled uncertainty analyses through which users can acknowledge their level of confidence in assumptions made about the key dependency parameter.
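For readers unfamiliar with two-catch CRC, the idea of an estimate that hinges on an assumed dependency level can be illustrated with textbook estimators. The sketch below is not the dissertation's method: it shows the classical Lincoln-Petersen estimator (which assumes independent streams), Chapman's bias-corrected variant, and a simple adjustment that parameterizes dependence through an assumed odds ratio between the two captures. The function names, the odds-ratio parameterization, and the cell counts are all hypothetical choices made for illustration; the dissertation's own key parameter may be defined differently.

```python
def lincoln_petersen(n11, n10, n01):
    """Classical two-catch estimate of N, assuming the two streams are independent.

    n11: cases captured by both streams
    n10: cases captured by stream 1 only
    n01: cases captured by stream 2 only
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for this estimator")
    return (n11 + n10) * (n11 + n01) / n11


def chapman(n11, n10, n01):
    """Chapman's bias-corrected variant of the Lincoln-Petersen estimator."""
    return (n11 + n10 + 1) * (n11 + n01 + 1) / (n11 + 1) - 1


def odds_ratio_adjusted(n11, n10, n01, odds_ratio):
    """Estimate of N for an assumed (inestimable) odds ratio between captures.

    The unobserved cell is estimated as odds_ratio * n10 * n01 / n11;
    odds_ratio = 1 reproduces the Lincoln-Petersen estimate.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for this estimator")
    return n11 + n10 + n01 + odds_ratio * n10 * n01 / n11


if __name__ == "__main__":
    # Hypothetical cell counts, chosen only to illustrate the arithmetic.
    n11, n10, n01 = 20, 30, 40
    print("Lincoln-Petersen:", lincoln_petersen(n11, n10, n01))
    print("Chapman:", round(chapman(n11, n10, n01), 2))
    # A simple sensitivity scan over assumed dependency levels.
    for odds_ratio in (0.5, 1.0, 2.0):
        print(odds_ratio, odds_ratio_adjusted(n11, n10, n01, odds_ratio))
```

Scanning the assumed odds ratio shows how strongly the estimate of N depends on a quantity the observed data cannot identify, which is the motivation for a sensitivity analysis of this kind.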
Table of Contents
1 Introduction
1.1 Introduction to Disease Surveillance
1.2 Capture-Recapture Methods in Disease Surveillance
1.2.1 Data structure
1.2.2 Multinomial models
1.2.3 Poisson models
1.2.4 Models allowing individual heterogeneity in capture probabilities
1.3 Specific Aims
2 A Hierarchical Model for Analyzing Multi-Site Individual-Level Disease Surveillance Data
2.1 Background
2.2 Motivating Data
2.3 Methods
2.3.1 Model Specification
2.3.2 A Two-Stage Bayesian Procedure for Inference
2.3.3 First-Stage Estimation
2.3.4 Second-Stage Estimation
2.3.5 Prior Distributions
2.4 Simulation Studies
2.4.1 Simulation Design
2.4.2 Comparison with One-Stage BM Model
2.4.3 Positive Dependence between Systems
2.4.4 Benefits of Multiple Active Systems
2.5 Application
2.6 Discussion
3 Sensitivity and Uncertainty Analysis for Two-Stream CRC Methods
3.1 Background
3.2 Methods
3.2.1 Maximum Likelihood Estimators
3.2.2 Sensitivity Analysis
3.2.3 Uncertainty Analysis
3.2.4 Sensitivity Analysis with A Known Case Ratio
3.3 Simulation Studies
3.4 Discussion
4 Pitfalls of the Log-linear Modeling Framework for CRC Studies
4.1 Background
4.2 Motivating Data
4.3 MLEs of N with a Given Key Dependency Parameter
4.4 The Exclusionary Property of CRC Log-linear Models
4.4.1 A Toy Example
4.5 AIC is Deceiving as a Metric for CRC Model Selection
4.6 Discussion
5 A CRC Modeling Framework for Disease Surveillance Emphasizing Expert Opinion in the Spirit of Prior Information
5.1 Background
5.2 Methods
5.2.1 Preliminaries
5.2.2 Proposed modeling framework
5.3 Simulations
5.4 Real Data Applications
5.4.1 Three-stream HIV CRC data
5.4.2 Four-stream HIV CRC data
5.5 Discussion
6 Summary and Future Work
6.1 Summary
6.2 Future Work
Appendix A Appendix for Chapter 2
A.1 Posterior predictive simulation procedure for generating imputed dataset
A.2 Data generation procedure
A.2.1 Two systems are independent at the population level
A.2.2 Two systems are positively correlated at the population level
A.2.3 Multiple active systems are included
A.3 Estimation and inference for two independent two-stream CRC data
A.4 Goodness of fit of the proposed model for analyzing PTB data
Appendix B Appendix for Chapter 3
B.1 Conditional multinomial model for population-level two-stream CRC data
B.2 Derivation of bias-corrected estimators under two-stream CRC
B.3 Variance estimators
B.4 Procedure for obtaining 95% percentile interval for N
B.5 Crossing points of sensitivity plots obtained from two strata
B.6 MLEs with a known case ratio
B.7 MLEs under three-stream CRC
Appendix C Appendix for Chapter 4
Appendix D Appendix for Chapter 5
D.1 Dirichlet-multinomial-based approach for inference
D.2 Uncertainty analysis
D.3 Simulation settings
D.4 Log-linear models fitted in simulation studies with results presented in Tables 5.2 and 5.3
Bibliography
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Statistical Methods for Disease Surveillance Based on Multiple Data Streams | 2023-07-25 11:36:00 -0400 |