Design Strategies for Studies Using Logistic Regression to Analyze Data on Pooled Samples

Yan, Xiaobo (Spring 2021)

We review logistic regression modeling, fitted by maximum likelihood (ML), for estimating the odds ratios of potential risk factors and predicting disease prevalence from pooled samples. We identify preferred pooling strategies for categorical and for continuous variables. For categorical variables, random pooling within subsets stratified by the variables of interest yields the most accurate and most efficient estimates of both the regression coefficients and the prevalence. For continuous variables, we use the k-means clustering algorithm available in statistical software to form a prespecified number of pools that optimizes estimation performance. We also modify the k-means clustering function in SAS to cap the maximum pool size, reflecting laboratory operability and testing limitations. We compare estimates that assume perfect testing with estimates that account for imperfect testing (sensitivity and specificity) to demonstrate that ML estimation must be adjusted for test error. Both proposed strategies were the most efficient while maintaining good accuracy on the malaria data and on simulated data. Directions for further study of imperfect tests are discussed at the end of the thesis.
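As an illustration only, and not the thesis's SAS implementation, the sketch below assumes the standard pooled-testing formulation: each individual's probability of infection p_ij follows a logistic model, a pool is truly positive when at least one member is positive, and the assay has sensitivity Se and specificity Sp, so that P(pool j tests positive) = Se * [1 - prod_i (1 - p_ij)] + (1 - Sp) * prod_i (1 - p_ij). Setting Se = Sp = 1 gives the perfect-test likelihood. It also shows random pooling within strata for a categorical variable. All function names (`stratified_random_pools`, `pool_positive_prob`, `pooled_log_likelihood`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def stratified_random_pools(strata, pool_size, seed=0):
    """Randomly pool individuals within each stratum (e.g. an age group).

    strata: one stratum label per individual (labels must be sortable).
    Returns a list of pools, each a list of individual indices; the last
    pool in a stratum may hold fewer than pool_size members.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        by_stratum[s].append(i)
    pools = []
    for s in sorted(by_stratum):
        members = by_stratum[s]
        rng.shuffle(members)  # random assignment within the stratum
        for start in range(0, len(members), pool_size):
            pools.append(members[start:start + pool_size])
    return pools


def pool_positive_prob(beta, member_covariates, se=1.0, sp=1.0):
    """P(pool tests positive) under a logistic model for each member.

    Each covariate vector starts with 1.0 for the intercept.
    se and sp are the assay sensitivity and specificity (hypothetical
    defaults of 1.0 correspond to a perfect test).
    """
    all_negative = 1.0  # probability that every member is truly negative
    for x in member_covariates:
        p = expit(sum(b * xi for b, xi in zip(beta, x)))
        all_negative *= 1.0 - p
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative


def pooled_log_likelihood(beta, pools, covariates, results, se=1.0, sp=1.0):
    """Log-likelihood of the 0/1 pool-level test results."""
    ll = 0.0
    for members, z in zip(pools, results):
        q = pool_positive_prob(beta, [covariates[i] for i in members], se, sp)
        q = min(max(q, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        ll += math.log(q) if z else math.log(1.0 - q)
    return ll
```

To estimate beta, one option is to pass the negated `pooled_log_likelihood` to a general-purpose numerical optimizer such as `scipy.optimize.minimize`; supplying se and sp below 1 yields the misclassification-adjusted likelihood rather than the perfect-test one.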

Table of Contents

Table of Contents

1     Introduction

2     Methodology

2.1      Standard multiple logistic regression

2.2      Pooling strategies

2.3      Logistic regression in the pooling setting

3     Results

3.1      Motivational study

3.1.1    Age as a categorical variable

3.1.2    Age as a continuous variable

3.2      Simulation

3.2.1    Age as a categorical variable

3.2.2    Age as a continuous variable

4     Discussion

4.1      Imperfect tests

4.2      Overall prevalence

4.3      Investigating implausible simulation results

5     References

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English