Assess Improvement of Balancing Covariates by Propensity Score approach using Generalized Boosted Model (GBM) and Application Based on National Cancer Database
Song, Haocan (Spring 2018)
Abstract
Background: Observational studies are among the most commonly used designs in medical
research, but they are vulnerable to selection bias, which limits their ability to support
valid causal inference. Propensity score (PS) matching and weighting are popular
methods for reducing this bias and estimating causal effects in observational
studies. In this work, we focus on the Generalized Boosted Model (GBM), a tree-based approach
intended to estimate the PS more accurately without specifying the form of the prediction
function, and we compare its covariate-balancing performance with that of
conventional model-based approaches such as logistic regression.
Method and Study Design: We tested three alternative methods for propensity
score (PS) estimation: a main-effect logistic regression model (model 1: LOGREG), a
comprehensive logistic regression model with all two-way interactions and polynomial terms
(model 2: LOGREG(INT)), and GBM (model 3). We applied these methods to
prostate cancer data from the National Cancer Database (NCDB), aiming to compare
overall survival between proton radiation therapy and conventional x-ray-based
radiation therapy. To control for confounding, we performed propensity score matching (PSM)
with a caliper and matching ratios up to 1:5. Balance was evaluated before and
after matching using standardized differences. A proportional hazards model was fitted to
estimate the hazard ratio for proton therapy, with a 95% confidence interval, in the matched
sample.
Conclusion: The study shows that covariate balance can be improved by a more accurate
PS estimation model, either GBM or comprehensive logistic regression, and both
approaches should be encouraged in practice. In the case study, we also found that proton
radiation therapy offered a long-term survival benefit for prostate cancer
patients.
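The balance check and caliper matching named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `standardized_difference` and `greedy_caliper_match`, the toy propensity scores, and the caliper width of 0.05 are all assumptions chosen for illustration, and the sketch matches on the raw score rather than any particular transformation of it.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """Standardized mean difference for one continuous covariate.

    Uses the common definition: difference in means divided by the
    square root of the average of the two sample variances.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0
    return (mean(treated) - mean(control)) / pooled_sd


def greedy_caliper_match(ps_treated, ps_control, caliper, ratio=1):
    """Greedy nearest-neighbour 1:ratio matching without replacement.

    Returns a dict mapping each treated index to a list of up to `ratio`
    control indices whose score lies within `caliper` of the treated score.
    Treated units with no control inside the caliper are left unmatched.
    """
    available = set(range(len(ps_control)))
    matches = {}
    for i, p in enumerate(ps_treated):
        # Candidate controls inside the caliper, nearest first;
        # ties are broken by control index to keep the result deterministic.
        candidates = sorted(
            (j for j in available if abs(ps_control[j] - p) <= caliper),
            key=lambda j: (abs(ps_control[j] - p), j),
        )
        chosen = candidates[:ratio]
        if chosen:
            matches[i] = chosen
            available.difference_update(chosen)
    return matches


# Hypothetical propensity scores, for illustration only.
ps_treated = [0.30, 0.62, 0.90]
ps_control = [0.28, 0.33, 0.60, 0.65, 0.10]
matches = greedy_caliper_match(ps_treated, ps_control, caliper=0.05, ratio=2)
```

In this toy run the third treated unit has no control within the caliper and is dropped, which is the usual trade-off of caliper matching: closer matches at the cost of a smaller matched sample. Balance would then be assessed by computing `standardized_difference` for each covariate before and after matching.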
Table of Contents
1. INTRODUCTION
1.1 Observational Study
1.2 Propensity Score
1.3 Variable Selection for the Propensity Score Model
1.4 Propensity Score Calculation
1.4.1 Main-effect Logistic Regression Model (LOGREG)
1.4.2 Comprehensive Logistic Regression Model with all Two-way Interactions and Polynomial Terms (LOGREG(INT))
1.4.3 Generalized Boosted Models (GBM)
1.5 Propensity Score Matching
1.5.1 Greedy Matching
1.5.2 1-1 to 1-N Caliper Matching
1.6 Treatment Effect
1.6.1 Average Treatment Effect (ATE)
1.6.2 Average Treatment Effect Among the Treated (ATT)
1.7 Checking balance on the covariates before and after matching
2. CASE STUDY
2.1 Study Objective
2.2 NCDB database
2.3 Define study population
2.4 Select the covariates
2.5 Statistical methods
3. RESULTS
3.1 Patients characteristics
3.2 Estimating propensity scores
3.3 PS Matching
3.4 Checking balance on the covariates before and after matching
3.4.1 Greedy Matching
3.4.2 1-1 to 1-N Caliper Matching
4. DISCUSSION
Bibliography
APPENDIX
About this Master's Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Partnering Agencies | |
Primary PDF
Title | Date Uploaded |
---|---|
Assess Improvement of Balancing Covariates by Propensity Score approach using Generalized Boosted Model (GBM) and Application Based on National Cancer Database | 2018-04-10 09:30:23 -0400 |
Supplemental Files
Title | Date Uploaded |
---|---|
Appendix Tables (Additional Tables for Thesis) | 2018-04-11 07:38:32 -0400 |