Unraveling the Impact of Fuzzy Similarity Algorithms on Missing Data Imputation of Heart Bypass Surgery Cohort Open Access
Shen, Hong-Jui (Spring 2024)
Abstract
Objective: This thesis introduces the Fuzzy C-Means based Random Forest (FCRF) method, developed to address the limitations of existing data imputation techniques in public health datasets. To improve imputation accuracy, FCRF integrates fuzzy logic and similarity learning to handle the three missing data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR).
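The three mechanisms differ only in what drives the probability that a value is missing. The following is a minimal sketch of how each could be simulated on synthetic data; the function names, the median cutoff, and the 1.5x/0.5x split are illustrative assumptions, not the procedure used in the thesis.

```python
import random

def mcar_mask(n, rate, rng):
    """MCAR: each entry is missing with the same probability `rate`."""
    return [rng.random() < rate for _ in range(n)]

def _value_dependent_mask(driver, rate, rng):
    """Drop rows with an above-median `driver` value three times as often.

    The 1.5x / 0.5x split is an illustrative assumption (requires rate <= 2/3);
    it keeps the overall missing rate roughly equal to `rate`.
    """
    cutoff = sorted(driver)[len(driver) // 2]
    return [rng.random() < (1.5 * rate if v > cutoff else 0.5 * rate)
            for v in driver]

def mar_mask(covariate, rate, rng):
    """MAR: missingness depends on another, fully observed variable."""
    return _value_dependent_mask(covariate, rate, rng)

def mnar_mask(target, rate, rng):
    """MNAR: missingness depends on the value that goes missing."""
    return _value_dependent_mask(target, rate, rng)

# Synthetic example: x stays fully observed, y loses values.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 1) for xi in x]
masks = {
    "MCAR": mcar_mask(len(y), 0.3, rng),
    "MAR": mar_mask(x, 0.3, rng),
    "MNAR": mnar_mask(y, 0.3, rng),
}
```

Under MNAR the observed values of `y` no longer represent the full distribution, which is why that mechanism is generally the hardest to impute.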
Method: FCRF is evaluated against four traditional imputation methods: Mean, K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), and Iterative Imputation. Performance is measured with Average RMSE, Normalized RMSE, Mean Absolute Error (MAE), Weighted F1-Score, and Normalized Accuracy. The comparison spans various missing data scenarios to assess each method's effectiveness.
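As a rough illustration of how such metrics can be computed, the sketch below scores imputed values only at the positions that were masked. The function names are hypothetical, and normalizing RMSE by the range of the true values is an assumption; the thesis may normalize differently.

```python
import math

def _masked_errors(true, imputed, mask):
    """Errors at the positions where the value was masked (mask[i] is True)."""
    return [i - t for t, i, m in zip(true, imputed, mask) if m]

def rmse(true, imputed, mask):
    errs = _masked_errors(true, imputed, mask)
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def mae(true, imputed, mask):
    errs = _masked_errors(true, imputed, mask)
    return sum(abs(e) for e in errs) / len(errs)

def nrmse(true, imputed, mask):
    """RMSE divided by the range of the true values (assumed normalization)."""
    return rmse(true, imputed, mask) / (max(true) - min(true))

def weighted_f1(true, pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    total = 0.0
    for c in set(true):
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * (tp + fn)
    return total / len(true)
```

Averaging `rmse` over all continuous variables would give one plausible reading of "Average RMSE"; for categorical variables, `weighted_f1` would be applied to the masked entries only.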
Results: FCRF performs competitively across all scenarios and excels in complex MNAR situations, where conventional methods falter. Its design, which combines clustering and predictive modeling, offers nuanced capabilities beneficial for public health research.
Conclusion: FCRF marks a significant advancement in data imputation, promising more accurate and reliable analyses for public health research. Future work will examine FCRF's impact on standard error and variance estimates to confirm the method's robustness and to prevent bias in statistical inference. This research contributes to enhancing data integrity and supporting informed decision-making in public health.
Table of Contents
Chapter 1. Introduction and Literature Review
1.1 Relevance of Data Imputation in Public Health
1.2 Literature Review
1.3 Ethical Considerations
Chapter 2. Methodology
2.1 Software and Package Utilization
2.2 Data Collection and Processing
2.3 Variable Selection
2.4 Data Preprocessing
2.41 Missing Completely At Random (MCAR)
2.42 Missing At Random (MAR)
2.43 Missing Not At Random (MNAR)
2.5 Theoretical Foundation and Algorithmic Framework
2.6 Imputation Process
2.7 Evaluation Metrics
2.71 Continuous Variables Evaluation
2.72 Categorical and Binary Variables Evaluation
Chapter 3. Results
3.1 MCAR Data Scenario
3.2 MAR Data Scenario
3.3 MNAR Data Scenario
3.4 Discussion
Chapter 4. Conclusion and Future Work
Appendix
References
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Unraveling the Impact of Fuzzy Similarity Algorithms on Missing Data Imputation of Heart Bypass Surgery Cohort | 2024-04-08 11:45:24 -0400 |