Robust Uncertainty Quantification for Foundation Models: Bayesian and Frequentist Approaches for High-Stakes Applications

Zhao, Shifan (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/ws859h17t?locale=fr

Abstract

Machine learning foundation models have demonstrated impressive predictive capabilities across various domains, including healthcare and climate science. However, their deterministic nature limits their utility in high-stakes applications where understanding prediction uncertainty is crucial for responsible decision-making. This thesis addresses this critical gap by developing two complementary approaches to uncertainty quantification (UQ) for foundation models. First, we introduce a novel two-stage Gaussian Process methodology that effectively handles mean and kernel misspecification—a common challenge in real-world applications. This approach separates mean prediction from uncertainty quantification, leading to more reliable uncertainty estimates even with limited data. We demonstrate its application to healthcare foundation models for patient risk prediction, where accurate uncertainty bounds can significantly impact clinical decision-making.
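The following is a minimal illustrative sketch of the two-stage idea described above: a (possibly misspecified) point predictor supplies the mean, and a zero-mean Gaussian Process fit to its residuals supplies the uncertainty. The names `mean_model`, `two_stage_gp_fit`, and `two_stage_gp_predict` are hypothetical, and the sketch uses off-the-shelf scikit-learn components rather than the thesis's 2StGPR/2StGPNN algorithms, which are developed in Chapter 2.

```python
# Illustrative two-stage sketch (assumed names; not the thesis's exact algorithm).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def two_stage_gp_fit(mean_model, X_train, y_train):
    # Stage 1: point predictions from the (possibly misspecified) mean model.
    residuals = y_train - mean_model.predict(X_train)
    # Stage 2: a zero-mean GP on the residuals handles uncertainty quantification.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_train, residuals)
    return gp


def two_stage_gp_predict(mean_model, gp, X_test):
    # Prediction = mean model output + residual correction; std comes from the GP.
    resid_mean, resid_std = gp.predict(X_test, return_std=True)
    return mean_model.predict(X_test) + resid_mean, resid_std
```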

Second, we develop a Locally Debiased Adaptive Conformal Prediction (LC-ACP) framework that provides distribution-free coverage guarantees without requiring exchangeability assumptions, making it particularly valuable for non-stationary time series. We apply this methodology to climate foundation models for hurricane track prediction, where reliable uncertainty quantification directly impacts evacuation decisions and resource allocation during extreme weather events. To address computational challenges, we introduce kernel preconditioning techniques and unbiased estimators that significantly reduce the cubic complexity of Gaussian Processes while maintaining accuracy. Through comprehensive experiments across healthcare and climate domains, we demonstrate that our methods provide well-calibrated uncertainty estimates across diverse applications and data characteristics. This thesis contributes both theoretical advances and practical implementations that bridge the gap between powerful predictive models and responsible deployment in high-stakes, real-world applications.
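As context for the second contribution, the sketch below shows standard split conformal prediction, the distribution-free building block that LC-ACP extends with local debiasing and adaptivity for non-stationary data (Chapter 5). Function and variable names are illustrative, and the finite-sample coverage shown here relies on the exchangeability assumption that LC-ACP is designed to relax.

```python
# Split conformal baseline (assumed names; LC-ACP itself is defined in Chapter 5).
import numpy as np


def split_conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(y_cal - predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile gives 1 - alpha marginal coverage
    # under exchangeability.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    preds = predict(X_test)
    return preds - q, preds + q
```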

Table of Contents

1 Background and Motivation

1.1 Overview and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Introduction to Deep Learning Modeling and Its Limitations . . . . . 1

1.3 Uncertainty Quantification: Foundations and Importance . . . . . . . 3

1.3.1 Formal Definition and Mathematical Framework . . . . . . . . 3

1.3.2 Types of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . 4

1.3.3 Known-Unknowns vs. Unknown-Unknowns . . . . . . . . . . . 6

1.3.4 Calibration and Sharpness Metrics . . . . . . . . . . . . . . . 6

1.3.5 Importance in High-Stakes Decision-Making . . . . . . . . . . 6

1.4 Uncertainty Quantification in Machine Learning . . . . . . . . . . . . 10

1.4.1 Traditional Statistical Approaches . . . . . . . . . . . . . . . . 10

1.4.2 Bayesian Neural Networks . . . . . . . . . . . . . . . . . . . . 10

1.4.3 Ensemble Methods . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.4 Evidential Deep Learning . . . . . . . . . . . . . . . . . . . . 11

1.5 Gaussian Processes for Uncertainty Quantification . . . . . . . . . . . 12

1.5.1 Mathematical Foundations . . . . . . . . . . . . . . . . . . . . 12

1.5.2 Mean Function and Kernel Description . . . . . . . . . . . . . 12

1.5.3 Posterior Predictive Distribution . . . . . . . . . . . . . . . . 14

1.5.4 Assumptions, Strengths, and Weaknesses . . . . . . . . . . . . 14

1.5.5 Sparse Gaussian Processes . . . . . . . . . . . . . . . . . . . . 15

1.6 Brief Introduction to Conformal Prediction . . . . . . . . . . . . . . . 18

1.7 Recent Advances in Uncertainty Quantification for Foundation Models 19

1.7.1 Uncertainty in Foundation Models . . . . . . . . . . . . . . . . 19

1.7.2 Recent Methodological Advances . . . . . . . . . . . . . . . . 19

1.7.3 Applications in High-Stakes Domains . . . . . . . . . . . . . . 20

1.7.4 Emerging Trends and Open Challenges . . . . . . . . . . . . . 20

1.8 Thesis Structure and Contributions Overview . . . . . . . . . . . . . 21

2 Two-Stage Gaussian Process Methodology

2.1 Introduction to Two-Stage Gaussian Processes . . . . . . . . . . . . . 23

2.2 Mitigating Mean Misspecification via Two-stage GPR . . . . . . . . . 25

2.2.1 Impact of Mean Misspecification . . . . . . . . . . . . . . . . . 25

2.2.2 Proposed Two-Stage GPR Framework . . . . . . . . . . . . . . 26

2.3 Addressing Kernel Misspecification . . . . . . . . . . . . . . . . . . . 26

2.3.1 Impact of Kernel Misspecification . . . . . . . . . . . . . . . . 27

2.3.2 Automatic Kernel Search Algorithm . . . . . . . . . . . . . . . 27

2.4 Efficient Training via Subsampling . . . . . . . . . . . . . . . . . . . 28

2.4.1 Gaussian Process Nearest Neighbor (GPNN) . . . . . . . . . . 28

2.4.2 Two-Stage GPNN (2StGPNN) . . . . . . . . . . . . . . . . . . 29

2.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.5.1 Performance Comparison on UCI Dataset . . . . . . . . . . . 30

2.5.2 Uncertainty-Aware Evaluation Metrics . . . . . . . . . . . . . 31

2.5.3 Synthetic Data Experiment: Mean Misspecification Effects . . 32

2.5.4 Hyperparameter Landscape Analysis . . . . . . . . . . . . . . 34

2.5.5 Application to Healthcare Risk Prediction . . . . . . . . . . . 34

2.6 Conclusion and Connections to Subsequent Chapters . . . . . . . . . 35

3 Application to Health Foundation Models

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2 Challenges in Health Data Prediction . . . . . . . . . . . . . . . . . . 37

3.2.1 Data Characteristics and Challenges . . . . . . . . . . . . . . 37

3.2.2 Existing Foundation Models in Healthcare . . . . . . . . . . . 38

3.2.3 The Critical Role of Uncertainty Quantification . . . . . . . . 38

3.3 Mathematical Formulation of GP-Enhanced Foundation Models . . . 39

3.3.1 Feature Extraction from Foundation Models . . . . . . . . . . 39

3.3.2 Two-Stage GP for Classification Tasks . . . . . . . . . . . . . 40

3.3.3 Kernel Selection and Hyperparameter Optimization . . . . . . 44

3.4 Implementation and Architecture . . . . . . . . . . . . . . . . . . . . 44

3.5 Experimental Methodology . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.1 Datasets and Tasks . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.2 Foundation Models . . . . . . . . . . . . . . . . . . . . . . . . 46

3.5.3 Baselines and Evaluation Metrics . . . . . . . . . . . . . . . . 46

3.6 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.6.1 Uncertainty Quantification Performance . . . . . . . . . . . . 47

3.6.2 Clinical Insights from Uncertainty Patterns . . . . . . . . . . . 48

3.7 Limitations and Future Directions . . . . . . . . . . . . . . . . . . . . 48

3.7.1 Current Limitations . . . . . . . . . . . . . . . . . . . . . . . . 48

3.7.2 Future Research Directions . . . . . . . . . . . . . . . . . . . . 49

3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4 Kernel Preconditioning and Unbiased Gaussian Processes for Computational Efficiency

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 Background on Gaussian Process Inference . . . . . . . . . . . . . . . 52

4.2.1 Kernel Functions in Gaussian Process Regression . . . . . . . 53

4.3 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.3.1 Trace Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.3.2 Iterative GP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.3.3 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.3.4 Unbiased Estimation . . . . . . . . . . . . . . . . . . . . . . . 55

4.3.5 Stochastic Gradient Descent for Gaussian Process . . . . . . . 56

4.3.6 Barely Biased GP . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.4 Adaptive Factorized Nyström Preconditioner . . . . . . . . . . . . . . 56

4.4.1 Background: Nyström Approximation . . . . . . . . . . . . . . 58

4.4.2 Factorized Sparse Approximate Inverse (FSAI) . . . . . . . . . 58

4.4.3 Construction of the AFN Preconditioner . . . . . . . . . . . . 58

4.4.4 Factorized Form of the AFN Preconditioner . . . . . . . . . . 59

4.5 Unbiased Log Marginal Likelihood Estimation and its Gradient . . . 60

4.5.1 SS-CG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.5.2 Theoretical Analysis of SS Estimator . . . . . . . . . . . . . . 61

4.5.3 Preconditioned SS-CG . . . . . . . . . . . . . . . . . . . . . . 64

4.5.4 Small-Bias SS-CG . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.6 Theoretical Analysis of Kernel Preconditioning . . . . . . . . . . . . . 67

4.6.1 Interplay between Fill and Separation Distance . . . . . . . . 68

4.6.2 Nyström Approximation Error Analysis . . . . . . . . . . . . . 68

4.7 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.7.1 Derivative of Gaussian Kernel matrix . . . . . . . . . . . . . . 69

4.7.2 Derivative of Inverse . . . . . . . . . . . . . . . . . . . . . . . 69

4.7.3 Derivative of Cholesky Factorization . . . . . . . . . . . . . . 69

4.7.4 Derivative of Nyström . . . . . . . . . . . . . . . . . . . . . . 70

4.7.5 Derivative of FSAI . . . . . . . . . . . . . . . . . . . . . . . . 70

4.7.6 Derivative of Schur Complement . . . . . . . . . . . . . . . . . 71

4.7.7 Derivative of AFN . . . . . . . . . . . . . . . . . . . . . . . . 71

4.8 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.8.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 73

4.8.2 Synthetic 3D Dataset Experiments . . . . . . . . . . . . . . . 74

4.8.3 Results on Gaussian Kernel . . . . . . . . . . . . . . . . . . . 74

4.8.4 Results on Matérn-3/2 Kernel . . . . . . . . . . . . . . . . . . 75

4.8.5 Analysis of Landmark Point Selection . . . . . . . . . . . . . . 75

4.8.6 Unbiased Estimation Results . . . . . . . . . . . . . . . . . . . 76

4.8.7 Optimization Paths . . . . . . . . . . . . . . . . . . . . . . . . 76

4.8.8 Real-World Dataset Results . . . . . . . . . . . . . . . . . . . 77

4.8.9 Performance on Real-World Datasets . . . . . . . . . . . . . . 77

4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Locally Debiased Adaptive Conformal Prediction

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.2 Limitations of Gaussian Processes and Motivation for Conformal Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.2.1 Coverage Guarantees Under Model Misspecification . . . . . . 83

5.2.2 The Challenge of Non-Stationarity . . . . . . . . . . . . . . . 84

5.2.3 Computational Efficiency . . . . . . . . . . . . . . . . . . . . . 84

5.3 Theoretical Foundations of Conformal Prediction . . . . . . . . . . . 85

5.3.1 Basic Principles and Guarantees . . . . . . . . . . . . . . . . . 85

5.3.2 Split Conformal Prediction . . . . . . . . . . . . . . . . . . . . 86

5.3.3 Adaptive Conformal Prediction . . . . . . . . . . . . . . . . . 87

5.4 Locally Debiased Adaptive Conformal Prediction . . . . . . . . . . . 87

5.4.1 Motivation and Key Insights . . . . . . . . . . . . . . . . . . . 87

5.4.2 Mathematical Formulation . . . . . . . . . . . . . . . . . . . . 88

5.4.3 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . 90

5.4.4 Connections to Gaussian Processes . . . . . . . . . . . . . . . 90

5.5 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.5.1 Coverage Guarantee Without Exchangeability . . . . . . . . . 91

5.5.2 Interval Width Reduction . . . . . . . . . . . . . . . . . . . . 92

5.5.3 Optimality Discussion . . . . . . . . . . . . . . . . . . . . . . 93

5.6 Experimental Results on Financial Volatility Prediction . . . . . . . . 94

5.6.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 95

5.6.2 Coverage Analysis . . . . . . . . . . . . . . . . . . . . . . . . 97

5.6.3 Prediction Interval Width Analysis . . . . . . . . . . . . . . . 99

5.6.4 Coverage vs. Width Trade-off . . . . . . . . . . . . . . . . . . 100

5.6.5 Effect of Local Bias Correction . . . . . . . . . . . . . . . . . 101

5.6.6 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . 102

5.6.7 Computational Efficiency . . . . . . . . . . . . . . . . . . . . . 103

5.6.8 Summary of Experimental Findings . . . . . . . . . . . . . . . 103

5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

6 Application of LC-ACP to Climate and Weather Foundation Models

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6.2 Challenges in Hurricane Track Prediction and Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

6.2.1 The Hurricane Prediction Problem . . . . . . . . . . . . . . . 113

6.2.2 Traditional Approaches to Hurricane Track Uncertainty . . . . 114

6.2.3 The Need for Adaptive, Distribution-Free Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6.3 Methodology: Integrating TTMs with LC-ACP for Hurricane Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6.3.1 Overview of the Integrated Approach . . . . . . . . . . . . . . 115

6.3.2 Tiny Time Mixer (TTM) Architecture for Hurricane Forecasting 116

6.3.3 Locally Debiased Adaptive Conformal Prediction (LC-ACP) for Hurricane Tracks . . . . . . . . . . . . . . . . . . . . . . . . . 117

6.4 Experimental Setup and Data Sources . . . . . . . . . . . . . . . . . 119

6.4.1 Climate Model Synthetic Hurricane Tracks . . . . . . . . . . . 119

6.4.2 Experimental Protocol . . . . . . . . . . . . . . . . . . . . . . 119

6.4.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 120

6.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

6.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 120

6.5.2 CanESM Galveston Tracks Experiment . . . . . . . . . . . . . 121

6.5.3 Quantitative Results and Method Comparison . . . . . . . . . 121

6.6 Operational Considerations and Limitations . . . . . . . . . . . . . . 122

6.6.1 Computational Efficiency . . . . . . . . . . . . . . . . . . . . . 122

6.6.2 Data Requirements . . . . . . . . . . . . . . . . . . . . . . . . 122

6.6.3 Limitations and Future Work . . . . . . . . . . . . . . . . . . 123

6.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

7 Conclusions and Future Work

7.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 128

7.1.1 Methodological Innovations . . . . . . . . . . . . . . . . . . . 128

7.1.2 Applications to Healthcare Foundation Models . . . . . . . . . 129

7.1.3 Applications to Financial Time Series Forecasting . . . . . . . 129

7.1.4 Applications to Climate and Hurricane Forecasting . . . . . . 130

7.2 Synthesis of Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

7.2.1 From GPs to LC-ACP: A Progression in Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

7.2.2 Application-Driven Methodology Development . . . . . . . . . 131

7.2.3 Importance of Adaptive and Local Approaches . . . . . . . . . 131

7.3 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . . . 132

7.3.1 Theoretical Extensions . . . . . . . . . . . . . . . . . . . . . . 132

7.3.2 Methodological Advancements . . . . . . . . . . . . . . . . . . 132

7.3.3 Application Expansion . . . . . . . . . . . . . . . . . . . . . . 133

7.3.4 Implementation and Deployment . . . . . . . . . . . . . . . . 133

7.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

A Appendix

A.1 Proofs for Two-Stage Gaussian Process Theorems . . . . . . . . . . . 135

A.1.1 Proof of Theorem 2.2.1 . . . . . . . . . . . . . . . . . . . . . . 135

A.1.2 Proof of Theorem 2.3.1 . . . . . . . . . . . . . . . . . . . . . . 136

Bibliography 137

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
