Statistical approaches for understanding and addressing preferential sampling in model-based geostatistics

Hsiao, Thomas (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/qv33rz15c?locale=pt-BR
Published

Abstract

Traditional geostatistical and point process methods assume observation locations are independent of the latent spatial process of interest. Violations of this assumption, known as preferential sampling (PS), can introduce bias in spatial inference and prediction. Current practice is to adjust for PS through a shared variable approach or shared latent process (SLP) model, which has been shown to improve prediction and inference. However, our understanding of how traditional methods perform under PS remains limited, alternatives to the SLP are sparse, and point process modeling approaches have yet to be implemented in several applied settings, including species distribution modeling of monarch butterflies.

In the first aim, we examine the large-sample behavior of the maximum likelihood estimator (MLE) in the model-based geostatistics framework under PS, assuming a stationary Gaussian process with Matérn covariance. Surprisingly, we find that, under general conditions, the fixed-domain asymptotic behavior of the MLE is unaffected by the sampling mechanism. Moreover, as sample size increases, the MLE corrects for PS-induced bias more effectively than common alternatives such as composite likelihood and the Vecchia approximation, and attains performance similar to the SLP for PS adjustment.

In the second aim, we introduce inverse sampling intensity weighting (ISIW) as a novel alternative to the SLP. In ISIW, we first estimate the sampling intensity at each observation location and then incorporate these estimates as weights in a weighted likelihood adjustment. Our approach preserves kriging’s linear predictor structure and leverages the Vecchia approximation for scalability. While ISIW performs poorly for inference, it dramatically improves prediction under PS, is computationally faster, and remains robust across different PS mechanisms, though estimating the weights in practice remains a key challenge.
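As a rough illustration of the two-step idea in this aim (estimate the sampling intensity at each observation location, then use its inverse as a weight), the following Python sketch may help. It is not the dissertation's estimator: the weighted likelihood and the Vecchia approximation are not reproduced, and a simple weighted mean stands in for them. The one-dimensional sinusoidal field, the preference strength `beta`, the Gaussian kernel density estimate with a Silverman-style bandwidth, the 95% winsorization cap, and the function name `inverse_intensity_weights` are all assumptions made for this example.

```python
import numpy as np

def inverse_intensity_weights(x, upper_quantile=0.95):
    """Winsorized inverse-intensity weights for 1-D locations x, rescaled to sum to len(x)."""
    n = len(x)
    # Silverman-style bandwidth (an assumption, not taken from the dissertation)
    h = 1.06 * x.std() * n ** (-0.2)
    # Gaussian kernel density estimate evaluated at the observation locations;
    # it is proportional to the sampling intensity
    z = (x[:, None] - x[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    w = 1.0 / density
    # Winsorize: cap the largest weights at the chosen upper quantile
    w = np.minimum(w, np.quantile(w, upper_quantile))
    return w * n / w.sum()

def field(x):
    """Toy stand-in for the latent process on [0, 1]; its domain average is 0."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
beta = 2.0  # hypothetical strength of preferential sampling

# Preferential sampling by rejection: locations with high field values
# are more likely to be kept (intensity proportional to exp(beta * field)).
candidates = rng.uniform(0.0, 1.0, size=5000)
keep = rng.uniform(size=candidates.size) < np.exp(beta * (field(candidates) - 1.0))
x = candidates[keep]
y = field(x)

naive_mean = y.mean()
weighted_mean = np.average(y, weights=inverse_intensity_weights(x))
print(f"naive mean: {naive_mean:.3f}  inverse-intensity weighted mean: {weighted_mean:.3f}")
```

In this toy setting the unweighted mean is pulled toward the oversampled high-value region, while the weighted mean moves back toward the domain average. The cap on extreme weights trades some residual bias for stability.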

Finally, we extend PS concepts to species distribution modeling, addressing varying sampling effort (VSE) in Journey North monarch butterfly data (2011–2020). Using presence-only citizen science data and distance sampling from the nearest road, we apply a thinned point process model approach with integrated nested Laplace approximation (INLA) to estimate the spatial distribution of monarchs in the western United States, improving model fit and revealing strong evidence of VSE in citizen science data. Our approach helps to quantify preferential sampling in passive wildlife surveillance.

Table of Contents

1. Background and Preliminaries – 1

1.1 Stochastic processes – 1

1.2 Gaussian random fields – 2

1.3 Point processes – 4

1.4 Model-based geostatistics – 7

    1.4.1 Estimation and inference – 8

    1.4.2 Prediction – 10

1.5 Preferential sampling – 10

    1.5.1 Failure of standard methods under PS – 10

    1.5.2 Marked point processes – 12

    1.5.3 The shared latent process model – 12

    1.5.4 The Bayesian shared latent process model – 15

    1.5.5 Estimation by TMB – 16

    1.5.6 Estimation by INLA (and a new PS framework) – 17

    1.5.7 Weighted composite likelihood for PS – 18

    1.5.8 Summary – 20

2. The MLE under Fixed Domain Asymptotics – 26

2.1 Asymptotic frameworks in spatial statistics – 26

2.2 Equivalence of measures and microergodicity – 28

2.3 Asymptotics for the Matérn covariance – 30

    2.3.1 Extensions to nonzero nugget – 32

    2.3.2 Extensions to fixed effects – 33

2.4 Summary – 33

3. When Does Geostatistical Design Matter? Insights into the Effect of Preferential Sampling on the MLE – 35

3.1 Introduction – 36

3.2 Theoretical Results – 38

    3.2.1 Background – 38

    3.2.2 Parameter Estimation – 40

    3.2.3 Prediction – 42

3.3 Simulation Experiment – 43

3.4 Results – 45

3.5 Discussion – 48

3.6 Conclusion – 52

3.7 Supplementary Material – 53

4. Preferential Sampling Adjustment Using Inverse Sampling Intensity Weights (ISIW) – 58

4.1 Introduction – 59

4.2 Model-based geostatistics – 65

    4.2.1 Estimation – 65

    4.2.2 Prediction – 71

4.3 Inverse sampling intensity weighting – 72

    4.3.1 Estimation of sampling intensity – 72

    4.3.2 Defining the likelihood – 74

    4.3.3 Numerical estimation – 75

    4.3.4 Winsorization of extreme weights – 75

    4.3.5 Prediction – 76

4.4 Simulation analysis – 76

    4.4.1 Experiment – 76

    4.4.2 Results – 78

4.5 Application to the Galicia moss data – 82

4.6 Discussion – 85

4.7 Supplementary Material – 88

5. Accounting for Spatially Varying Sampling Effort: A Case Study of Monarch Butterflies in North America – 92

5.1 Introduction – 92

5.2 Data – 95

    5.2.1 Adult monarch sightings – 95

    5.2.2 Covariates – 96

5.3 Methods – 99

    5.3.1 Statistical model and priors – 99

    5.3.2 Estimation and computation – 101

    5.3.3 Evaluation – 102

5.4 Results – 103

5.5 Discussion – 105

5.6 Figures and Tables – 108

6. Future Work – 117

6.1 Extension of the SLP to more flexible point processes – 117

6.2 Simultaneous weight estimation in ISIW – 118

6.3 Incorporation of positional error to preferential sampling models – 119

Bibliography – 121

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English