Large-Scale Parameter Estimation in Geophysics and Machine Learning Open Access

Wu Fung, Samy (Spring 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/pz50gx314?locale=en
Published

Abstract

The ability to collect large amounts of data with relative ease has given rise to new opportunities for scientific discovery. It has led to a new class of large-scale parameter estimation problems in geophysics, machine learning, and numerous other applications. Traditionally, parameter estimation aims to infer parameters in a physical model from indirect measurements, where the model is often given by a partial differential equation (PDE). Here, we also phrase machine learning as a parameter estimation problem, where rather than having a PDE as the model, we have a hypothesis function. For instance, the hypothesis function may be a neural network with the parameters of interest corresponding to the weights of the network. A common thread in these problems is their massive computational expense. In both cases, the underlying parameter space is typically very high-dimensional, making the optimization computationally demanding, and sometimes intractable, when large amounts of data are available. This thesis addresses two general approaches to reduce the computational burden of large-scale parameter estimation in geophysics and machine learning: 1) model order reduction (MOR), which aims to reduce the computational complexity of the model, and 2) parallel/distributed optimization, which aims to reduce the time-to-solution in parameter estimation.

For the MOR approach, we present an adaptive scheme tailored to problems in geophysics, where the number of PDE simulations required to accurately reconstruct the parameter is correlated with the number of measurements. To this end, we apply the multiscale finite volume (MSFV) method to solve high-dimensional geophysics parameter estimation problems. Given a finite volume discretization of the PDE on a fine mesh, the MSFV method reduces the problem size by computing a parameter-dependent projection onto a nested coarse mesh. A novelty in our work is the integration of MSFV into a PDE-constrained optimization framework, which updates the reduced space in each iteration. This adaptivity of the projection basis allows us to project onto an aggressively coarsened mesh while achieving highly accurate solutions. We also present a computationally tractable way of explicitly differentiating the MOR solution that acknowledges the change of basis. We illustrate the effectiveness of this approach on the direct current resistivity survey.

For the parallel/distributed approach, we propose two methods. In the first method, we present an asynchronous, uncertainty-weighted alternating direction method of multipliers (ADMM). In particular, we consider a global variable consensus ADMM algorithm to estimate parameters in geophysics and machine learning asynchronously and in parallel. Motivated by problems with many measurements, we partition the data and distribute the resulting subproblems among the available workers. Since each subproblem can be associated with different models and right-hand sides, this provides ample options for tailoring the method to different applications. Our contribution is a novel weighting scheme that empirically improves the progress made in the early iterations of the consensus ADMM scheme and is attractive when using a large number of subproblems. The weights in our scheme are related to the uncertainties associated with the solutions of each subproblem and can be computed efficiently using iterative schemes. We show by example that the weighting scheme leads to accelerated convergence for a series of linear and nonlinear parameter estimation problems. We also show that the asynchronous implementation further reduces the time-to-solution of 3D problems in geophysics. In the second method, we present an ADMM-based technique (ADMM-Softmax) which aims to efficiently learn the weights in multinomial logistic regression (MLR) problems. In each iteration, our algorithm decomposes the training into three steps: a linear least-squares problem for the weights, a global variable update involving a separable MLR problem, and a trivial dual variable update. The least-squares problem can be factorized in the off-line phase, and the separability in the global variable update allows for efficient parallelization, leading to faster convergence. We outline the potential of our method for the MNIST and CIFAR-10 datasets, and show that ADMM-Softmax leads to improved generalization and convergence compared to current state-of-the-art methods.
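
To make the consensus idea in the first method concrete, the following is a minimal sketch of plain (unweighted, synchronous) global variable consensus ADMM on a small partitioned least-squares problem. It is an illustration under stated assumptions, not the thesis's implementation: the uncertainty weighting and the asynchronous execution described above are not reproduced, the problem has only two unknowns so that the local systems can be solved in closed form, and all function names, the penalty parameter `rho`, and the data are hypothetical.

```python
# Sketch of unweighted global variable consensus ADMM (scaled form) for
# a least-squares problem split into data blocks. Names and data are
# illustrative assumptions and are not taken from the thesis.

def solve_2x2(m, r):
    """Solve the 2x2 linear system m x = r by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def normal_equations(rows, rhs):
    """Return A^T A and A^T b for one data block with two unknowns."""
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for (a0, a1), b in zip(rows, rhs):
        ata[0][0] += a0 * a0
        ata[0][1] += a0 * a1
        ata[1][0] += a1 * a0
        ata[1][1] += a1 * a1
        atb[0] += a0 * b
        atb[1] += a1 * b
    return ata, atb


def consensus_admm(blocks, rho=1.0, iters=200):
    """Minimize sum_i 0.5*||A_i x - b_i||^2 subject to x_i = z for all i."""
    systems = [normal_equations(rows, rhs) for rows, rhs in blocks]
    n = len(blocks)
    z = [0.0, 0.0]                      # global (consensus) variable
    u = [[0.0, 0.0] for _ in range(n)]  # scaled dual variable per block
    for _ in range(iters):
        # Local updates: each block solves (A_i^T A_i + rho I) x = A_i^T b_i + rho (z - u_i).
        # These solves are independent, so they could run on separate workers.
        xs = []
        for (ata, atb), ui in zip(systems, u):
            m = [[ata[0][0] + rho, ata[0][1]], [ata[1][0], ata[1][1] + rho]]
            r = [atb[j] + rho * (z[j] - ui[j]) for j in range(2)]
            xs.append(solve_2x2(m, r))
        # Global update: plain average of x_i + u_i (no uncertainty weights).
        z = [sum(x[j] + ui[j] for x, ui in zip(xs, u)) / n for j in range(2)]
        # Dual update: accumulate each block's disagreement with the consensus.
        u = [[ui[j] + x[j] - z[j] for j in range(2)] for x, ui in zip(xs, u)]
    return z


# Three data blocks generated from the same parameter vector [2, -1].
blocks = [
    ([(1.0, 0.0), (1.0, 1.0)], [2.0, 1.0]),
    ([(0.0, 1.0), (2.0, 1.0)], [-1.0, 3.0]),
    ([(1.0, -1.0), (1.0, 2.0)], [3.0, 0.0]),
]
estimate = consensus_admm(blocks)
print(estimate)
```

In this sketch the global update is an unweighted average; a weighted variant would replace that average with a weighted combination of the local solutions, which is where a scheme such as the uncertainty-based weights described in the abstract would enter.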

Table of Contents

1 Introduction 

1.1 Contribution and Related Works

1.1.1 Model Order Reduction

1.1.2 Parallel and Distributed Optimization

1.2 Thesis Overview

2 Preliminaries

2.1 MAP Estimation and UQ

2.1.1 MAP Estimation in Classification

2.2 Numerical Optimization 

2.2.1 Gauss-Newton-PCG

2.2.2 Alternating Direction Method of Multipliers

2.3 Applications in Geophysics and Machine Learning

3 Adaptive Multiscale Model Reduction

3.1 DCR Forward and Inverse Problem

3.1.1 Sensitivity Computation

3.2 Model Order Reduction

3.3 Multiscale Finite Elements/Volumes

3.4 Optimization with MSFV Methods

3.4.1 Reduced Optimization 

3.4.2 Optimization with Fixed Reduced Space 

3.4.3 Optimization with Adaptive Reduced Space 

3.4.4 Illustrating the Error

3.4.5 Local Sensitivity Computation

3.5 Numerical Results

3.5.1 Block Model Test Problem

3.5.2 SEG/EAGE Test Problem

3.5.3 Parallel Efficiency

3.6 Discussion

4 Uncertainty-Weighted Asynchronous Optimization 

4.1 Background

4.2 Uncertainty-Weighted Consensus ADMM

4.2.1 Weighted Consensus ADMM

4.2.2 Computing the Weights

4.3 Numerical Results

4.3.1 Least-Squares

4.3.2 Multinomial Logistic Regression

4.3.3 Single-physics Parameter Estimation

4.3.4 Multi-physics Parameter Estimation

4.3.5 Communication Costs

4.4 Discussion

5 Classification with Multinomial Logistic Regression 

5.1 Related Work

5.2 Mathematical Formulation

5.3 ADMM-Softmax

5.3.1 Solving the Least-Squares 

5.3.2 Global Variable Update

5.3.3 Computational Costs and Convergence

5.4 Numerical Experiments

5.4.1 Setup

5.4.2 Results

5.5 Discussion

6 Summary and Outlook

Bibliography 

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English