Optimal Control Approaches for Designing Neural Ordinary Differential Equations

Onken, Derek (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/kh04dq890?locale=zh
Published

Abstract

  Neural network design encompasses both model formulation and the numerical treatment used for optimization and parameter tuning. Recent research on formulation focuses on interpreting architectures as discretizations of ordinary differential equations (ODEs). These neural ODEs, in which the ODE dynamics are defined by neural network components, have fewer parameters and smoother hidden states than traditional discrete neural networks but incur high computational costs. Training a neural ODE can be phrased as an ODE-constrained optimization problem, which allows for the application of mathematical optimal control (OC). Applying OC theory leads to design choices that differ from popular high-cost implementations. We improve the numerical treatment and formulation of neural ODEs for models used in time-series regression, image classification, continuous normalizing flows, and path-finding problems.
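
  As a rough illustration of the abstract's central idea, the sketch below treats a stack of residual-style updates as a forward Euler discretization of an ODE, then evaluates a loss on the final state, which is the objective of the ODE-constrained training problem. It is not taken from the dissertation: the scalar tanh dynamics, the names layer, forward_euler, and loss, and all default values are assumptions chosen only to keep the example small.

```python
import math


def layer(z, weight, bias):
    """Hypothetical dynamics f(z; theta) = tanh(weight * z + bias)."""
    return math.tanh(weight * z + bias)


def forward_euler(z0, weight, bias, t_final=1.0, n_steps=8):
    """Integrate dz/dt = layer(z) from t = 0 to t_final with forward Euler.

    Each update z <- z + h * layer(z) has the same form as a residual
    block, which is the ODE reading of such architectures.
    """
    h = t_final / n_steps
    z = z0
    for _ in range(n_steps):
        z = z + h * layer(z, weight, bias)
    return z


def loss(weight, bias, z0=0.5, target=1.0):
    """Illustrative squared error between the final state and a target."""
    return 0.5 * (forward_euler(z0, weight, bias) - target) ** 2
```

  Minimizing loss over weight and bias with the time stepping fixed in advance corresponds to a discretize-then-optimize approach; the alternative is to derive continuous optimality conditions first and discretize them afterwards.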

Table of Contents

1 Introduction
  1.1 Contribution
    1.1.1 Efficient Numerical Treatment
    1.1.2 Formulation
  1.2 Overview
2 Mathematical Background
  2.1 Neural Networks
    2.1.1 Architecture
    2.1.2 Training
    2.1.3 Concepts
  2.2 Neural ODEs
  2.3 Optimal Control
    2.3.1 Pontryagin Maximum Principle
    2.3.2 Hamilton-Jacobi-Bellman PDE
  2.4 Learning and Optimal Control
  2.5 Neural ODEs as Reinforcement Learning
3 Time-Series Regression
  3.1 Problem
  3.2 Discretize-Optimize vs. Optimize-Discretize
  3.3 Numerical Experiments
    3.3.1 Extrapolation and Different Initial Conditions
4 Image Classification
  4.1 Problem
    4.1.1 Focal Loss
  4.2 Convolutional Neural Networks
    4.2.1 Convolutional Layer
    4.2.2 Normalization Layer
    4.2.3 Double Symmetric Layer
  4.3 Decoupling the Weights and Layers
  4.4 Image Classification for Lung Cancer Detection
    4.4.1 Motivation
    4.4.2 Model
    4.4.3 National Lung Screening Trial Experiment
5 Continuous Normalizing Flows for Density Estimation
  5.1 Problem
  5.2 Related Works
    5.2.1 Finite Flows
    5.2.2 Infinitesimal Flows
    5.2.3 Flows Influenced by Optimal Control
  5.3 Discretize-Optimize Flows
    5.3.1 Numerical Experiments
  5.4 OT-Flow
    5.4.1 Model Formulation
    5.4.2 Implementation
    5.4.3 Numerical Experiments
6 Path-Finding
  6.1 Problem
  6.2 Related Works
    6.2.1 High-Dimensional Deterministic Optimal Control
    6.2.2 High-Dimensional Stochastic Optimal Control
    6.2.3 Multi-Agent Path-Finding
  6.3 Neural ODE Formulation
    6.3.1 Main Formulation
    6.3.2 Adding Hamilton-Jacobi-Bellman Penalizers
    6.3.3 Robustness to Shocks
  6.4 Numerics
    6.4.1 Hyperparameter Tuning
  6.5 Numerical Experiments
    6.5.1 Baseline: Discrete Optimization for a Single Initial State
    6.5.2 Two-Agent Corridor Experiment
    6.5.3 Effect of the Hamilton-Jacobi-Bellman Penalizers
    6.5.4 Multi-Agent Swap Experiments
    6.5.5 Swarm Problem
    6.5.6 Quadcopter Experiment
7 Summary
Appendix A Derivation of Adjoint Equations
  A.1 Continuous Adjoint
  A.2 Discrete Adjoints
Appendix B Bootstrapping
Bibliography
Mathematical Symbols
Index

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keywords
Committee Chair / Thesis Advisor
Committee Members
Last modified
