Optimal Control Approaches for Designing Neural Ordinary Differential Equations

Onken, Derek (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/kh04dq890?locale=zh

Published

Abstract

Neural network design encompasses both model formulation and numerical treatment for optimization and parameter tuning. Recent research in formulation focuses on interpreting architectures as discretizations of ordinary differential equations (ODEs). These neural ODEs, in which the ODE dynamics are defined by neural network components, benefit from reduced parameterization and smoother hidden states than traditional discrete neural networks but come at high computational costs. Training a neural ODE can be phrased as an ODE-constrained optimization problem, which allows for the application of mathematical optimal control (OC). The application of OC theory leads to design choices that differ from popular high-cost implementations. We improve neural ODE numerical treatment and formulation for models used in time-series regression, image classification, continuous normalizing flows, and path-finding problems.
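The abstract says architectures can be read as discretizations of ODEs. A small sketch makes that link concrete. It assumes a hypothetical single-layer dynamics f(z) = tanh(Kz + b) with time-invariant weights K and b. This is an illustrative choice, not a model taken from the dissertation. Applying forward Euler with step size h to the neural ODE dz/dt = f(z) gives the residual update z_{k+1} = z_k + h f(z_k), which is a residual network with shared weights. The function name `neural_ode_forward_euler` and its parameters are also hypothetical.

```python
import numpy as np


def neural_ode_forward_euler(z0, K, b, T=1.0, n_steps=10):
    """Integrate dz/dt = tanh(K z + b) from t = 0 to t = T with forward Euler.

    Illustrative dynamics only (assumed, not from the dissertation).
    Each step z_{k+1} = z_k + h * tanh(K z_k + b) has the form of a
    residual-network layer, so this loop acts as a weight-shared ResNet.
    Returns the trajectory as an array of shape (n_steps + 1, d).
    """
    h = T / n_steps                       # uniform time step
    z = np.asarray(z0, dtype=float)
    trajectory = [z.copy()]
    for _ in range(n_steps):
        z = z + h * np.tanh(K @ z + b)    # one Euler step = one residual block
        trajectory.append(z.copy())
    return np.stack(trajectory)


# Example: a 2-D hidden state propagated through 8 Euler steps / residual layers.
rng = np.random.default_rng(0)
K = rng.standard_normal((2, 2))
b = np.zeros(2)
traj = neural_ode_forward_euler([1.0, -0.5], K, b, T=1.0, n_steps=8)
print(traj.shape)  # (9, 2): initial state plus one state per step
```

The sketch covers only the forward pass. In a discretize-optimize approach, a loss on the final state would be differentiated through these same Euler steps. Increasing `n_steps` refines the time grid without adding parameters.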

Table of Contents

1 Introduction
1.1 Contribution
1.1.1 Efficient Numerical Treatment
1.1.2 Formulation
1.2 Overview

2 Mathematical Background
2.1 Neural Networks
2.1.1 Architecture
2.1.2 Training
2.1.3 Concepts
2.2 Neural ODEs
2.3 Optimal Control
2.3.1 Pontryagin Maximum Principle
2.3.2 Hamilton-Jacobi-Bellman PDE
2.4 Learning and Optimal Control
2.5 Neural ODEs as Reinforcement Learning

3 Time-Series Regression
3.1 Problem
3.2 Discretize-Optimize vs. Optimize-Discretize
3.3 Numerical Experiments
3.3.1 Extrapolation and Different Initial Conditions

4 Image Classification
4.1 Problem
4.1.1 Focal Loss
4.2 Convolutional Neural Networks
4.2.1 Convolutional Layer
4.2.2 Normalization Layer
4.2.3 Double Symmetric Layer
4.3 Decoupling the Weights and Layers
4.4 Image Classification for Lung Cancer Detection
4.4.1 Motivation
4.4.2 Model
4.4.3 National Lung Screening Trial Experiment

5 Continuous Normalizing Flows for Density Estimation
5.1 Problem
5.2 Related Works
5.2.1 Finite Flows
5.2.2 Infinitesimal Flows
5.2.3 Flows Influenced by Optimal Control
5.3 Discretize-Optimize Flows
5.3.1 Numerical Experiments
5.4 OT-Flow
5.4.1 Model Formulation
5.4.2 Implementation
5.4.3 Numerical Experiments

6 Path-Finding
6.1 Problem
6.2 Related Works
6.2.1 High-Dimensional Deterministic Optimal Control
6.2.2 High-Dimensional Stochastic Optimal Control
6.2.3 Multi-Agent Path-Finding
6.3 Neural ODE Formulation
6.3.1 Main Formulation
6.3.2 Adding Hamilton-Jacobi-Bellman Penalizers
6.3.3 Robustness to Shocks
6.4 Numerics
6.4.1 Hyperparameter Tuning
6.5 Numerical Experiments
6.5.1 Baseline: Discrete Optimization for a Single Initial State
6.5.2 Two-Agent Corridor Experiment
6.5.3 Effect of the Hamilton-Jacobi-Bellman Penalizers
6.5.4 Multi-Agent Swap Experiments
6.5.5 Swarm Problem
6.5.6 Quadcopter Experiment

7 Summary

Appendix A Derivation of Adjoint Equations
A.1 Continuous Adjoint
A.2 Discrete Adjoints

Appendix B Bootstrapping

Bibliography

Mathematical Symbols

Index

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Computer Science and Informatics
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Applied Mathematics; Computer Science
Keywords	neural ODEs; optimal control; neural networks; machine learning
Committee Chair / Thesis Advisor	James G. Nagy, Emory University; Lars Ruthotto, Emory University; Yuanzhe Xi, Emory University
Committee Members	Rachel Jennings, Applied Research Associates

Primary PDF

Optimal Control Approaches for Designing Neural Ordinary Differential Equations (uploaded 2021-03-30 22:37:09 -0400)
