Optimal Control Approaches for Designing Neural Ordinary Differential Equations
Open Access
Onken, Derek (Spring 2021)
Abstract
Neural network design encompasses both model formulation and the numerical treatment used for optimization and parameter tuning. Recent research on formulation focuses on interpreting architectures as discretizations of ordinary differential equations (ODEs). These neural ODEs, in which the ODE dynamics are defined by neural network components, benefit from a reduced parameterization and smoother hidden states compared with traditional discrete neural networks, but they come at a high computational cost. Training a neural ODE can be phrased as an ODE-constrained optimization problem, which allows for the application of mathematical optimal control (OC). Applying OC theory leads to design choices that differ from popular high-cost implementations. We improve the numerical treatment and formulation of neural ODEs for models used in time-series regression, image classification, continuous normalizing flows, and path-finding problems.
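The view of architectures as discretizations of ODEs can be illustrated with a minimal sketch, which is not taken from the dissertation. It assumes a hypothetical layer function f(z, θ) = tanh(Kz + b) and weights shared across layers, so that a residual network with step size h is a forward Euler discretization of dz/dt = f(z, θ). The names `dynamics`, `resnet_forward`, and `solve`, and all numerical values, are illustrative assumptions.

```python
import math

def dynamics(z, K, b):
    """Hypothetical layer function f(z, theta) = tanh(K z + b), theta = (K, b)."""
    return [math.tanh(sum(k * zi for k, zi in zip(row, z)) + bi)
            for row, bi in zip(K, b)]

def resnet_forward(z0, thetas, h):
    """Residual network z_{j+1} = z_j + h * f(z_j, theta_j).

    Each layer is one forward Euler step of the ODE dz/dt = f(z, theta(t)).
    """
    z = list(z0)
    for K, b in thetas:
        f = dynamics(z, K, b)
        z = [zi + h * fi for zi, fi in zip(z, f)]
    return z

def distance(u, v):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))

# Illustrative (assumed) weights, initial state, and final time.
K = [[0.5, -1.0], [1.0, 0.3]]
b = [0.1, -0.2]
z0 = [1.0, -0.5]
T = 1.0

def solve(n_layers):
    """Propagate z0 to time T with n_layers layers sharing the same weights."""
    return resnet_forward(z0, [(K, b)] * n_layers, T / n_layers)

# A very deep network serves as a stand-in for the continuous ODE solution.
reference = solve(10000)
err_coarse = distance(solve(10), reference)
err_fine = distance(solve(100), reference)
```

In this sketch, adding layers while shrinking the step size brings the network output closer to the reference trajectory, which is the sense in which the discrete network approximates a continuous model.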
Table of Contents
1 Introduction
1.1 Contribution
1.1.1 Efficient Numerical Treatment
1.1.2 Formulation
1.2 Overview
2 Mathematical Background
2.1 Neural Networks
2.1.1 Architecture
2.1.2 Training
2.1.3 Concepts
2.2 Neural ODEs
2.3 Optimal Control
2.3.1 Pontryagin Maximum Principle
2.3.2 Hamilton-Jacobi-Bellman PDE
2.4 Learning and Optimal Control
2.5 Neural ODEs as Reinforcement Learning
3 Time-Series Regression
3.1 Problem
3.2 Discretize-Optimize vs. Optimize-Discretize
3.3 Numerical Experiments
3.3.1 Extrapolation and Different Initial Conditions
4 Image Classification
4.1 Problem
4.1.1 Focal Loss
4.2 Convolutional Neural Networks
4.2.1 Convolutional Layer
4.2.2 Normalization Layer
4.2.3 Double Symmetric Layer
4.3 Decoupling the Weights and Layers
4.4 Image Classification for Lung Cancer Detection
4.4.1 Motivation
4.4.2 Model
4.4.3 National Lung Screening Trial Experiment
5 Continuous Normalizing Flows for Density Estimation
5.1 Problem
5.2 Related Works
5.2.1 Finite Flows
5.2.2 Infinitesimal Flows
5.2.3 Flows Influenced by Optimal Control
5.3 Discretize-Optimize Flows
5.3.1 Numerical Experiments
5.4 OT-Flow
5.4.1 Model Formulation
5.4.2 Implementation
5.4.3 Numerical Experiments
6 Path-Finding
6.1 Problem
6.2 Related Works
6.2.1 High-Dimensional Deterministic Optimal Control
6.2.2 High-Dimensional Stochastic Optimal Control
6.2.3 Multi-Agent Path-Finding
6.3 Neural ODE Formulation
6.3.1 Main Formulation
6.3.2 Adding Hamilton-Jacobi-Bellman Penalizers
6.3.3 Robustness to Shocks
6.4 Numerics
6.4.1 Hyperparameter Tuning
6.5 Numerical Experiments
6.5.1 Baseline: Discrete Optimization for a Single Initial State
6.5.2 Two-Agent Corridor Experiment
6.5.3 Effect of the Hamilton-Jacobi-Bellman Penalizers
6.5.4 Multi-Agent Swap Experiments
6.5.5 Swarm Problem
6.5.6 Quadcopter Experiment
7 Summary
Appendix A Derivation of Adjoint Equations
A.1 Continuous Adjoint
A.2 Discrete Adjoints
Appendix B Bootstrapping
Bibliography
Mathematical Symbols
Index
Primary PDF
| Title | Date Uploaded |
|---|---|
| Optimal Control Approaches for Designing Neural Ordinary Differential Equations | 2021-03-30 22:37:09 -0400 |