Multiresolution Methods for Convolutional Neural Networks

Scope Crafts, Evan (Spring 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/rj430572z?locale=en

Abstract

Convolutional neural networks (CNNs) are widely used for speech, image, and video recognition due to their state-of-the-art performance. However, little theory exists to guide the design of CNNs, and a CNN's architecture typically depends explicitly on the resolution of the input data. A CNN is a nested function composed of a series of non-linear transformations, parameterized by initially randomized convolution operators, that is optimized to interpolate and extrapolate from data. A new interpretation of the CNN relates the convolution operators acting on image data to linear combinations of differential operators, which yields a continuous understanding of CNNs. Multigrid methods efficiently solve partial differential equations (PDEs), that is, equations relating multiple variables and their partial derivatives, by working on a family of fine and coarse grids. The continuous understanding of CNNs provides a way to apply multigrid methods to the convolution operators of a CNN. This makes it possible to handle image data of different resolutions efficiently and to train on computationally cheaper lower resolutions. The effectiveness of multigrid methods on a residual neural network architecture (ResNet), a neural network with added stability in its component functions, has been demonstrated previously. This thesis analyzes the effectiveness of multigrid methods on variations of a classical CNN. The experiments here show that, on a classical CNN, multigrid methods can suffer from overfitting without careful implementation, due to the difficult nature of the optimization problem.
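The central idea in the abstract, that a convolution kernel can be read as a discretization of a linear combination of differential operators, and therefore transferred between grids of different spacing, can be sketched as follows. This is an illustrative toy, not the thesis's exact construction: the stencil basis, the least-squares decomposition, and the function names (`decompose`, `restrict`) are all assumptions made for this example.

```python
import numpy as np

# A partial basis of 3x3 finite-difference stencils, each paired with the
# order of the derivative it discretizes on a grid with spacing h = 1.
# (Illustrative choice; a complete basis would have nine stencils.)
STENCILS = [
    ("identity", 0, np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float)),
    ("d/dx",     1, np.array([[0, 0, 0], [-.5, 0, .5], [0, 0, 0]], float)),
    ("d/dy",     1, np.array([[0, -.5, 0], [0, 0, 0], [0, .5, 0]], float)),
    ("d2/dx2",   2, np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]], float)),
    ("d2/dy2",   2, np.array([[0, 1, 0], [0, -2, 0], [0, 1, 0]], float)),
]

def decompose(kernel):
    """Least-squares coefficients of `kernel` in the stencil basis."""
    A = np.stack([s.ravel() for _, _, s in STENCILS], axis=1)  # 9 x 5
    coeffs, *_ = np.linalg.lstsq(A, kernel.ravel(), rcond=None)
    return coeffs

def restrict(kernel, factor=2):
    """Transfer a kernel from a fine grid (spacing h) to a coarser grid
    (spacing factor*h). Representing the same continuous operator on the
    coarser grid scales an order-p derivative stencil by factor**(-p)."""
    coeffs = decompose(kernel)
    coarse = np.zeros_like(kernel)
    for c, (_, order, stencil) in zip(coeffs, STENCILS):
        coarse += c * factor ** (-order) * stencil
    return coarse

# Example: a kernel that is purely a first x-derivative halves in
# magnitude when the grid spacing doubles.
fine = np.array([[0, 0, 0], [-.5, 0, .5], [0, 0, 0]])
coarse = restrict(fine)
```

In this reading, coarsening a network's convolution weights is not plain subsampling: each differential component must be rescaled according to its order, which is what lets the same continuous operator act consistently on images of different resolutions.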

Table of Contents

1 Introduction
  1.1 Contributions and Outline
2 Background
  2.1 Neural Networks: An Introduction
  2.2 Network Structure
  2.3 A New Interpretation: The Kernel as a Differential Operator
  2.4 Introducing Multigrid Methods
3 Proposed Method
  3.1 Multigrid Implementation
    3.1.1 Defining the Fine and Coarse Grids
    3.1.2 Interpolation of Network Weights
    3.1.3 Interpolation of Classifier Weights
    3.1.4 Choice of Prolongation and Restriction Matrices
  3.2 Network Frameworks and Optimization Methods
    3.2.1 Regularization and Optimization
4 Numerical Experiments
  4.1 Experiments Run
  4.2 Baseline Results
  4.3 Pooling
  4.4 Optimization Methods
5 Summary and Conclusion
A Main Notation
B Abbreviations

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English