Linear Programming Approach for Q-Learning

Gong, Xindi (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/cc08hg869?locale=en

Abstract

Reinforcement learning has made headlines for its success in games, such as AlphaGo, and in scientific applications such as protein folding and automation. These success stories have motivated a better mathematical understanding of reinforcement learning, which has become an important and active area of research.

In a reinforcement learning system, the agent's goal is to learn an optimal policy that minimizes the total running cost of its actions over all successive steps. Given enough time and training trials, the agent will eventually learn the optimal policy. In a complex system, however, tracking the action sequence and the environment's feedback for each action can be difficult and time-consuming. To learn the optimal policy without observing numerous training trials, we consider the reinforcement learning technique Q-Learning and investigate a convex formulation of the training problem.
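In standard notation, this objective and the associated Bellman equation for the Q-function can be written as follows; the discount factor gamma, the running cost c, and the deterministic transition map f are illustrative symbols, not notation taken from the thesis.

```latex
% Illustrative notation (not from the thesis): gamma in (0,1) is a discount
% factor, c(s,a) the running cost, and f(s,a) the deterministic next state.
\[
  \pi^{*} = \arg\min_{\pi} \sum_{t=0}^{\infty} \gamma^{t}\, c\bigl(s_t, \pi(s_t)\bigr),
  \qquad s_{t+1} = f\bigl(s_t, \pi(s_t)\bigr),
\]
\[
  Q^{*}(s,a) = c(s,a) + \gamma \min_{a'} Q^{*}\bigl(f(s,a), a'\bigr),
  \qquad \pi^{*}(s) = \arg\min_{a} Q^{*}(s,a).
\]
```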

In this thesis, we apply Q-Learning to a deterministic, discrete-time reinforcement learning problem with discrete state and action spaces. To obtain an algorithm that solves the problem numerically, we formulate the learning problem as a linear program based on the Bellman equation. Although the linear programming approach solves our problem successfully, we believe it has limitations in systems with large state and action spaces and with randomness: the approach quickly becomes infeasible as the number of states and actions grows. Despite these limitations, the linear programming approach provides insight into solving reinforcement learning problems, and future work will test these limitations and improve the approach for more complex systems.
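As a rough illustration of the linear programming idea (not necessarily the exact formulation developed in the thesis), the sketch below recovers the optimal Q-function of a tiny, invented deterministic MDP with scipy.optimize.linprog: the Bellman equation is relaxed to the inequalities Q(s,a) <= c(s,a) + gamma * Q(f(s,a), a') for every action a', and maximizing the sum of Q-values makes these inequalities tight.

```python
"""Sketch: optimal Q-function of a small deterministic MDP via a linear
program. The chain MDP, its costs, and the discount factor are invented
for illustration; this is not the thesis' exact training problem."""
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 4, 2   # tiny chain MDP (assumed)
gamma = 0.9                  # discount factor (assumed)

# Deterministic transitions f(s, a) and running costs c(s, a) (assumed).
f = np.array([[1, 0], [2, 0], [3, 1], [3, 3]])             # next state
cost = np.array([[1., 2.], [1., 2.], [1., 2.], [0., 0.]])  # cost per step

def idx(s, a):
    """Flatten (s, a) into an index of the LP variable vector Q."""
    return s * n_actions + a

# Bellman inequalities: Q(s,a) - gamma * Q(f(s,a), a') <= c(s,a) for all a'.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        for a2 in range(n_actions):
            row = np.zeros(n_states * n_actions)
            row[idx(s, a)] += 1.0
            row[idx(f[s, a], a2)] -= gamma
            A_ub.append(row)
            b_ub.append(cost[s, a])

# The optimal Q is the largest feasible point, so maximize the sum of all
# Q-values (linprog minimizes, hence the negated objective).
res = linprog(c=-np.ones(n_states * n_actions),
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (n_states * n_actions))
Q = res.x.reshape(n_states, n_actions)
print("Q*:\n", Q)
print("greedy (cost-minimizing) policy:", Q.argmin(axis=1))
```

Even in this toy setting, the number of LP variables grows like |S| x |A| and the number of constraints like |S| x |A|^2, which is one concrete way to see the scalability limitation mentioned above.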

Table of Contents

1 Introduction
1.1 Background of Q-learning
1.2 Contributions and Outline
2 Background
2.1 The Basic Reinforcement Learning Framework
2.2 Example
2.3 Value Function and State-value Function
2.4 Relevant Work
3 Q-learning Algorithm
3.1 Bellman Equation
3.2 Q-Learning: an Introduction
3.3 Q-Learning as a Linear Program
4 Training Problem
4.1 Training Problem
4.1.1 Numerical Solution
4.1.2 Linear Programming Approach
4.1.3 Results
5 Discussion
6 Conclusion

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
