Linear Programming Approach for Q-Learning

Gong, Xindi (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/cc08hg869?locale=en

Abstract

Reinforcement learning has made headlines for its success in games, such as AlphaGo, and in scientific applications such as protein folding and automation. These success stories have motivated a better mathematical understanding of reinforcement learning, which has become an important and active area of research.

In a reinforcement learning system, the agent's goal is to learn an optimal policy that minimizes the total running cost of its actions over all successive steps. Given enough time and training trials, the agent will eventually learn the optimal policy. In a complex system, however, tracking the action sequence and the environment's feedback for each action can be difficult and time-consuming. To learn the optimal policy without observing numerous training trials, we consider the reinforcement learning technique Q-Learning and investigate a convex formulation of the training problem.
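In standard notation, this objective and the associated Bellman equation for the Q-function can be written as follows; the discount factor gamma, the running cost c, and the deterministic transition map f are illustrative symbols, not notation taken from the thesis.

```latex
% Illustrative notation (not from the thesis): gamma in (0,1) is a discount
% factor, c(s,a) the running cost, and f(s,a) the deterministic next state.
\[
  \pi^{*} = \arg\min_{\pi} \sum_{t=0}^{\infty} \gamma^{t}\, c\bigl(s_t, \pi(s_t)\bigr),
  \qquad s_{t+1} = f\bigl(s_t, \pi(s_t)\bigr),
\]
\[
  Q^{*}(s,a) = c(s,a) + \gamma \min_{a'} Q^{*}\bigl(f(s,a), a'\bigr),
  \qquad \pi^{*}(s) = \arg\min_{a} Q^{*}(s,a).
\]
```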

In this thesis, we apply Q-Learning to a deterministic, discrete-time reinforcement learning problem with discrete state and action spaces. To obtain an algorithm that solves the problem numerically, we formulate the learning problem as a linear program based on the Bellman equation. Although the linear programming approach solves our problem successfully, we believe it has limitations in systems with large state and action spaces and with randomness: the approach quickly becomes infeasible as the number of states and actions grows. Despite these limitations, the linear programming approach provides insight into solving reinforcement learning problems, and future work will test these limitations and improve the approach for more complex systems.
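As a rough illustration of the linear programming idea (not necessarily the exact formulation developed in the thesis), the sketch below recovers the optimal Q-function of a tiny, invented deterministic MDP with scipy.optimize.linprog: the Bellman equation is relaxed to the inequalities Q(s,a) <= c(s,a) + gamma * Q(f(s,a), a') for every action a', and maximizing the sum of Q-values makes these inequalities tight.

```python
"""Sketch: optimal Q-function of a small deterministic MDP via a linear
program. The chain MDP, its costs, and the discount factor are invented
for illustration; this is not the thesis' exact training problem."""
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 4, 2   # tiny chain MDP (assumed)
gamma = 0.9                  # discount factor (assumed)

# Deterministic transitions f(s, a) and running costs c(s, a) (assumed).
f = np.array([[1, 0], [2, 0], [3, 1], [3, 3]])             # next state
cost = np.array([[1., 2.], [1., 2.], [1., 2.], [0., 0.]])  # cost per step

def idx(s, a):
    """Flatten (s, a) into an index of the LP variable vector Q."""
    return s * n_actions + a

# Bellman inequalities: Q(s,a) - gamma * Q(f(s,a), a') <= c(s,a) for all a'.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        for a2 in range(n_actions):
            row = np.zeros(n_states * n_actions)
            row[idx(s, a)] += 1.0
            row[idx(f[s, a], a2)] -= gamma
            A_ub.append(row)
            b_ub.append(cost[s, a])

# The optimal Q is the largest feasible point, so maximize the sum of all
# Q-values (linprog minimizes, hence the negated objective).
res = linprog(c=-np.ones(n_states * n_actions),
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (n_states * n_actions))
Q = res.x.reshape(n_states, n_actions)
print("Q*:\n", Q)
print("greedy (cost-minimizing) policy:", Q.argmin(axis=1))
```

Even in this toy setting, the number of LP variables grows like |S| x |A| and the number of constraints like |S| x |A|^2, which is one concrete way to see the scalability limitation mentioned above.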

Table of Contents

1 Introduction
1.1 Background of Q-learning
1.2 Contributions and Outline
2 Background
2.1 The Basic Reinforcement Learning Framework
2.2 Example
2.3 Value Function and State-value Function
2.4 Relevant Work
3 Q-learning Algorithm
3.1 Bellman Equation
3.2 Q-Learning: an Introduction
3.3 Q-Learning as a Linear Program
4 Training Problem
4.1 Training Problem
4.1.1 Numerical Solution
4.1.2 Linear Programming Approach
4.1.3 Results
5 Discussion
6 Conclusion

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
