Linear Programming Approach for Q-Learning
Gong, Xindi (Spring 2022)
Abstract
Reinforcement learning has made headlines for its success in games such as AlphaGo and in scientific applications such as protein folding and automation. These success stories have inspired efforts toward a better mathematical understanding of reinforcement learning, which has become an important and active area of research.
In a reinforcement learning system, the agent's goal is to learn an optimal policy that minimizes the total running cost of actions over all successive steps. Given enough time and training trials, the agent will eventually learn the optimal policy. Within a complex system, however, tracking the action sequence and the environment's feedback for each action may be difficult and time-consuming. In order to learn the optimal policy without observing numerous training trials, we consider the reinforcement learning technique Q-Learning and investigate a convex formulation of the training problem.
In this thesis, we apply Q-Learning to a deterministic, discrete-time reinforcement learning problem with discrete state and action spaces. To obtain an algorithm that solves the problem numerically, we formulate the learning problem as a linear program based on the Bellman equation. Although the linear programming approach solves our problem successfully, we believe it has limitations in systems with large state and action spaces and with randomness: the approach becomes infeasible quickly as the number of states and actions grows. Despite these limitations, the linear programming approach provides insight into solving reinforcement learning problems, and further studies on testing its limitations and improving the approach for complex systems are left for future work.
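To make the linear programming formulation concrete, the following sketch (illustrative only, not the thesis code) builds the LP for a small deterministic MDP and solves it with scipy.optimize.linprog. The three-state chain, the transition map f(s, a), the costs c(s, a), and the discount factor are made-up assumptions; the constraints encode the Bellman inequality Q(s, a) <= c(s, a) + gamma * Q(f(s, a), a') for every next action a', and maximizing the sum of Q over these constraints recovers the optimal Q-function for a cost-minimizing agent.

```python
# Minimal sketch: Q-learning via a linear program for a small deterministic MDP.
# The MDP below (transitions, costs, discount) is an illustrative assumption.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2
gamma = 0.9

# Deterministic transitions f(s, a) and running costs c(s, a), made up for illustration.
f = np.array([[1, 2],
              [2, 0],
              [0, 1]])            # next state for each (state, action)
c = np.array([[1.0, 4.0],
              [0.0, 2.0],
              [3.0, 1.0]])        # cost of taking action a in state s

idx = lambda s, a: s * n_actions + a     # flatten (s, a) into one LP variable index

# Bellman inequalities: Q(s, a) - gamma * Q(f(s, a), a') <= c(s, a) for all a'.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        for a2 in range(n_actions):
            row = np.zeros(n_states * n_actions)
            row[idx(s, a)] += 1.0
            row[idx(f[s, a], a2)] -= gamma
            A_ub.append(row)
            b_ub.append(c[s, a])

# Maximize the sum of Q (minimize its negative); the optimum attains the optimal Q-function.
res = linprog(c=-np.ones(n_states * n_actions),
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (n_states * n_actions),
              method="highs")

Q = res.x.reshape(n_states, n_actions)
policy = Q.argmin(axis=1)                # greedy, cost-minimizing action per state
print(Q)
print(policy)
```

Note that this LP has |S||A| variables and |S||A|^2 constraints, which illustrates the scalability limitation discussed above: the formulation grows quickly with the sizes of the state and action spaces.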
Table of Contents
1 Introduction
1.1 Background of Q-learning
1.2 Contributions and Outline
2 Background
2.1 The Basic Reinforcement Learning Framework
2.2 Example
2.3 Value Function and State-value Function
2.4 Relevant Work
3 Q-learning Algorithm
3.1 Bellman Equation
3.2 Q-Learning: an Introduction
3.3 Q-Learning as a Linear Program
4 Training Problem
4.1 Training Problem
4.1.1 Numerical Solution
4.1.2 Linear Programming Approach
4.1.3 Results
5 Discussion
6 Conclusion