Date of Award
Fall 2021
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chairperson
Richard Burns, Ph.D.
Committee Member
Si Chen, Ph.D.
Committee Member
Linh Ngo, Ph.D.
Abstract
This thesis uses a reinforcement learning (RL) algorithm called Q-learning to train a Q-agent to play a game of Pong against a near-perfect opponent. Whereas previous related work trained Pong RL agents by combining Q-learning with deep learning in an algorithm known as Deep Q-Networks, the work presented in this thesis takes advantage of known environment constraints of the custom-made Pong environment to train the agent using one-step Q-learning alone. In addition, the thesis explores ways of making Q-learning more efficient by converting Markov Decision Processes (MDPs) to Partially Observable Markov Decision Processes (POMDPs), and by using state reduction techniques such as state discretization and state distillation. Based on the experiments conducted, this thesis highlights that it is possible to use one-step Q-learning, a model-free algorithm typically relegated to solving simple maze-world environments, in combination with a POMDP and state distillation to train a Q-agent to play Pong and converge to the optimal policy.
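For readers unfamiliar with the technique the abstract names, the tabular one-step Q-learning backup can be sketched as below. This is an illustrative sketch only, not the thesis's actual code: the state encoding (a ball/paddle position pair) and the action names are hypothetical stand-ins for whatever distilled state representation the thesis uses.

```python
# One-step Q-learning backup:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one tabular one-step Q-learning backup; return the updated value."""
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    td_target = reward + gamma * best_next          # bootstrapped target
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Hypothetical Pong-style state: (ball_y, paddle_y); hypothetical actions.
Q = defaultdict(lambda: {"UP": 0.0, "DOWN": 0.0, "STAY": 0.0})
q_update(Q, (3, 5), "UP", reward=1.0, next_state=(4, 5))
```

Because the update is model-free, the agent never needs the environment's transition dynamics, which is why the thesis can apply it directly to a custom Pong environment once the state space is reduced enough for a table to cover it.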
Recommended Citation
Kumar, Akash, "Playing Pong Using Q-Learning" (2021). West Chester University Master’s Theses. 226.
https://digitalcommons.wcupa.edu/all_theses/226