Date of Award

Fall 2021

Document Type

Thesis
Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chairperson

Richard Burns, Ph.D.

Committee Member

Si Chen, Ph.D.

Committee Member

Linh Ngo, Ph.D.


Abstract

This thesis uses a reinforcement learning (RL) algorithm called Q-learning to train a Q-agent to play the game of Pong against a near-perfect opponent. In contrast to prior work, which trained Pong RL agents by combining Q-learning with deep learning in an algorithm known as Deep Q-Networks (DQN), the work presented here takes advantage of known constraints of the custom-made Pong environment to train the agent using one-step Q-learning alone. In addition, the thesis explores ways of making Q-learning more efficient by converting the Markov Decision Process (MDP) to a Partially Observable Markov Decision Process (POMDP), and by using state reduction techniques such as state discretization and state distillation. Based on the experiments conducted, this thesis shows that it is possible to use one-step Q-learning, a model-free algorithm typically relegated to solving simple maze world environments, in combination with a POMDP and state distillation, to train a Q-agent that plays Pong and converges to the optimal policy.
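As a rough illustration of the one-step, model-free update the abstract refers to, here is a minimal tabular Q-learning sketch on a toy paddle-alignment task. The environment, state space, hyperparameters, and all names below are illustrative assumptions for exposition only, not the thesis's actual Pong environment or code:

```python
import random

# Toy stand-in for a Pong-like task: the paddle occupies one of five
# discrete positions and earns reward 1 whenever it moves onto the
# ball's (fixed) position. Purely illustrative -- not the thesis code.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration
ACTIONS = (-1, 0, 1)                    # move left, stay, move right
N_STATES, GOAL = 5, 2

# Tabular Q-function: Q[state][action]
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

random.seed(0)
for _ in range(3000):
    s = random.randrange(N_STATES)
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)
    else:
        a = max(Q[s], key=Q[s].get)
    s2, r = step(s, a)
    # One-step Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])

# Greedy policy: every state should steer the paddle toward the ball.
greedy = {s: max(Q[s], key=Q[s].get) for s in range(N_STATES)}
print(greedy)
```

Under these toy assumptions the learned greedy policy moves the paddle toward the ball from every state. The thesis's point is that this same tabular, model-free update, combined with a POMDP formulation and state reduction (discretization and distillation), can handle the far larger Pong state space without resorting to deep function approximation.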