Transformer neural networks have transformed natural language processing, but they can be applied to sequences other than sequences of words. In particular, they can be applied to game playing, where a game is a sequence of moves and the task is to predict a good next move. In this project you will investigate the design and training of a transformer to play tic-tac-toe (noughts and crosses). Because the game is so simple, a small transformer model that is quick to train will suffice.
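To make the "game as a sequence" idea concrete, here is one possible tokenisation (a sketch only; the function names are illustrative and not taken from the work linked below): each of the nine board cells becomes a vocabulary token, and a game is simply the ordered list of cells played, with the players alternating.

```python
def encode_game(moves):
    """Map a list of (row, col) moves to a flat token sequence 0-8 (row-major)."""
    return [3 * r + c for r, c in moves]

def board_from_tokens(tokens):
    """Replay a token sequence into a 3x3 board; X moves first, then O."""
    board = ['.'] * 9
    for i, t in enumerate(tokens):
        board[t] = 'X' if i % 2 == 0 else 'O'
    return [board[0:3], board[3:6], board[6:9]]

# X plays the centre, O a corner, X top-middle:
tokens = encode_game([(1, 1), (0, 0), (0, 1)])  # -> [4, 0, 1]
```

A transformer trained on such sequences is then just a next-token predictor over a nine-symbol vocabulary.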
There has already been some work on this: https://philliphaeusler.com/posts/tic_tac_toe/, which we will build upon. In particular, we will look at:
- Directly training the transformer to play winning moves, rather than relying on a separate reinforcement-learning step. This will require feature engineering and a revised cost function (e.g. return-conditioned training in the style of the Decision Transformer, https://arxiv.org/abs/2106.01345).
- Analysing the neural net's activations to see whether it has learned an internal representation of the board state (see also the Othello world-representation work at https://github.com/DeanHazineh/Emergent-World-Representations-Othello).
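One simple way to fold game outcomes into the training objective, without a separate reinforcement-learning step, is to weight each move's cross-entropy loss by the eventual result of the game it came from. The sketch below assumes this weighted-imitation approach; it is illustrative only, not the specific scheme of the papers above, and the helper name is hypothetical.

```python
import math

def weighted_nll(probs, targets, outcomes):
    """Outcome-weighted negative log-likelihood over a batch of moves.

    probs    : predicted probability distribution over the 9 cells, per move
    targets  : the cell actually played at each step
    outcomes : weight per move, e.g. 1.0 for moves drawn from winning games
               and 0.0 for draws/losses, so only winning moves are imitated
    """
    total = 0.0
    for p, t, w in zip(probs, targets, outcomes):
        total += -w * math.log(p[t])
    return total / len(targets)

# Two moves under a uniform predictor: only the first (winning) move counts.
loss = weighted_nll([[1 / 9] * 9, [1 / 9] * 9], [4, 0], [1.0, 0.0])
```

More refined weightings (e.g. conditioning on the desired return, as in the Decision Transformer) are a natural extension once this baseline works.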
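For the activation-analysis direction, a standard tool is a linear probe: fit a linear map from hidden-layer activations to the true board state and measure how well it predicts. Below is a minimal sketch on synthetic data standing in for real activations (all names and the data-generation step are assumptions for illustration; a real probe would use activations captured from the trained model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for transformer activations: hidden vectors that are a
# random linear mixing of the true board state plus a little noise.
n_games, d_hidden = 200, 32
boards = rng.integers(-1, 2, size=(n_games, 9)).astype(float)  # -1=O, 0=empty, 1=X
mixing = rng.normal(size=(9, d_hidden))
acts = boards @ mixing + 0.01 * rng.normal(size=(n_games, d_hidden))

# Least-squares linear probe: predict all nine cell contents from each vector.
W, *_ = np.linalg.lstsq(acts, boards, rcond=None)
pred = np.rint(acts @ W)
accuracy = (pred == boards).mean()  # near 1.0 here, since the map is linear
```

If a probe like this recovers the board accurately from real activations, that is evidence the model has learned an internal board representation, as in the Othello work linked above.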
For this project you will need a basic background in machine learning, e.g. similar to what's covered in the CSU44061 ML module. It will be a nice way to learn more about transformer neural networks and to gain experience of how they can be applied in interesting and surprising ways.