There is a traditional ‘optimal’ way to play the game of ‘tic tac toe’: the strategy defined by applying the ‘minimax’ algorithm (Chap 5, Russell & Norvig, ‘Artificial Intelligence: A Modern Approach’ [R&N]). One area of application of ‘Markov Decision Processes’ and ‘Reinforcement Learning’ (Chap 22 [R&N]; Chap 13, Mackworth & Poole, ‘Artificial Intelligence’ [M&P]) is learning to play games. This project would explore that in the context of ‘tic tac toe’. Initially the aim would be to explore how successfully various RL algorithms can train an agent against a minimax player. Secondly, one might ‘handicap’ the minimax player so that it plays sub-optimally and see whether an RL-trained agent can learn to beat the ‘handicapped’ version. Thirdly, one might seek to train a successful agent using RL in ‘self-play’. A rough sketch of how the first two phases might be set up is given below.
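
The following minimal sketch in Python is only one possible starting point, not a prescribed design; all names and parameters in it (minimax_move, play_episode, q_learn, blunder_prob, the learning-rate and exploration settings, and so on) are illustrative assumptions. It trains a tabular Q-learning agent playing ‘O’ against a minimax ‘X’ player, and the minimax player can be ‘handicapped’ by forcing it to play a random move with a given probability.

import random
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    # Return 'X' or 'O' if that player has three in a row, 'D' for a draw, else None.
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'D' if ' ' not in board else None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def place(board, m, player):
    return board[:m] + player + board[m + 1:]

@lru_cache(maxsize=None)
def minimax_value(board, player):
    # Game-theoretic value of the position for 'X' (+1 win, 0 draw, -1 loss), with `player` to move.
    w = winner(board)
    if w is not None:
        return {'X': 1, 'O': -1, 'D': 0}[w]
    values = [minimax_value(place(board, m, player), 'O' if player == 'X' else 'X')
              for m in moves(board)]
    return max(values) if player == 'X' else min(values)

def minimax_move(board, player, blunder_prob=0.0):
    # Optimal move for `player`; with probability `blunder_prob` play randomly (the 'handicap').
    if random.random() < blunder_prob:
        return random.choice(moves(board))
    best = max if player == 'X' else min
    return best(moves(board),
                key=lambda m: minimax_value(place(board, m, player),
                                            'O' if player == 'X' else 'X'))

def play_episode(Q, alpha, gamma, epsilon, blunder_prob):
    # One training game: the learner is 'O'; the (possibly handicapped) minimax 'X' opens.
    board = place(' ' * 9, minimax_move(' ' * 9, 'X', blunder_prob), 'X')
    while True:
        # Epsilon-greedy action selection for the learner.
        if random.random() < epsilon:
            a = random.choice(moves(board))
        else:
            a = max(moves(board), key=lambda m: Q.get((board, m), 0.0))
        s = board
        board = place(board, a, 'O')
        w = winner(board)
        if w is None:  # opponent replies; its move is treated as part of the environment
            board = place(board, minimax_move(board, 'X', blunder_prob), 'X')
            w = winner(board)
        reward = 0.0 if w is None else {'O': 1.0, 'X': -1.0, 'D': 0.0}[w]
        if w is not None:
            target = reward
        else:
            target = reward + gamma * max(Q.get((board, m), 0.0) for m in moves(board))
        # One-step Q-learning update for the learner's last move.
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
        if w is not None:
            return w

def q_learn(episodes=20000, alpha=0.3, gamma=1.0, epsilon=0.1, blunder_prob=0.0):
    Q, results = {}, {'X': 0, 'O': 0, 'D': 0}
    for _ in range(episodes):
        results[play_episode(Q, alpha, gamma, epsilon, blunder_prob)] += 1
    return Q, results

if __name__ == '__main__':
    # Against a perfect minimax 'X' (blunder_prob=0) the learner can at best draw;
    # with a handicap (e.g. blunder_prob=0.3) it should start to win some games.
    _, results = q_learn(blunder_prob=0.3)
    print(results)

For the third phase, ‘self-play’, one would drop the minimax opponent and let the training games be generated by two learners (or by one learner playing both sides, with rewards of opposite sign); the sketch above would need only modest changes to support that.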