The aim of this project is to understand the interplay between randomness and optimization in the choice of a Markov decision process and of a policy, working with simple examples such as tic-tac-toe and its variants. Particular attention will be paid to entropy and its maximization.
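
As a rough illustration of what entropy means here, the sketch below computes the Shannon entropy of a policy's move distribution on the empty tic-tac-toe board. The board encoding (a 9-character string with "." for an empty cell), the helper names, and the three example policies are assumptions made for this illustration only and are not taken from the project. It shows that among distributions over the nine opening moves, the uniform one attains the maximum entropy, log 9, while a deterministic policy has entropy zero.

```python
import math

def legal_moves(board):
    """Return indices of empty cells; board is a 9-character string of 'X', 'O', or '.'."""
    return [i for i, cell in enumerate(board) if cell == "."]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uniform_policy(board):
    """Assign equal probability to every legal move (empty dict if the board is full)."""
    moves = legal_moves(board)
    if not moves:
        return {}
    return {m: 1.0 / len(moves) for m in moves}

empty = "." * 9  # hypothetical encoding of the empty board

# Three illustrative policies at the opening position (all hypothetical):
uniform = uniform_policy(empty)      # fully random
greedy = {4: 1.0}                    # deterministic: always take the center
skewed = {m: (0.5 if m == 4 else 0.5 / 8) for m in legal_moves(empty)}

h_uniform = entropy(uniform.values())
h_greedy = entropy(greedy.values())
h_skewed = entropy(skewed.values())

print(f"uniform: {h_uniform:.4f} nats (log 9 = {math.log(9):.4f})")
print(f"skewed:  {h_skewed:.4f} nats")
print(f"greedy:  {abs(h_greedy):.4f} nats")
```

Entropy is measured in nats here; dividing by math.log(2) would give bits. This covers only a single state, whereas an entropy-maximizing treatment of a full Markov decision process would also have to account for how the policy shapes the distribution over later states.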