This project aims to develop techniques for fine-tuning LLMs to act as a source of reward in a reinforcement learning (RL) system, whether to replace or complement standard RL rewards, or to align the RL process with human preferences.
The project is suitable for an MSc-level student; prior experience with RL and/or LLMs is essential.