Reinforcement learning from LLM feedback
This project aims to develop techniques for fine-tuning LLMs to act as a source of reward in a reinforcement learning system, whether to replace or complement standard RL rewards or to align an RL process with human preferences. The project is suitable for an MSc …