Model-based RL has become a state-of-the-art approach in reinforcement learning (RL), where agents learn to solve tasks through trial and error. Unlike model-free RL, model-based RL first learns a world model of the actual environment; the agent is then trained inside this world model before being evaluated in the real environment. Most existing work, however, trains the world model and the agent jointly from the start. This raises the question: is early training of the agent necessary, or even helpful, while the world model may not yet accurately represent the actual environment?
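As a rough illustration of the loop described above, the sketch below interleaves world-model updates from real experience with agent updates on imagined rollouts; the `warmup_steps` parameter marks exactly the design choice in question (start agent training immediately, or only after the model has seen enough real data). All names here (ToyEnv, ToyWorldModel, ToyAgent, warmup_steps) are simplified assumptions for illustration only, not the project's actual components, which would involve autoencoders, transformers, and PPO.

```python
# Minimal sketch of the model-based RL training loop under discussion.
# Assumption-only code: toy stand-ins replace the real world model and PPO agent.

import random


class ToyEnv:
    """1-D random-walk environment standing in for the real environment."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1 if action == 1 else -1
        reward = 1.0 if self.state > 0 else 0.0
        return self.state, reward


class ToyWorldModel:
    """Stores observed transitions and replays them as 'imagined' experience."""

    def __init__(self):
        self.transitions = []  # (state, action, next_state, reward)

    def update(self, transition):
        self.transitions.append(transition)

    def imagine(self):
        # Sampling a stored transition stands in for a learned dynamics model
        # generating a synthetic rollout step.
        return random.choice(self.transitions)


class ToyAgent:
    """Keeps a running value estimate per action; stands in for a PPO agent."""

    def __init__(self):
        self.value = {0: 0.0, 1: 0.0}

    def act(self, state):
        return max(self.value, key=self.value.get)

    def train_on(self, transition):
        _, action, _, reward = transition
        # Simple moving-average update instead of a policy-gradient step.
        self.value[action] += 0.1 * (reward - self.value[action])


def train(total_steps=1000, warmup_steps=200, imagined_per_real=4):
    env, model, agent = ToyEnv(), ToyWorldModel(), ToyAgent()
    state = env.reset()
    for step in range(total_steps):
        action = random.choice([0, 1])  # exploratory action in the real env
        next_state, reward = env.step(action)
        model.update((state, action, next_state, reward))
        state = next_state
        # The question raised above: should this begin at step 0 (fully joint
        # training), or only once the world model is reasonably accurate?
        if step >= warmup_steps:
            for _ in range(imagined_per_real):
                agent.train_on(model.imagine())
    return agent


if __name__ == "__main__":
    agent = train()
    print("learned action values:", agent.value)
```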
This project explores state-of-the-art reinforcement learning algorithms and addresses this small but intriguing aspect of the training process. Through the project, participants will gain experience with autoencoders, transformers, and the Proximal Policy Optimization (PPO) algorithm.
(Project proposed by and supervised in collaboration with Dr. Wenlong Wang).