Counterfactual Explanations for Explainable and Trustworthy Reinforcement Learning

Reinforcement learning (RL) has been successfully applied in a wide range of domains, demonstrating its potential to perform complex tasks by optimizing reward signals obtained through interaction with the environment. However, real-world tasks often involve multiple, potentially conflicting objectives that are not easily represented by a single scalar reward. Multi-Objective Reinforcement Learning (MORL) addresses this limitation by enabling agents to reason over competing goals and trade-offs.
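To make the vector-reward idea concrete, here is a minimal illustrative sketch (not from the project text; the objectives and numbers are hypothetical). In MORL the reward is a vector with one entry per objective, and a common baseline is linear scalarization, which collapses it into a scalar via preference weights:

```python
# Illustrative MORL sketch: the reward is a vector, one entry per objective,
# rather than a single scalar. Linear scalarization is one common way to
# collapse it, using a preference-weight vector over the objectives.

def scalarize(reward_vector, weights):
    """Collapse a vector reward into a scalar via a weighted sum."""
    return sum(w * r for w, r in zip(weights, reward_vector))

# e.g. a driving agent trading off progress against energy use (toy numbers)
reward = [1.0, -0.5]                 # [progress, energy cost]
print(scalarize(reward, [0.7, 0.3]))  # ≈ 0.55
```

The weights encode the trade-off between objectives; choosing them is itself non-trivial, which is part of why single-scalar formulations fall short for conflicting goals.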

At the same time, MORL introduces an additional layer of complexity that compounds the existing challenges of RL, particularly its lack of transparency and interpretability.

Explainable Reinforcement Learning (XRL) has emerged to address these concerns, but most existing methods are designed for single-objective RL and fail to capture the reasoning behind multi-objective decision-making.

This project aims to develop new XRL approaches for multi-objective RL, or to adapt existing single-objective ones, such as counterfactual explanations, to the multi-objective setting.
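One way a counterfactual explanation could look in the multi-objective setting is sketched below. This is a hypothetical illustration, not the project's method: given per-action value vectors (one entry per objective) and a preference-weight vector, it searches for the smallest shift of weight between two objectives that flips the agent's greedy action, answering "how differently would the agent have to value its objectives to act otherwise?".

```python
# Hypothetical sketch of a preference-weight counterfactual for a MORL agent.
# The agent picks the action maximizing the weighted sum of its Q-vector;
# the counterfactual is the smallest weight shift that changes that choice.

def greedy_action(q_vectors, weights):
    """Pick the action with the highest scalarized (weighted-sum) value."""
    scores = [sum(w * q for w, q in zip(weights, qv)) for qv in q_vectors]
    return max(range(len(scores)), key=scores.__getitem__)

def weight_counterfactual(q_vectors, weights, step=0.01):
    """Grid-search the smallest shift of weight mass between two objectives
    that flips the greedy action (two-objective case). Returns the new
    weights plus the old and new actions, or None if no flip is found."""
    base = greedy_action(q_vectors, weights)
    shift = step
    while shift <= max(weights):
        for delta in (shift, -shift):
            w = [weights[0] + delta, weights[1] - delta]
            if 0.0 <= w[0] <= 1.0 and 0.0 <= w[1] <= 1.0:
                new = greedy_action(q_vectors, w)
                if new != base:
                    return w, base, new
        shift += step
    return None

# Two actions, two objectives (e.g. speed vs. safety; values are illustrative).
q = [[1.0, 0.2],   # action 0: fast but risky
     [0.6, 0.9]]   # action 1: slower but safe
print(weight_counterfactual(q, [0.8, 0.2]))
```

With the toy values above, the agent initially prefers the fast action; the search reports how much preference mass must move toward the safety objective before the safe action is chosen, which is the kind of contrastive, human-readable statement counterfactual XRL aims for.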

The project is suitable for an MSc-level student; prior experience with RL and/or XAI is essential.