This project investigates how post-hoc counterfactual explanations can be used to debug opaque models such as deep neural networks by revealing which feature changes most influence predictions. In applications like anomaly detection, counterfactuals help clarify why certain cases are flagged as abnormal and expose when models rely on spurious correlations or biased patterns. By using counterfactual reasoning, we aim to identify design flaws in data, architecture, or training processes, while also improving trust and interpretability in high-stakes domains.
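To make the idea concrete, below is a minimal sketch of a gradient-based counterfactual search in the style of Wachter et al. (2017). The PyTorch framing, the `find_counterfactual` helper, and the specific loss weighting are illustrative assumptions rather than the project's prescribed method: the search perturbs an input until the model's decision flips, while an L1 penalty keeps the change small and sparse.

```python
import torch

def find_counterfactual(model, x, target_class, lam=0.1, steps=500, lr=0.01):
    """Search for a point x_cf near x that the model assigns to target_class.

    `lam` trades off closeness to x against reaching the target prediction;
    all hyperparameters here are illustrative defaults, not tuned values.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        logits = model(x_cf.unsqueeze(0))
        if logits.argmax(dim=1).item() == target_class:
            break  # decision flipped: x_cf is a valid counterfactual
        optimizer.zero_grad()
        # Cross-entropy pushes the prediction toward the target class;
        # the L1 term keeps the counterfactual close (and sparse) in feature space.
        loss = (torch.nn.functional.cross_entropy(logits, target)
                + lam * torch.linalg.vector_norm(x_cf - x, ord=1))
        loss.backward()
        optimizer.step()
    return x_cf.detach()
```

The features with the largest changes |x_cf - x| are the ones the model "needed" to alter to reverse its decision, which is exactly the kind of signal one would inspect when checking an anomaly detector for spurious correlations.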
Strong programming skills in Python are essential. Prior experience with machine learning frameworks is highly desirable. A track record of relevant projects (e.g., via GitHub) would be an advantage.
Relevant reading for the project: https://arxiv.org/pdf/2009.13211, https://arxiv.org/pdf/2401.09489