Reconstructing Redacted Text Using a Transformer LLM

Currently, the main method for enhancing the privacy of text is redaction, i.e. masking out selected key words such as names and email addresses. While this is effective for removing obvious personally identifiable information, redacted text may still leak textual patterns that reveal authorship, sensitive topics (e.g. that the text relates to a medical condition) or location information (e.g. through local geographical jargon). How much information leaks in this way remains poorly understood.
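To make the idea of redaction concrete, here is a minimal sketch in Python. It is an illustration only, not one of the redaction tools the project will use: the function name `redact`, the `[REDACTED]` mask string, the email pattern and the sample sentence are all assumptions made up for this example.

```python
import re

# Simplified email pattern, an assumption for this sketch (not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text, names, mask="[REDACTED]"):
    """Mask email addresses and any explicitly listed names in the text."""
    text = EMAIL_RE.sub(mask, text)
    for name in names:
        # Word boundaries avoid masking a name that is only part of a longer word.
        text = re.sub(r"\b" + re.escape(name) + r"\b", mask, text)
    return text


# Hypothetical sample sentence, invented for illustration.
sample = "Alice Murphy (alice@example.org) saw her GP in Ballsbridge about asthma."
redacted = redact(sample, names=["Alice Murphy"])
print(redacted)
```

In this toy sentence the name and email address are masked, but the place name and the medical condition are left in the redacted text, which is the kind of residual leakage described above.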

Modern large language models (LLMs) are trained by hiding words in text (masked words in models such as BERT, or the next word in models such as GPT) and asking the model to guess the missing word. This suggests that these models might also perform well at reconstructing redacted text, and this is what you will investigate in this project. You’ll look at various bodies of sensitive text (medical, political, etc.), apply various redaction tools and then try to “attack” the redacted text by using an LLM to reconstruct the redacted words. It is likely that some parts of speech will be easier to reconstruct than others, and that reconstruction performance will vary depending on how similar the unredacted text is to the text used to train the LLM. You will therefore also look at fine-tuning the LLM on your body of text and explore how that affects reconstruction performance.
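The attack and its evaluation can be sketched as follows. This is a rough outline under stated assumptions, not the project's method: a simple bigram frequency model stands in for the LLM so that the example runs without any model download, and every name here (`mask_tokens`, `build_bigram_predictor`, `reconstruction_accuracy`, the `[MASK]` token and the toy sentences) is hypothetical. In the project itself, the `predict` function would be replaced by a call to an actual LLM.

```python
from collections import Counter, defaultdict

MASK = "[MASK]"  # assumed placeholder for a redacted word


def mask_tokens(tokens, positions):
    """Replace the tokens at the given positions with MASK.

    Returns the masked token list and a dict mapping position -> hidden word.
    """
    masked = list(tokens)
    answers = {}
    for i in positions:
        answers[i] = tokens[i]
        masked[i] = MASK
    return masked, answers


def build_bigram_predictor(corpus_tokens):
    """Stand-in for an LLM: guess the word most often seen after the previous token."""
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        following[prev][nxt] += 1

    def predict(masked, i):
        prev = masked[i - 1] if i > 0 else None
        candidates = following.get(prev)
        return candidates.most_common(1)[0][0] if candidates else None

    return predict


def reconstruction_accuracy(masked, answers, predict):
    """Fraction of redacted positions whose original word is guessed exactly."""
    hits = sum(predict(masked, i) == word for i, word in answers.items())
    return hits / len(answers) if answers else 0.0


# Toy "training" text and a toy target sentence, both invented for illustration.
corpus = "the patient was given aspirin . the patient was sent home .".split()
target = "the patient was given ibuprofen".split()

masked, answers = mask_tokens(target, positions=[1, 4])
predict = build_bigram_predictor(corpus)
accuracy = reconstruction_accuracy(masked, answers, predict)
print(masked, accuracy)
```

In this toy setting the stand-in model recovers "patient", which follows "the" in its training text, but guesses "aspirin" instead of "ibuprofen", a word it has never seen. This mirrors the expectation that reconstruction performance depends on how similar the unredacted text is to the training text. Grouping the accuracy by part of speech would need a tagger and is left out of this sketch.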

If time permits, we will look at whether these LLM reconstruction attacks can be used to improve the way that redaction is carried out, so as to make it more robust to such attacks.

For this project you will need a basic understanding of machine learning methods, at a level similar to that provided by the CSU44061 Machine Learning module.