25/26 PROJECT #6: USING NLP/GenAI TO SUPPORT ANNOTATION OF TEXTS

This project focuses on how to deploy Natural Language Processing (NLP) and Generative AI approaches to support historians in annotating transcriptions. Typically this means employing techniques for Named Entity Recognition (NER) and Entity Linking (NEL) and presenting candidate annotations to the historians through a simple user interface for validation. This is an urgent problem because tools such as Transkribus (https://www.transkribus.org/) and eScriptorium (https://www.sofer.info/) are using AI-based image processing techniques to unblock what has been a hugely expensive and time-intensive process of turning historical documents into machine-readable transcriptions, allowing that process to scale up. The challenge has therefore shifted to how to annotate the generated transcriptions efficiently for downstream querying and analysis.
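To make the "candidate annotation, then validation" workflow concrete, here is a minimal Python sketch. It is not the project's method: a tiny hand-written gazetteer stands in for a real NER/entity-linking model, and every name in it (GAZETTEER, Candidate, propose_candidates, review, the example person and the link identifier) is a hypothetical placeholder chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy gazetteer standing in for a real NER + entity-linking model.
# Maps a surface form to (entity type, placeholder link ID or None if unlinked).
GAZETTEER = {
    "Dublin": ("PLACE", "place:dublin"),
    "Mary Byrne": ("PERSON", None),  # invented name, no link candidate
}


@dataclass
class Candidate:
    """One candidate annotation awaiting a historian's decision."""
    start: int               # character offset where the mention begins
    end: int                 # character offset just past the mention
    text: str                # the mention as it appears in the transcription
    label: str               # proposed entity type
    link: Optional[str]      # proposed knowledge-base ID, if any
    status: str = "pending"  # "pending", "accepted" or "rejected"


def propose_candidates(transcription: str) -> List[Candidate]:
    """Return candidate annotations, ordered by position in the text."""
    candidates = []
    for surface, (label, link) in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, transcription):
            candidates.append(
                Candidate(match.start(), match.end(), match.group(), label, link)
            )
    return sorted(candidates, key=lambda c: c.start)


def review(candidate: Candidate, accept: bool) -> Candidate:
    """Record the historian's validation decision on a candidate."""
    candidate.status = "accepted" if accept else "rejected"
    return candidate


if __name__ == "__main__":
    for cand in propose_candidates("Mary Byrne of Dublin deposed."):
        print(cand)
```

In a real system the gazetteer lookup would be replaced by a trained NER model or a generative model prompt, and review would be driven by the user interface rather than called directly. The character offsets are kept so that accepted annotations can be written back onto the transcription.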

To get a sense of the type of user interface needed, take a look at VARD (https://ucrel.lancs.ac.uk/vard/about/), either for inspiration or as a possible starting point. VARD helps historians with a text normalisation task, which is different from the NER/NEL tasks we want to support, but the type of user engagement its interface supports might be useful for our tasks too.

This project will be undertaken in the context of the ERC VOICES project (https://voicesproject.ie/), in collaboration with its historians.

KEYWORDS: Graphical User Interface, Natural Language Processing, Generative AI

PREREQUISITES: No prerequisites per se, but an interest in or experience with NLP/generative AI techniques and web-based user interface design/development would definitely be helpful.