[Taken] Improving Speech Recognition for the Irish Language

Irish is considered a low-resource language: as a minority language, it has comparatively little data available for research. This has implications for technological advancement in areas such as speech recognition, where limited data hinders the development of accurate and accessible digital tools. While transfer learning from multilingual models has become a popular remedy for the lack of data, it often overlooks the nuances and complexities of each minority language, and a single multilingual pre-trained model cannot be expected to perform equally well across all languages. Our goal is to invest time in developing a method specifically to improve speech recognition for Irish, in the hope that this will encourage further research dedicated to all minority languages. Our initial results show that pre-training on certain languages offers larger performance gains than pre-training on others. We would like to investigate this further to identify which features make one language better suited for pre-training than another. The ideal candidate would be studying Computer Science and Linguistics, have experience with the Irish language, and be interested in machine learning and in tackling real-world data issues.

Requirement: being an Irish speaker would be a plus.