Can machines ‘learn’ word meanings just from lots of textual examples? There has been quite a lot of research into this (see the Wikipedia introduction), all based on the idea that different meanings of a word tend to show up in rather different contexts, e.g.:
move the mouse till the cursor …
dissect the mouse and extract its DNA …
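To make the idea concrete, here is a minimal sketch, not taken from the C++ code base mentioned below. It assumes a crude whitespace tokenizer and a fixed window of 2 words either side. Each occurrence of a target word gets a "bag" of nearby words, and two occurrences are compared by cosine similarity. All the names (`tokenize`, `contextBag`, `cosine`) are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split on whitespace and lower-case each token (a deliberately crude tokenizer).
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (char& c : word)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        tokens.push_back(word);
    }
    return tokens;
}

// Bag of the words within `window` positions either side of tokens[pos].
std::map<std::string, int> contextBag(const std::vector<std::string>& tokens,
                                      std::size_t pos, std::size_t window) {
    assert(pos < tokens.size());
    std::map<std::string, int> bag;
    std::size_t lo = pos >= window ? pos - window : 0;
    std::size_t hi = std::min(tokens.size() - 1, pos + window);
    for (std::size_t i = lo; i <= hi; ++i)
        if (i != pos) ++bag[tokens[i]];
    return bag;
}

// Cosine similarity of two sparse count vectors (0 if either is empty).
double cosine(const std::map<std::string, int>& a,
              const std::map<std::string, int>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& entry : a) {
        na += double(entry.second) * entry.second;
        auto it = b.find(entry.first);
        if (it != b.end()) dot += double(entry.second) * it->second;
    }
    for (const auto& entry : b) nb += double(entry.second) * entry.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

On the two ‘mouse’ fragments above, the computer-mouse occurrence collects {move, the, till} and the lab-mouse occurrence collects {dissect, the, and, extract}. The only context word they share is ‘the’, which says nothing about meaning.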
There is a C++ code base, which I could supply, implementing a particular approach to this, particularly as it relates to the temporal dimension (my 4062 course gives some of the underlying theory). There are a number of projects which could be attempted building on this, for example:
- an option often taken, but one the current system does not take, is to somehow ignore a subset of the vocabulary: the so-called ‘stop words’. Possible ways of integrating this could be developed.
- the system predominantly uses an ‘unsupervised’ technique with unlabelled data. Via a ‘pseudo-neologisms’ technique, a type of labelled data can be constructed; work could be done with this, comparing unsupervised and supervised performance.
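For the stop-word project, here is a sketch of two possible ways of integrating a stop list into context-window counting. It is not how the supplied system works, and the stop list, window size and function names are all hypothetical. Option A deletes stop words before windowing, so the window reaches further into the text. Option B keeps the original positions but does not count stop words, so a window crowded with them yields fewer context features.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny stop list for illustration only.
const std::set<std::string> kStopWords = {"the", "a", "an", "and", "of", "to", "its"};

bool isStopWord(const std::string& w) { return kStopWords.count(w) > 0; }

// Count words within `window` positions of tokens[pos], optionally skipping stop words.
std::map<std::string, int> windowBag(const std::vector<std::string>& tokens,
                                     std::size_t pos, std::size_t window,
                                     bool skipStops) {
    assert(pos < tokens.size());
    std::map<std::string, int> bag;
    std::size_t lo = pos >= window ? pos - window : 0;
    std::size_t hi = std::min(tokens.size() - 1, pos + window);
    for (std::size_t i = lo; i <= hi; ++i)
        if (i != pos && !(skipStops && isStopWord(tokens[i]))) ++bag[tokens[i]];
    return bag;
}

// Option A: delete stop words from the sequence first, then take the window,
// so it spans up to `window` content words on each side.
std::map<std::string, int> bagDropFirst(const std::vector<std::string>& tokens,
                                        std::size_t pos, std::size_t window) {
    assert(pos < tokens.size());
    std::vector<std::string> kept;
    std::size_t newPos = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i == pos) newPos = kept.size();  // the target itself is always kept
        if (i == pos || !isStopWord(tokens[i])) kept.push_back(tokens[i]);
    }
    return windowBag(kept, newPos, window, false);
}

// Option B: keep the original positions but do not count stop words.
std::map<std::string, int> bagSkipInWindow(const std::vector<std::string>& tokens,
                                           std::size_t pos, std::size_t window) {
    return windowBag(tokens, pos, window, true);
}
```

With `{"move", "the", "mouse", "till", "the", "cursor"}`, target position 2 and window 2, Option A yields {move, till, cursor} and Option B yields only {move, till}. Comparing such variants on the same evaluation is one possible shape for this project.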
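For the supervised/unsupervised comparison, a hedged illustration follows. It assumes the ‘pseudo-neologisms’ technique is, or resembles, the familiar pseudo-word construction, which may not match the system's exact method. Every occurrence of two real words (say ‘banana’ and ‘door’) is replaced by one made-up token, and the word that was replaced is kept as a hidden gold label. A system can then be scored on recovering those labels. The names `makePseudoWord`, `LabelledOccurrence` and `accuracy` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One labelled occurrence of the pseudo-word (hypothetical record type).
struct LabelledOccurrence {
    std::size_t sentence;  // index of the sentence in the corpus
    std::size_t position;  // token position within that sentence
    std::string label;     // the real word that was replaced (the gold "sense")
};

// Replace every occurrence of w1 or w2 with `pseudo`, in place, recording the
// original word as a gold label for later training or evaluation.
std::vector<LabelledOccurrence> makePseudoWord(
    std::vector<std::vector<std::string>>& corpus,
    const std::string& w1, const std::string& w2, const std::string& pseudo) {
    assert(w1 != w2 && pseudo != w1 && pseudo != w2);
    std::vector<LabelledOccurrence> labels;
    for (std::size_t s = 0; s < corpus.size(); ++s)
        for (std::size_t i = 0; i < corpus[s].size(); ++i) {
            std::string& tok = corpus[s][i];
            if (tok == w1 || tok == w2) {
                labels.push_back({s, i, tok});  // copy the label before overwriting
                tok = pseudo;
            }
        }
    return labels;
}

// Fraction of occurrences where a system's guess matches the hidden label.
double accuracy(const std::vector<LabelledOccurrence>& gold,
                const std::vector<std::string>& guesses) {
    assert(gold.size() == guesses.size());
    if (gold.empty()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t k = 0; k < gold.size(); ++k)
        if (gold[k].label == guesses[k]) ++correct;
    return double(correct) / double(gold.size());
}
```

A supervised learner could be trained on part of the labels and tested on the rest. An unsupervised one could cluster the occurrences and have its clusters mapped to labels before scoring, so both end up on the same accuracy scale.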