How Best To Optimise Machine Learning Hyperparameters?

When designing and training a neural network model, the hyperparameters include the SGD step size, the mini-batch size, the learning-rate decay schedule, the choice of regularisation, and so on. Selecting values for these hyperparameters is a key step in obtaining a useful model. While selection is commonly based on heuristics and trial and error, there is also much interest in methods for automatically optimising these hyperparameters. In this project you will investigate and evaluate some of the popular hyperparameter optimisation methods, e.g. those implemented in Ray Tune (https://docs.ray.io/en/latest/tune/index.html).
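To give a flavour of what such a library looks like in use, here is a minimal Ray Tune sketch, closely following the pattern in the Ray Tune quickstart documentation. The objective function is a stand-in (in the project it would train and validate a real model), and the search space over lr and batch_size is purely illustrative.

```python
from ray import tune

def objective(config):
    # Stand-in for a real training run: in the project this would train
    # the model with the given hyperparameters and return validation loss.
    score = (config["lr"] - 1e-3) ** 2 + 1e-6 * config["batch_size"]
    return {"loss": score}

search_space = {
    "lr": tune.loguniform(1e-5, 1e-1),        # log-uniform over step sizes
    "batch_size": tune.choice([16, 32, 64]),  # categorical choice
}

tuner = tune.Tuner(
    objective,
    param_space=search_space,
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
```

Swapping the search algorithm (random search, Bayesian optimisation, Hyperband, etc.) is then largely a matter of changing the tune_config, which is what makes a library like this convenient for the comparisons planned here.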

We will use relatively small models so that evaluation can be carried out more quickly. Since transformer neural network models are topical, I suggest we use a small one of those. This will also allow us to obtain “ground truth” by brute-force search, something which has been largely lacking in previous studies (most of the literature focusses on larger models that take a long time to train). This will give us a proper baseline against which to evaluate and compare the hyperparameter optimisation methods, and hopefully let us make recommendations as to the best method(s) to use in different circumstances.
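A minimal sketch of what that brute-force baseline could look like, assuming a hypothetical train_and_evaluate routine that trains the small transformer once per configuration and returns its validation loss; the grid values shown are illustrative only.

```python
import itertools

def train_and_evaluate(lr, batch_size, weight_decay):
    # Placeholder: stands in for a full training run of the small
    # transformer with the given hyperparameters.
    return (lr - 3e-4) ** 2 + abs(batch_size - 32) * 1e-6 + weight_decay

# Illustrative grid; the real study would choose ranges per hyperparameter.
grid = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "weight_decay": [0.0, 1e-4, 1e-2],
}

# Exhaustively train and score every configuration in the grid.
results = []
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    loss = train_and_evaluate(**config)
    results.append((loss, config))

best_loss, best_config = min(results, key=lambda r: r[0])
print(f"ground-truth optimum: {best_config} (loss {best_loss:.4g})")
```

Because the grid is enumerated exhaustively, the best configuration found is the true optimum over that grid, which is exactly the reference point needed to judge how close each optimisation method gets and at what computational cost.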

For this project you will need a basic background in machine learning, e.g. at the level provided by the CSU44061 ML module.