[TAKEN] Investigating the Reproducibility of Studies That Conducted Data Analysis or Machine Learning

**This project is for students in the Statistics & Data Science programme (MSc) or the MSc in Statistics and Sustainability.**

“Reproducibility, closely related to replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or in a statistical analysis of a data set should be achieved again with a high degree of reliability when the study is replicated. … With a narrower scope, reproducibility has been defined in computational sciences as having the following quality: the results should be documented by making all data and code available in such a way that the computations can be executed again with identical results.” – Wikipedia

This project involves developing a framework for measuring reproducibility and then using it to examine studies that applied educational data mining or machine learning (e.g., clustering, predictive modelling, classification). Although my interest is in education, the study could be adapted to examine reproducibility in another discipline, such as sustainability.

The first step of this project is to identify several studies that applied educational data mining or machine learning to publicly available data. The second step is to repeat the statistical analysis reported in each study.

For each study, the aim is to determine:

- Whether the method is described in enough detail to be reproduced.
- Whether any data cleaning can be reproduced.
- Whether the reported results can be reproduced within an acceptable margin of error.
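One possible way to turn these three questions into a measurement framework is to record them for each study and count how many are met. The Python sketch below illustrates this under stated assumptions: the names `StudyAssessment`, `results_match` and `reproducibility_score` are hypothetical, the 0 to 3 score is only one possible scheme, and the tolerances (`rel_tol=0.05`, `abs_tol=0.01`) are placeholder values for the "acceptable margin of error", which the project would still need to define. The metric values in the example are invented for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StudyAssessment:
    """Hypothetical record of the three checks applied to one study."""
    study_id: str
    method_detailed: bool        # is the method described well enough to rerun?
    cleaning_reproducible: bool  # can the data cleaning steps be repeated?
    reported: dict[str, float] = field(default_factory=dict)    # metrics as published
    reproduced: dict[str, float] = field(default_factory=dict)  # metrics from the rerun


def results_match(reported, reproduced, rel_tol=0.05, abs_tol=0.01):
    """Return True if every reported metric was reproduced within tolerance.

    rel_tol and abs_tol are illustrative assumptions, not values set by the project.
    """
    if not reported:
        return False  # nothing reported, so nothing can be confirmed
    for name, value in reported.items():
        if name not in reproduced:
            return False  # a reported metric could not be recomputed
        if not math.isclose(value, reproduced[name], rel_tol=rel_tol, abs_tol=abs_tol):
            return False  # difference exceeds the chosen margin of error
    return True


def reproducibility_score(a: StudyAssessment) -> int:
    """Count how many of the three criteria (0 to 3) a study meets."""
    checks = [
        a.method_detailed,
        a.cleaning_reproducible,
        results_match(a.reported, a.reproduced),
    ]
    return sum(checks)  # True counts as 1, False as 0


# Invented example values, for illustration only.
example = StudyAssessment(
    study_id="study-A",
    method_detailed=True,
    cleaning_reproducible=False,
    reported={"accuracy": 0.82, "f1": 0.78},
    reproduced={"accuracy": 0.80, "f1": 0.77},
)
print(reproducibility_score(example))  # → 2
```

A single pass/fail count is deliberately simple; the framework developed in the project might instead weight the criteria, or record how far each reproduced metric falls from the published value rather than only whether it is within tolerance.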