Density-based clustering methods

Density-based clustering methods associate clusters with contiguous regions of high density in the underlying density function. Density peaks clustering is a prominent density-based method that detects modes as points which have high density and lie at a large distance from any point of higher density. Each non-mode point is then assigned to the same cluster as its nearest neighbour of higher density. Recent work has shown that, while density peaks clustering can perform well in applications, it is susceptible to errors caused by noise in the density estimates.
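The mode-detection and assignment rules described above can be illustrated with a minimal NumPy sketch. It is not taken from any existing package: the k-nearest-neighbour density estimate, the choice of modes as the `n_clusters` points with the largest product of density and distance (`gamma`), and the function name `density_peaks` are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(X, n_clusters, k=10):
    """Hypothetical density peaks sketch (O(n^2) memory, for illustration only)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed density estimate: inverse mean distance to the k nearest neighbours
    # (column 0 of each sorted row is the zero distance to the point itself).
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn.mean(axis=1) + 1e-12)

    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.empty(n)
    parent = np.full(n, -1)
    # The densest point has no denser neighbour; use its largest distance.
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # every point denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]  # distance to nearest point of higher density
        parent[i] = j

    # Modes: high density AND far from any denser point.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point has no parent, so it must be a mode
    modes = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[modes] = np.arange(n_clusters)
    # Visit points in decreasing density, so each parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Because every non-mode simply copies its parent's label, a single noisy density value can redirect a whole chain of points to the wrong cluster, which is the sensitivity to noise mentioned above.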

We have recently proposed a novel clustering algorithm, Component-wise Peak-Finding (CPF), to remedy these issues. CPF applies the density peaks clustering method within connected components of the density function, so that an instance can only be assigned to a cluster in its own component; this ensures the correct assignment of instances to clusters. The algorithm also includes a pruning mechanism to remove spurious estimated modes, and it has been shown to outperform the density peaks clustering method. A semi-supervised version of CPF has also been developed, which integrates clustering constraints and achieves excellent performance on an important problem in computer vision.
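The component-wise idea can be sketched in the same style. This is a hypothetical simplification, not the authors' CPF implementation: it approximates the connected components of the density with a mutual k-nearest-neighbour graph, omits CPF's pruning mechanism, and uses an assumed rule (`min_sep`) under which a point starts a new cluster only if no denser point in its component lies within that distance. The function name `component_wise_peaks` is likewise hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def component_wise_peaks(X, k=10, min_sep=1.0):
    """Hypothetical sketch: density peaks run separately inside each component."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k nearest neighbours, excluding the point itself (assumes no duplicate points).
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    rho = 1.0 / (np.take_along_axis(D, nn, axis=1).mean(axis=1) + 1e-12)

    # Mutual k-NN graph: keep edge i-j only if each is among the other's k-NN.
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    A = A & A.T
    n_comp, comp = connected_components(csr_matrix(A.astype(float)), directed=False)

    labels = np.full(n, -1)
    next_label = 0
    for c in range(n_comp):
        idx = np.flatnonzero(comp == c)
        order = idx[np.argsort(-rho[idx], kind="stable")]  # densest first
        for rank, i in enumerate(order):
            higher = order[:rank]  # denser points in the SAME component only
            if rank == 0 or D[i, higher].min() > min_sep:
                labels[i] = next_label  # assumed rule: start a new mode
                next_label += 1
            else:
                labels[i] = labels[higher[np.argmin(D[i, higher])]]
    return labels
```

Because the nearest-denser-neighbour search only ranges over `order[:rank]` within one component, a cluster label can never spread across components, which is the property the paragraph above relies on.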

The goal of this project is to (1) improve the existing GitHub code, making the algorithm faster and more flexible, and (2) build a complete PyPI package for the clustering method. If desired, there is also scope to develop methodological advances aimed at improving and extending the semi-supervised algorithm.

Given the complexity of this research topic, I particularly welcome applications from Computer Science Single Pathway students. An example of a previous project, a collaborative effort with Ralph Swords, whose Final Year Project received a first-class grade, is available here: https://pypi.org/project/REMclust/

My homepage: https://www.tcd.ie/research/profiles/?profile=zhangm3