Clustering multiple data sets using consensus clustering

Clustering methods are usually applied to a single data set of interest. Sometimes, however, several related data sets are available. In that case, it is often of interest to determine whether the features identified by a cluster analysis of one data set are common across all of them. Consensus clustering has been proposed …

Deep-Learning Bibliographic Reference Strings with a 1-Billion-Instances Dataset

Problem/Background
Effective citation parsing is crucial for academic search engines, patent databases, and many other applications in academia, law, and intellectual property protection. It helps identify related documents and calculate the impact of researchers and journals (e.g., the h-index). “Citation parsing” refers to identifying and extracting a reference like “[4]” in the full text, and …