Semi-Unsupervised Learning: Clustering and Classifying using Ultra-Sparse Labels

Willetts M., Roberts S., Holmes C.

In semi-supervised learning for classification, it is assumed that every ground-truth class of data is present in the small labelled dataset. In many real-world sparsely-labelled datasets, it is possible that not all ground-truth classes are captured in the labelled dataset: a biased data collection process could result in some classes of data being found only in the unlabelled dataset. We call this regime 'semi-unsupervised learning', an extreme case of semi-supervised learning, where some classes have no labelled exemplars. First, we outline the pitfalls associated with trying to apply deep generative model (DGM)-based semi-supervised learning algorithms to datasets of this type. We then show how a combination of clustering and semi-supervised learning, using DGMs, can be brought to bear on this problem. We study several different datasets, showing how one can still learn effectively when half of the ground-truth classes are entirely unlabelled and the other half are sparsely labelled.

More information Original publication

DOI

10.1109/BigData50022.2020.9378265

Type

Conference paper

Publication Date

2020-12-10

Pages

5286 - 5295
