Gaining access to large, labelled sets of relevant images is crucial for the development and testing of biomedical imaging algorithms. Using images found in biomedical research articles would contribute some way towards a solution to this problem. However, this approach critically depends on being able to identify the most relevant images from very large sets of potentially useful figures. In this paper a deep convolutional neural network (CNN) classifier is trained using only synthetic data, to rapidly and accurately label raw images taken from biomedical articles. We apply this method in the context of detecting faces in biomedical images; and show that the classifier is able to retrieve figures containing faces with an average precision of 94.8%, from a dataset of over 31,000 images taken from articles held in the PubMed database. The utility of the classifier is then demonstrated through a case study, by aiding the mining of photographs of patients with rare genetic disorders from targeted articles. This approach is readily adaptable to facilitate the retrieval of other categories of biomedical images.

Original publication




Conference paper

Publication Date



562 - 567