Diet is a major factor in many non-communicable diseases (NCDs), and poor diet is a leading cause of premature death and ill health. foodDB is a unique big-data approach to monitoring, measuring and evaluating food systems and interventions. Its software and database platform collects and processes data daily on over 125,000 products available to buy from eight UK supermarkets, and it now also covers 8 other countries, with 30 retailers in total. foodDB has been running since 2017. Its rich and complex dataset allows unprecedented granular analysis and has been used in research such as an evaluation of the UK Soft Drinks Industry Levy, an exploration of the use of price promotions on healthy compared with unhealthy products, and the development of an algorithm for the environmental indexing of food and drink products.

Whilst products in foodDB are organised by the classification system used by each retailer, these categories do not match up across retailers and do not map easily to the categorisation systems used by researchers and policy makers. This heterogeneity makes it difficult to define datasets for analysis and monitoring.

The core aims of this project are to use the data within foodDB and machine learning techniques to develop an automated, generic method for classifying food and drink products into a small set of predefined categorisation systems, and to apply this method to the whole foodDB dataset. A supplementary component, if time allows, is to develop and analyse an unsupervised categorisation system based on nutrition and environmental data, with no a priori knowledge. The output of this project will have a significant impact on the research carried out and supported by foodDB, and will be written up as a peer-reviewed research paper.
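To make the core classification aim concrete, here is a minimal sketch of one possible starting point. It assumes that each product is represented only by its name and that a set of products has already been hand-labelled with a target category. It then trains a multinomial Naive Bayes classifier with Laplace smoothing on the name tokens. The class name, the category labels and the example products are hypothetical and do not come from foodDB. The project itself may use richer features (ingredients, nutrition, retailer categories) and different models.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case a product name and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesProductClassifier:
    """Hypothetical multinomial Naive Bayes over product-name tokens (Laplace smoothing)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                                    # smoothing constant
        self.class_counts: Counter = Counter()                # products per category
        self.token_counts: dict = defaultdict(Counter)        # token counts per category
        self.vocab: set = set()                               # all tokens seen in training

    def fit(self, names: list[str], labels: list[str]) -> "NaiveBayesProductClassifier":
        for name, label in zip(names, labels):
            tokens = tokenize(name)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, name: str) -> str | None:
        """Return the category with the highest log-posterior for this product name."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)             # log prior
            for token in tokenize(name):
                if token in self.vocab:                       # skip words never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples standing in for products mapped to a target scheme.
train_names = [
    "Cola soft drink 330ml", "Diet lemonade 2L", "Sparkling orange drink",
    "Wholemeal sliced bread", "White bread loaf", "Seeded bread rolls",
    "Salted crisps multipack", "Cheese and onion crisps",
]
train_labels = ["drinks"] * 3 + ["bread"] * 3 + ["snacks"] * 2

clf = NaiveBayesProductClassifier().fit(train_names, train_labels)
print(clf.predict("Sugar-free cola drink"))
```

Naive Bayes is used here only because it is short and transparent. A real pipeline over the full dataset would more likely be built on an established library, with held-out evaluation per retailer.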
The successful student can expect to work with a skilled, encouraging and friendly multidisciplinary group of researchers, software engineers and data scientists, applying cutting-edge techniques to large datasets in a field with immediate, real-world impact on population health.

This is an exciting project, and we hope you’ll be interested in joining us to: 

  • Help to find the big answers to big questions in public health, tackling obesity, diabetes and other non-communicable diseases 
  • Use data science, big data, and cutting-edge technology in a real-world setting to help people live healthier, happier lives
  • Work in the fantastic Big Data Institute, the world’s largest health big data institute 
  • Conduct research at the intersection of technology, data, science, diet, nutrition and public health 
  • Be part of a fantastic, fun, interesting, multidisciplinary and growing team 
  • Learn, create, innovate and take responsibility for your research 

Who we’re looking for 

We’re looking for someone who enjoys creating, building and delivering digital solutions, is data-driven, and knows how to manipulate data and solve problems creatively. If the challenge of making a difference to public health, building and using fantastic technology, and digging into large datasets interests you, we’d love you to submit an application. For further information, please contact Dr Richie Harrington at .


The core component of this project is envisaged to take 10-12 weeks to complete.    

Selection Criteria 

This project would be most suited to someone adept at programming in Python (equivalent R skills would also be considered) and with prior experience of basic machine learning techniques, particularly classification.

Our team