Development of a classification engine for foodDB
Diet is a major factor in many non-communicable diseases (NCDs), and poor diet is a leading cause of premature death and ill health. foodDB is a unique big-data approach to monitoring, measuring and evaluating food systems and interventions. Its software and database platform collects and processes data daily on over 125,000 products available to buy from eight UK supermarkets, and now also spans eight other countries, covering 30 retailers in total. foodDB has been running since 2017 and contains a rich, complex dataset that allows unprecedented granular analysis. It has been used in research such as an evaluation of the UK Soft Drinks Industry Levy, an exploration of the use of price promotions on healthy compared with unhealthy products, and the development of an algorithm to establish environmental indexing for food and drink products.
Products in foodDB are organised according to the classification system used by each retailer. These categories do not match up across retailers, and they do not map easily onto the categorisation systems used by researchers and policy makers. This heterogeneity makes it difficult to define datasets for analysis and monitoring.
The core aims of this project are to use the data within foodDB, together with machine learning techniques, to develop an automated, generic method for classifying food and drink products into a small set of predefined categorisation systems, and to apply this method to the whole foodDB dataset. A supplementary component of the project, if time allows, is to develop and analyse an unsupervised categorisation system based on nutrition and environmental data, with no a priori knowledge. The output of this project will have a significant impact on the research carried out and supported by foodDB, and will be written up as a peer-reviewed research paper.
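To give a flavour of the core classification task, here is a toy sketch. It is not foodDB's method: it assumes, purely for illustration, that products are classified from their names alone, and it uses a multinomial naive Bayes text classifier written in plain Python. The product names, the Dairy/Drinks/Bakery labels and the `NaiveBayesClassifier` class are all hypothetical stand-ins for real foodDB data and a predefined categorisation system.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Lower-case a product name and split it into alphabetic tokens (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words product names,
    with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)    # token counts per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenise(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(count / n_docs)
            denom = self.total_words[label] + self.alpha * v
            for word in tokenise(text):
                if word not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: product names with a target category.
train_names = [
    "Semi skimmed milk 2 pints", "Whole milk 1 litre", "Greek style yoghurt",
    "Cola zero sugar", "Orange juice smooth", "Sparkling lemonade",
    "Wholemeal bread loaf", "White sliced bread", "Seeded bagels",
]
train_labels = ["Dairy"] * 3 + ["Drinks"] * 3 + ["Bakery"] * 3

clf = NaiveBayesClassifier().fit(train_names, train_labels)
print(clf.predict("Skimmed milk 4 pints"))  # Dairy
print(clf.predict("Diet cola"))             # Drinks
```

Naive Bayes is shown only because it is a compact, self-contained baseline; a real pipeline would likely draw on more product fields (such as the nutrition data mentioned above) and an established machine learning library.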
The successful student can expect to work with a skilled, encouraging and friendly multidisciplinary group of researchers, software engineers and data scientists, applying cutting-edge techniques to large datasets in a field with immediate, real-world impact on population health.
This is an exciting project, and we hope you’ll be interested in joining us to:
- Help to find the big answers to big questions in public health, tackling obesity, diabetes and other non-communicable diseases
- Use data science, big data, and cutting-edge technology in a real-world setting to help people live healthier, happier lives
- Work in the fantastic Big Data Institute, the world’s largest health big data institute
- Conduct research at the intersection of technology, data, science, diet, nutrition and public health
- Be part of a fantastic, fun, interesting, multidisciplinary and growing team
- Learn, create, innovate and take responsibility for your research
Who we’re looking for
We’re looking for someone who enjoys creating, building and delivering digital solutions, is data-driven, and knows how to manipulate data and solve problems creatively. If making a difference to public health, building and using fantastic technology, and digging into large datasets interest you, we’d love you to submit an application. For further information, please contact Dr Richie Harrington at firstname.lastname@example.org.
The core component of this project is envisaged to take 10-12 weeks to complete.
This project would be most suited to someone adept at programming in Python (equivalent skills in R would also be considered), with prior experience of basic machine learning techniques, particularly those used for classification.