Efficient large-scale pre-training for wearable foundation models
Project summary
Accurate measurement of physical activity and sleep from wearables is required to delineate the association between lifestyle risk factors and human diseases. Developing a high-performance human activity recognition model requires a large volume of labelled data, which is expensive to obtain for time-series wearable data. To mitigate this scarcity of labels, it is highly desirable to develop foundation models on unlabelled data, so that high-performance activity recognition models can be built with far less labelled data.
Foundation models underpin recent advances in machine learning applications such as ChatGPT and DALL·E 2. However, large-scale pre-training is prohibitively expensive: only those with access to large amounts of data and computing resources can afford it. There is a crucial need to make pre-training more data-efficient. In this project, we intend to formulate a human-centric pre-training framework for wearable sensing data by investigating the trade-off between inter-person and intra-person variability. With the proposed framework, we hope to reduce training costs significantly enough to allow foundation model development in resource-constrained settings.
This project will make it more computationally efficient to obtain a foundation model that can improve the measurement of physical activity and sleep; computational cost is a key bottleneck when dealing with large volumes of unlabelled time-series data from diverse populations.
Concretely, you will have the opportunity to:
- Build upon our group's foundation models for wearables to assess the current computational cost of large-scale pre-training (a rough benchmarking sketch follows this list).
- Formulate the trade-off in entity-centric pre-training algorithmically, quantifying the information gained from inter-person versus intra-person variability (see the sampling sketch after this list).
- Develop more efficient pre-training methods and test their effectiveness.
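As a starting point for the first task, the sketch below shows one way pre-training cost could be benchmarked: counting parameters and measuring training throughput for a small 1-D CNN encoder on simulated tri-axial accelerometer windows. The architecture, batch size, window length, and dummy loss are all illustrative assumptions, not the group's actual foundation model or training objective.

```python
# A rough, hypothetical sketch of benchmarking pre-training cost:
# parameter count and training throughput (windows/sec) for a small
# 1-D CNN encoder on simulated tri-axial accelerometer windows.
import time

import torch
import torch.nn as nn

# Illustrative encoder; the real foundation model will differ.
encoder = nn.Sequential(
    nn.Conv1d(3, 64, kernel_size=8, stride=2), nn.ReLU(),
    nn.Conv1d(64, 128, kernel_size=8, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(128, 128),
)

n_params = sum(p.numel() for p in encoder.parameters())
print(f"parameters: {n_params / 1e6:.2f}M")

# Time forward + backward passes on random windows
# (batch of 256 windows, e.g. 10 s at 30 Hz = 300 samples).
optimizer = torch.optim.Adam(encoder.parameters())
x = torch.randn(256, 3, 300)
steps = 20
start = time.perf_counter()
for _ in range(steps):
    optimizer.zero_grad()
    loss = encoder(x).pow(2).mean()  # dummy loss; a real run would use an SSL objective
    loss.backward()
    optimizer.step()
elapsed = time.perf_counter() - start
print(f"throughput: {steps * x.shape[0] / elapsed:.0f} windows/sec")
```

Scaling such measurements across model sizes and dataset sizes would give a concrete picture of where the current compute budget goes.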
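For the second task, below is a minimal sketch of how entity-centric positive-pair sampling for contrastive pre-training might look. The function `sample_pairs`, the `jitter` augmentation, and the parameter `p_intra` are hypothetical names introduced for illustration: `p_intra` is the probability that a positive pair is drawn from two different windows of the same participant (intra-person variability) rather than from two augmented views of one window, while batch negatives come from other participants (inter-person variability).

```python
# A minimal sketch (hypothetical names and parameters) of entity-centric
# positive-pair sampling for contrastive pre-training on wearable windows.
import random
from collections import defaultdict

import torch


def jitter(x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """A simple illustrative augmentation: add Gaussian noise to a window."""
    return x + sigma * torch.randn_like(x)


def sample_pairs(windows_by_person: dict, batch_size: int, p_intra: float):
    """Return two (B, C, T) views whose i-th rows form a positive pair."""
    persons = list(windows_by_person.keys())
    views_a, views_b = [], []
    for _ in range(batch_size):
        person = random.choice(persons)
        windows = windows_by_person[person]
        anchor = random.choice(windows)
        if len(windows) > 1 and random.random() < p_intra:
            # Intra-person positive: a different window from the same person.
            other = random.choice([w for w in windows if w is not anchor])
            views_a.append(anchor)
            views_b.append(other)
        else:
            # Standard SimCLR-style positive: two augmented views of one window.
            views_a.append(jitter(anchor))
            views_b.append(jitter(anchor))
    return torch.stack(views_a), torch.stack(views_b)


# Demo on synthetic tri-axial data: 10 participants, 20 windows of 300 samples.
data = defaultdict(list)
for pid in range(10):
    for _ in range(20):
        data[pid].append(torch.randn(3, 300))

xa, xb = sample_pairs(data, batch_size=64, p_intra=0.5)
print(xa.shape, xb.shape)  # torch.Size([64, 3, 300]) twice
```

Sweeping `p_intra` from 0 (purely augmentation-based positives) to 1 (purely intra-person positives) would directly instantiate the inter-person versus intra-person trade-off that the project sets out to quantify.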
If time allows, this project can be written up for a machine learning or ubiquitous computing conference.
Timescale
10-12 weeks.
Day-to-day supervision
Hang Yuan/Shing Chan
Suitability
1) Strong programming skills in Python, ideally demonstrated through open-source projects
2) Familiarity with deep learning concepts and deep learning frameworks such as PyTorch or TensorFlow
3) Experience working with large-scale datasets is desirable but not mandatory