Project summary

Accurate measurement of physical activity and sleep from wearables is required to delineate the associations between lifestyle risk factors and human diseases. Developing a high-performance human activity recognition model requires a large volume of labelled data, which is expensive to obtain for time series wearable data. To mitigate this scarcity of labels, it is highly desirable to develop foundation models on unlabelled data, so that high-performance activity recognition models can be built with much less labelled data.

Foundation models underpin recent advances in machine learning applications, e.g. ChatGPT and DALL·E 2. However, large-scale pretraining is prohibitively expensive: only those with access to large amounts of data and computing resources can afford it, so there is a crucial need to make pretraining more data efficient. In this project, we intend to formulate a human-centric pretraining framework for wearable sensing data by investigating the trade-off between inter-person and intra-person variability. With our proposed pretraining framework, we hope to reduce training costs significantly, allowing foundation model development in resource-constrained settings.

This project will make it more computationally efficient to obtain a foundation model that can be used to improve the measurement of physical activity and sleep, a key bottleneck when dealing with a large volume of unlabelled time series data from diverse populations.


Concretely, you will have the opportunity to:

  1. Build upon our group's foundation models for wearables to assess the current computational costs of large-scale pretraining.
  2. Formulate the trade-off for entity-centric pretraining algorithmically to quantify the information gain from inter-person and intra-person variability.
  3. Develop more efficient pretraining methods and test their effectiveness.
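As a toy illustration of step 2, the contributions of inter-person and intra-person variability can be quantified with a classic between/within variance decomposition of window-level features grouped by participant. This is a minimal sketch only; the function and variable names are illustrative and not part of the project's codebase:

```python
import numpy as np

def variance_decomposition(features, person_ids):
    """Split the total variance of window-level features into an
    inter-person (between-participant) and an intra-person
    (within-participant) component.

    features   : array of shape (n_windows, n_features)
    person_ids : array of shape (n_windows,), one ID per window
    Returns (between, within), which sum to the total population variance.
    """
    features = np.asarray(features, dtype=float)
    person_ids = np.asarray(person_ids)
    n = len(features)
    grand_mean = features.mean(axis=0)

    between = 0.0  # variance of participant means around the grand mean
    within = 0.0   # variance of windows around their participant's mean
    for pid in np.unique(person_ids):
        group = features[person_ids == pid]
        group_mean = group.mean(axis=0)
        between += len(group) * np.sum((group_mean - grand_mean) ** 2)
        within += np.sum((group - group_mean) ** 2)
    return between / n, within / n

# Example: two participants with well-separated feature values
features = np.array([[0.0], [2.0], [10.0], [12.0]])
person_ids = np.array(["a", "a", "b", "b"])
between, within = variance_decomposition(features, person_ids)
# between=25.0, within=1.0; their sum equals the total variance (26.0)
```

A high between/within ratio would suggest that sampling windows across many participants carries more information per window than sampling many windows from the same participant, which is the kind of trade-off an entity-centric pretraining scheme could exploit.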


If time allows, this project can be written up for a machine learning or ubiquitous computing conference. 


Project length

10-12 weeks.

Day-to-day supervision

Hang Yuan/Shing Chan


Desired skills

1)  Strong programming skills in Python, ideally demonstrated through open-source projects

2)  Familiarity with deep learning concepts and deep learning frameworks such as PyTorch or TensorFlow

3)  Experience working with large-scale datasets is desirable but not mandatory

Our team

  • Aiden Doherty

    Professor of Biomedical Informatics

  • Shing Chan

    Postdoctoral Research Scientist in Statistical Machine Learning for Cardiovascular Medicine

  • Hang Yuan

    Early Career Research Fellow