Multiomic machine learning across biobanks
summary
This theme addresses the challenges of, and strategies for, applying machine learning (ML) to multiomics data held in distributed biobanks. It focuses on building models that are robust and generalizable, so that they transfer reliably across diverse datasets and populations.
Key areas include:
- Multimodal data integration: The module covers how to integrate diverse omics data types, such as genomics, proteomics, and metabolomics, which together give a more comprehensive view of biological systems than any single layer.
- Generalizability and transportability: The module examines methods that ensure ML models do not merely fit the dataset they were trained on but also generalize to other biobanks, including strategies that improve a model's adaptability and performance across different populations and settings.
- Model validation: The module highlights the need for rigorous validation to assess the performance and reliability of ML models applied to distributed datasets, including cross-validation, external validation, and evaluation on benchmark datasets.
- Applications in personalized medicine: More generalizable and transportable ML models could substantially improve personalized medicine, enabling more accurate predictions tailored to individual patient profiles across diverse demographic groups.
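To make the integration and validation ideas above concrete, here is a minimal sketch of leave-one-biobank-out evaluation, a form of external validation in which each biobank is held out in turn while the model is trained on the others. It also shows a simple early-fusion step: each omics layer is standardized using training-set statistics only, and the layers are then concatenated. Everything here is an illustrative assumption rather than a method prescribed by this theme: the `Sample` structure, the biobank names, the simulated data, and the nearest-centroid classifier (a stand-in for any ML model).

```python
"""Sketch: early fusion of omics layers + leave-one-biobank-out validation.

Assumptions (illustrative only): each sample carries a hypothetical biobank
label, a binary outcome, and one feature vector per omics layer; the model is
a simple nearest-centroid classifier standing in for any ML model.
"""
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class Sample:
    biobank: str                    # hypothetical site identifier
    omics: dict[str, list[float]]   # e.g. {"genomics": [...], "proteomics": [...]}
    label: int                      # binary outcome (0/1)


def fit_scaler(samples, layer):
    """Per-feature mean and SD for one omics layer, from training samples only."""
    cols = list(zip(*(s.omics[layer] for s in samples)))
    means = [sum(c) / len(c) for c in cols]
    # Constant features get SD 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def fuse(sample, scalers):
    """Early fusion: z-score each layer with training statistics, then concatenate."""
    fused = []
    for layer in sorted(scalers):
        means, sds = scalers[layer]
        fused.extend((x - m) / s for x, m, s in zip(sample.omics[layer], means, sds))
    return fused


def fit_centroids(vectors, labels):
    """Nearest-centroid model: the mean fused vector of each class."""
    centroids = {}
    for cls in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, vector):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(vector, centroids[c])))


def leave_one_biobank_out(samples):
    """Train on all biobanks but one; report accuracy on the held-out biobank."""
    layers = list(samples[0].omics)
    results = {}
    for held_out in sorted({s.biobank for s in samples}):
        train = [s for s in samples if s.biobank != held_out]
        test = [s for s in samples if s.biobank == held_out]
        # Scalers are refit inside each fold so the held-out site never leaks in.
        scalers = {layer: fit_scaler(train, layer) for layer in layers}
        model = fit_centroids([fuse(s, scalers) for s in train],
                              [s.label for s in train])
        correct = sum(predict(model, fuse(s, scalers)) == s.label for s in test)
        results[held_out] = correct / len(test)
    return results


def simulate(rng, biobank, n, shift):
    """Toy data: the outcome raises feature values; `shift` mimics a site batch effect."""
    samples = []
    for _ in range(n):
        y = rng.randint(0, 1)
        omics = {
            "genomics": [rng.gauss(1.5 * y, 1.0) for _ in range(3)],
            "proteomics": [rng.gauss(1.5 * y + shift, 1.0) for _ in range(2)],
        }
        samples.append(Sample(biobank, omics, y))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    data = (simulate(rng, "biobank_A", 100, 0.0)
            + simulate(rng, "biobank_B", 100, 0.0)
            + simulate(rng, "biobank_C", 100, 1.0))
    for site, acc in leave_one_biobank_out(data).items():
        print(f"held-out {site}: accuracy = {acc:.2f}")
```

Refitting the scaler inside each fold is the key design choice: computing means and SDs over all biobanks at once would leak information from the held-out site into training and overstate transportability. Comparing per-site accuracies (here, the simulated `biobank_C` carries a proteomics shift) is one simple way to surface sites where a model generalizes poorly.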
This theme emphasizes developing ML models that can be validated and deployed reliably across different biobanks, increasing their utility in personalized medicine.
references
- Liu J, Yang M, Yu Y, Xu H, Li K, Zhou X. Large language models in bioinformatics: applications and perspectives. arXiv [Preprint]. 2024 Jan 8:arXiv:2401.04155v1. PMID: 38259343; PMCID: PMC10802675.
- Diamant I, Clarke DJB, Evangelista JE, Lingam N, Ma’ayan A. Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources. Nucleic Acids Research. 2024:gkae1080. https://doi.org/10.1093/nar/gkae1080
- Liu X, Morelli D, Littlejohns TJ, Clifton DA, Clifton L. Combining machine learning with Cox models to identify predictors for incident post-menopausal breast cancer in the UK Biobank. Scientific Reports. 2023;13(1):9221.