BDI Seminar: Statistical strategies for enhanced metabolic phenotyping and biomarker recovery
Professor Elaine Holmes, Head of the Division of Computational & Systems Medicine, Imperial College London
Monday, 30 April 2018, 10.30am to 11.30am
Seminar Room 0, Big Data Institute, Oxford, OX3 7LF
Abstract
The metabolic phenotype can provide a window onto dynamic biochemical responses to physiological and pathological stimuli. Metabolic profiling platforms for analysing biosamples, which combine high-resolution spectroscopic methods (NMR spectroscopy, LC-MS, GC-MS, etc.) with multivariate statistical modelling tools, have been shown to be well suited to generating metabolic signatures that reflect gene-environment interactions. The consequent demand for sensitive, high-quality disease diagnostics has driven the development of new technological and statistical methods for extracting biomarkers from NMR spectra, resulting in improved elucidation of pathological mechanisms. A combination of multiple spectroscopic and statistical approaches is most effective; an analytical strategy covering spectral alignment, scaling, curve resolution and quantification, statistical correlation and annotation is therefore desirable. An analytical pipeline is presented, with particular focus on a series of methods for enhancing biomarker detection via a family of statistical correlation algorithms.
Key bottlenecks in integrating metabolic profiling into translational medicine pipelines or workflows include: lack of automated peak/metabolite annotation; introduction of artefacts by alignment and pre-processing algorithms; lack of standardisation across laboratories; and limited availability of methods that capture dynamic metabolic changes and can accommodate missing data. An outline of the current bottlenecks in data processing, modelling and interpretation is given, with suggestions of statistical tools for overcoming some of these limitations. These focus on homospectroscopic or heterospectroscopic correlation algorithms, adjustment for multiple confounders and time series analysis, a suite of tools that can be applied to extract new correlates between datasets and establish biological coherence across metabolic pathways and networks.
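
The abstract does not say which correlation algorithms the talk covers, so the following is only a minimal sketch of the general idea, not the speaker's method. It assumes the simplest form of spectral correlation: choose one "driver" signal and compute its Pearson correlation, across samples, with every other spectral variable. When the driver comes from the same data matrix the analysis is homospectroscopic; when it comes from a second platform measured on the same samples it is heterospectroscopic. The function name `correlate_with_driver`, the synthetic `nmr` and `ms` matrices and all numerical values are hypothetical and chosen for illustration only.

```python
import numpy as np


def correlate_with_driver(driver, spectra):
    """Pearson correlation of each column of `spectra` with `driver`.

    driver  : 1-D array of length n_samples (one signal's intensity per sample)
    spectra : 2-D array, shape (n_samples, n_variables)
    Columns with zero variance get a correlation of 0.
    """
    driver = np.asarray(driver, dtype=float)
    X = np.asarray(spectra, dtype=float)
    if X.ndim != 2 or driver.shape != (X.shape[0],):
        raise ValueError("driver must have one value per row of spectra")

    d = driver - driver.mean()       # centre the driver
    Xc = X - X.mean(axis=0)          # centre every spectral variable
    num = Xc.T @ d                   # cross-products with the driver
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (d ** 2).sum())

    r = np.zeros(X.shape[1])
    ok = denom > 0                   # skip constant columns
    r[ok] = num[ok] / denom[ok]
    return r


# Synthetic data (assumption): one metabolite whose concentration varies
# across 30 samples and produces two signals in an NMR-like matrix and one
# signal in an MS-like matrix. Everything else is noise.
rng = np.random.default_rng(0)
n_samples = 30
conc = rng.uniform(1.0, 5.0, size=n_samples)

nmr = rng.normal(0.0, 0.05, size=(n_samples, 10))
nmr[:, 2] += conc
nmr[:, 7] += 0.5 * conc

ms = rng.normal(0.0, 0.05, size=(n_samples, 6))
ms[:, 4] += 2.0 * conc

# Homospectroscopic: driver and targets come from the same platform.
r_homo = correlate_with_driver(nmr[:, 2], nmr)
# Heterospectroscopic: the same driver correlated against a second platform.
r_hetero = correlate_with_driver(nmr[:, 2], ms)

print("NMR variables most correlated with the driver:",
      np.argsort(-np.abs(r_homo))[:2])
print("MS variable most correlated with the driver:",
      int(np.argmax(np.abs(r_hetero))))
```

In real data the variables that correlate strongly with the driver are candidates for belonging to the same molecule or pathway, which can support annotation. Confounder adjustment, multiple-testing control and handling of missing data, all mentioned in the abstract, are left out of this sketch.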