Data Science (DS) algorithms interpret the outcomes of empirical experiments that are subject to random influences. Such algorithms are often cascaded into long processing pipelines, especially in biomedical applications. Validating such pipelines remains an open question, since data compression of the input should preserve as much information as possible to distinguish between possible outputs. Starting with a minimum description length argument for model selection, we motivate a localization criterion as a lower bound that achieves information-theoretic optimality. Uncertainty in the input causes a rate-distortion tradeoff in the output when the DS algorithm is adapted by learning. We present design choices for algorithm selection and sketch a theory of validation. The concept is demonstrated in neuroscience applications of diffusion tensor imaging for tractography and brain parcellation.



Joachim M. Buhmann is a Professor of Computer Science at ETH Zurich. He studied physics at TU-Munich and performed postdoctoral research at USC and LLNL in California. Until 2003 he was a Professor of Applied Computer Science at the University of Bonn. His teaching and research cover Machine Learning in theory and applications, e.g. in the life sciences. He is a member of the Swiss Academy of Engineering Sciences (SATW) and a Fellow of the IAPR, and he serves as a research council member of the Swiss National Science Foundation.