Abstract

In this talk, I'd like to discuss the importance and connections of the three principles of data science in the title (predictability, computability, and stability) and introduce the PCS workflow for the data science life cycle. PCS will be demonstrated in the context of two collaborative projects, in neuroscience and genomics respectively. The first project, in neuroscience, uses transfer learning to integrate convolutional neural networks (CNNs) fitted on ImageNet with regression methods to provide predictive and stable characterizations of neurons from the challenging visual cortex area V4. Our DeepTune characterization provides a rich description of the diverse V4 selection patterns. The second project proposes iterative random forests (iRF), a stabilized version of Random Forests (RF), to seek predictable and interpretable high-order interactions among biomolecules. For an enhancer status prediction problem for Drosophila based on high-throughput data, iRF was able to find 20 stable gene-gene interactions, of which 80% had been physically verified in the literature in the past few decades. Last but not least, the results from both projects provide experimentally testable hypotheses, and hence PCS can also serve as a scientific recommendation system for follow-up experiments.

Biography

Bin Yu is Chancellor’s Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California at Berkeley. Her current research interests focus on statistics and machine learning theory, methodologies and algorithms for solving high-dimensional data problems. Her group is engaged in interdisciplinary research with scientists from genomics, neuroscience, and precision medicine.

She obtained her B.S. degree in Mathematics from Peking University in 1984, and her M.A. and Ph.D. degrees in Statistics from the University of California at Berkeley in 1987 and 1990, respectively. She is a Member of the U.S. National Academy of Sciences and a Fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006 and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of the IMS (Institute of Mathematical Statistics) in 2013-2014 and the Rietz Lecturer of the IMS in 2016.