Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
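
To make the idea of a grid-based dependence score concrete, the following is a minimal sketch of a MIC-style statistic. It is not the authors' algorithm: it assumes that MIC is the largest normalized mutual information found over grids drawn on the scatterplot, and it simplifies the search by trying only equal-frequency (rank-based) grids instead of optimizing the grid boundaries. The function name `approx_mic`, the helper names, and the default grid budget of n^0.6 cells are assumptions made for illustration, and the result should be read as a rough lower bound on a properly optimized score.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, n_bins):
    """Label each value with a bin index 0..n_bins-1 according to its rank.

    Ties are broken by input order, so tied values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def _mutual_information(bx, by):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


def approx_mic(xs, ys, exponent=0.6):
    """Simplified MIC-style score (hypothetical helper, not the published method).

    For every grid size nx-by-ny with nx * ny <= n ** exponent, bin both
    variables by rank, compute the mutual information of the binned data,
    normalize by log(min(nx, ny)), and return the largest value seen.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    budget = n ** exponent  # assumed cap on the number of grid cells
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        bx = _equal_frequency_bins(xs, nx)
        for ny in range(2, int(budget // nx) + 1):
            by = _equal_frequency_bins(ys, ny)
            score = _mutual_information(bx, by) / math.log(min(nx, ny))
            best = max(best, score)
    return best
```

Under these assumptions, a noiseless relationship such as `approx_mic(list(range(200)), list(range(200)))` scores at or very near 1, while unrelated variables score close to 0. With fewer than about 11 points the assumed budget admits no 2-by-2 grid, so the sketch returns 0.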

Original publication

DOI

10.1126/science.1205438

Type

Journal article

Journal

Science (New York, N.Y.)

Publication Date

12/2011

Volume

334

Pages

1518 - 1524

Addresses

Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. dnreshef@mit.edu

Keywords

Intestines; Animals; Humans; Mice; Saccharomyces cerevisiae; Obesity; Data Interpretation, Statistical; Genomics; Gene Expression; Genes, Fungal; Algorithms; Baseball; Female; Male; Metagenome