A better coefficient of determination for genetic profile analysis.
Lee SH., Goddard ME., Wray NR., Visscher PM.
Genome-wide association studies have facilitated the construction of risk predictors for disease from multiple single nucleotide polymorphism (SNP) markers. The ability of such "genetic profiles" to predict outcome is usually quantified in an independent data set. Coefficients of determination (R²) have been a useful measure to quantify the goodness-of-fit of the genetic profile. Various pseudo-R² measures for binary responses have been proposed. However, there is no standard or consensus measure because the concept of residual variance is not easily defined on the observed probability scale. Unlike nongenetic predictors such as environmental exposure, genetic predictors come with prior information, because for most traits there are estimates of the heritability, the proportion of variation in risk in the population that is due to all genetic factors. It is this ability to benchmark against heritability that makes the choice of a goodness-of-fit measure in genetic profiling different from that for nongenetic predictors. In this study, we use a liability threshold model to establish the relationship between the observed probability scale and the underlying liability scale in measuring R² for binary responses. We show that currently used R² measures are difficult to interpret, biased by ascertainment, and not comparable to heritability. We suggest a novel and globally standard measure of R² that is interpretable on the liability scale. Furthermore, even when using ascertained case-control studies that are typical in human disease studies, we can obtain an R² measure on the liability scale that can be compared directly to heritability.
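To illustrate the kind of scale conversion a liability threshold model allows, the following minimal sketch applies the classical observed-to-liability transformation for a binary trait in a non-ascertained population sample, in which the observed-scale value is multiplied by K(1 - K)/z², where K is the population prevalence and z is the standard normal density at the liability threshold. This is not necessarily the measure proposed in the study, and it does not include the correction for case-control ascertainment that the abstract refers to. The function name and arguments are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist


def observed_to_liability_r2(r2_observed: float, prevalence: float) -> float:
    """Rescale an observed-scale R² to the liability scale.

    Illustrative assumption: a random population sample (no case-control
    ascertainment) and a standard normal liability with a single threshold.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    std_normal = NormalDist()  # liability assumed ~ N(0, 1)
    # Threshold t such that a proportion `prevalence` of liabilities exceeds it.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at the threshold.
    z = std_normal.pdf(threshold)
    # Classical scaling factor K(1 - K) / z^2.
    return r2_observed * prevalence * (1.0 - prevalence) / z ** 2


if __name__ == "__main__":
    # Hypothetical inputs: observed-scale R² of 0.02, prevalence of 1%.
    print(observed_to_liability_r2(0.02, 0.01))
```

Under these assumptions the scaling factor depends only on the prevalence, so the same observed-scale R² corresponds to a larger liability-scale value for a rarer disease. In an ascertained case-control sample the proportion of cases differs from the population prevalence, and this simple factor alone would not be appropriate.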