Genetic discovery from the multitude of phenotypes extractable from routine healthcare data can transform understanding of the human phenome and accelerate progress toward precision medicine. However, a critical question when analyzing high-dimensional and heterogeneous data is how best to interrogate increasingly specific subphenotypes while retaining statistical power to detect genetic associations. Here we develop and employ a new Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics. Our method displays a more than 20% increase in power to detect genetic effects over other approaches and identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs). By applying the approach to genetic risk scores (GRSs), we show the extent of genetic sharing among IMDs and expose differences in disease perception or diagnosis with potential clinical implications.

Original publication

DOI

10.1038/ng.3926

Type

Journal article

Journal

Nature Genetics

Publication Date

09/2017

Volume

49

Pages

1311–1318

Addresses

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

Keywords

Humans; Genetic Predisposition to Disease; HLA Antigens; Cluster Analysis; Logistic Models; Bayes Theorem; Polymorphism, Single Nucleotide; Alleles; International Classification of Diseases; Adult; Aged; Middle Aged; Delivery of Health Care; Female; Male; Genome-Wide Association Study; Genetic Association Studies; Health Information Systems; United Kingdom