
Genetic discovery from the multitude of phenotypes extractable from routine healthcare data can transform our understanding of the human phenome and accelerate progress toward precision medicine. However, a critical question when analyzing high-dimensional and heterogeneous data is how best to interrogate increasingly specific subphenotypes while retaining the statistical power to detect genetic associations. Here we develop and apply a new Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to analyze genetic variants against UK Biobank disease phenotypes derived from self-reporting and hospital episode statistics. Our method shows an increase of more than 20% in power to detect genetic effects compared with other approaches, and it identifies new associations between classical human leukocyte antigen (HLA) alleles and common immune-mediated diseases (IMDs). By applying the approach to genetic risk scores (GRSs), we show the extent of genetic sharing among IMDs and expose differences in disease perception or diagnosis with potential clinical implications.

Original publication

Journal article
Nature Genetics
Pages 1311–1318

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

Keywords: Humans, Genetic Predisposition to Disease, HLA Antigens, Cluster Analysis, Logistic Models, Bayes Theorem, Polymorphism, Single Nucleotide, Alleles, International Classification of Diseases, Adult, Aged, Middle Aged, Delivery of Health Care, Female, Male, Genome-Wide Association Study, Genetic Association Studies, Health Information Systems, United Kingdom