Inferring whole-genome histories in large population datasets.

Kelleher J.; Wong Y.; Wohns AW.; Fadil C.; Albers PK.; McVean G.

Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.

Original publication

DOI

10.1038/s41588-019-0483-y

Type

Journal article

Journal

Nature Genetics

Publication Date

02/09/2019

Volume

Pages

1330 - 1338

Addresses

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK. jerome.kelleher@bdi.ox.ac.uk.

Keywords

Humans, Pedigree, Genetics, Population, Population Density, Evolution, Molecular, Haplotypes, Mutation, Polymorphism, Single Nucleotide, Genome, Human, Algorithms, Models, Genetic, Computer Simulation, Selection, Genetic, Datasets as Topic
