A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations either do not scale well or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse, and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
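
To make the idea of coalescence records and sparse trees more concrete, the following is a minimal sketch rather than the published implementation. It assumes that a record stores a genomic interval, a parent node, that node's children, and a coalescence time, and that a sparse tree can be held as a child-to-parent mapping. The names `CoalescenceRecord` and `sparse_tree_at` and the example values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple, Dict, List


class CoalescenceRecord(NamedTuple):
    """Hypothetical record layout (an assumption, not the paper's exact format)."""
    left: float                # start of the genomic interval (inclusive)
    right: float               # end of the genomic interval (exclusive)
    parent: int                # node created by the coalescence
    children: Tuple[int, ...]  # nodes that coalesce into `parent`
    time: float                # time of the coalescence event


def sparse_tree_at(records: List[CoalescenceRecord], position: float) -> Dict[int, int]:
    """Return a child -> parent mapping for the tree covering `position`."""
    parent_of: Dict[int, int] = {}
    for rec in records:
        # A record contributes to the tree only if its interval covers the position.
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent_of[child] = rec.parent
    return parent_of


# Illustrative data: three samples (0, 1, 2) on the region [0, 10)
# with a single recombination breakpoint at position 4.
records = [
    CoalescenceRecord(0, 10, 3, (0, 1), 0.2),  # shared by both trees
    CoalescenceRecord(0, 4, 4, (2, 3), 0.7),   # root of the left-hand tree
    CoalescenceRecord(4, 10, 5, (2, 3), 1.1),  # root of the right-hand tree
]

left_tree = sparse_tree_at(records, 2.0)
right_tree = sparse_tree_at(records, 7.0)
```

In this sketch the coalescence of samples 0 and 1 is stored once even though it appears in both trees, which illustrates how shared structure in correlated trees need not be duplicated.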

Original publication

DOI

10.1371/journal.pcbi.1004842

Type

Journal article

Journal

PLoS Computational Biology

Publication Date

04/05/2016

Volume

12

Addresses

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

Keywords

Humans; Sample Size; Pedigree; Computational Biology; Genetics, Population; Evolution, Molecular; Recombination, Genetic; Algorithms; Models, Genetic; Computer Simulation; Genetic Variation