Oxford Big Data Institute

Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years.

The multi-millennia-long history between dogs and humans has placed them at the forefront of archaeological and genomic research. Despite ongoing efforts including the analysis of ancient dog and wolf genomes, many questions remain regarding the evolutionary processes that led to the diversity of breeds today. Although ancient genome sequences provide valuable information about these processes, their utility is hindered by low depths of coverage and postmortem damage, which inhibit confident genotype calling. In the present study, we assess how genotype imputation of ancient dog and wolf genomes, using a large reference panel, can increase the amount of information provided by ancient datasets. We evaluated imputation accuracy by down-sampling high-coverage dog and wolf genomes to 0.05 to 2× coverage and comparing concordance between imputed and high-coverage genotypes. We measured the impact of imputation on principal component analyses and runs of homozygosity (ROH). Our findings show high (R² > 0.9) imputation accuracy for dogs with coverage as low as 0.5× and for wolves as low as 1.0×. We then imputed a dataset of 90 ancient dog and wolf genomes to assess changes in inbreeding during the last 10,000 years of dog evolution. Ancient dog and wolf populations generally exhibit lower inbreeding levels than present-day individuals. Regions with low ROH density maintained across ancient and present-day dogs were significantly associated with genes related to immunity and chemosensory receptors. Our study indicates that imputing ancient canine genomes is a viable strategy that allows for the use of analytical methods previously limited to high-quality genetic data.
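The accuracy figure quoted in the abstract above (R² > 0.9) is commonly computed as the squared Pearson correlation between the genotypes called from the original high-coverage genome and the dosages imputed from its down-sampled copy. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `imputation_r2` and the example values are hypothetical, and genotypes are assumed to be coded as 0, 1 or 2 copies of the alternate allele.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (floats in [0, 2]) across a set of sites.

    Illustrative helper only; the name and interface are assumptions.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 sites")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    # Sums of squares and cross-products around the means.
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # r2 is undefined when either vector has no variation.
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)


# Hypothetical example: genotypes from a high-coverage genome versus
# dosages imputed from a down-sampled copy of the same genome.
truth = [0, 1, 2, 1, 0, 2]
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]
r2 = imputation_r2(truth, imputed)
```

In practice such a statistic is usually reported per allele-frequency bin, because rare variants tend to be imputed less accurately than common ones.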

Multivariate genome-wide association analysis of dyslexia and quantitative reading skill improves gene discovery.

The ability to read is an important life skill and a major route to education. Dyslexia, characterized by difficulties with accurate/fluent word reading and poor spelling, is influenced by genetic variation, with a twin study heritability estimate of 0.4-0.6. Until recently, genomic investigations were limited by modest sample size. We used a multivariate genome-wide association study (GWAS) method, MTAG, to leverage summary statistics from two independent GWAS efforts, boosting power for analyses of dyslexia: the GenLang meta-analysis of word reading (N = 27,180) and the 23andMe, Inc., study of dyslexia (Ncases = 51,800, Ncontrols = 1,087,070). We increased the effective sample size to 1,228,832 participants, representing the largest genetic study of reading-related phenotypes to date. Our analyses identified 80 independent genome-wide significant loci, including 36 regions which were not previously reported as significant. Of these 36 loci, 13 were novel regions with no prior association with dyslexia. We observed clear genetic correlations with cognitive and educational measures. Gene-set analyses revealed significant enrichment of dyslexia-associated genes in four neuronal biological process pathways, and findings were further supported by enrichment of neuronally expressed genes in the developing embryonic brain. Polygenic index analysis of our multivariate results predicted between 2.34% and 4.73% of variance in reading traits in an independent sample, the National Child Development Study cohort (N = 6,410). Polygenic adaptation was examined using a large panel of ancient genomes spanning the last ~15,000 years. We did not find evidence of selection, suggesting that dyslexia has not been subject to recent selection pressure in Europeans. By combining existing datasets to improve statistical power, these results provide novel insights into the biology of dyslexia.

The spatiotemporal distribution of human pathogens in ancient Eurasia.

Infectious diseases have had devastating effects on human populations throughout history, but important questions about their origins and past dynamics remain1. To create an archaeogenetic-based spatiotemporal map of human pathogens, we screened shotgun-sequencing data from 1,313 ancient humans covering 37,000 years of Eurasian history. We demonstrate the widespread presence of ancient bacterial, viral and parasite DNA, identifying 5,486 individual hits against 492 species from 136 genera. Among those hits, 3,384 involve known human pathogens2, many of which had not previously been identified in ancient human remains. Grouping the ancient microbial species according to their likely reservoir and type of transmission, we find that most groups are identified throughout the entire sampling period. Zoonotic pathogens are only detected from around 6,500 years ago, peaking roughly 5,000 years ago, coinciding with the widespread domestication of livestock3. Our findings provide direct evidence that this lifestyle change resulted in an increased infectious disease burden. They also indicate that the spread of these pathogens increased substantially during subsequent millennia, coinciding with the pastoralist migrations from the Eurasian Steppe4,5.

Tracing the evolutionary history of the CCR5delta32 deletion via ancient and modern genomes.

The chemokine receptor variant CCR5delta32 is linked to HIV-1 resistance and other conditions. Its evolutionary history and allele frequency (10%-16%) in European populations have been extensively debated. We provide a detailed perspective of the evolutionary history of the deletion through time and space. We discovered that the CCR5delta32 allele arose on a pre-existing haplotype consisting of 84 variants. Using this information, we developed a haplotype-aware probabilistic model to screen 934 low-coverage ancient genomes and traced the origin of the CCR5delta32 deletion to at least 6,700 years before the present (BP) in the Western Eurasian Steppe region. Furthermore, we present strong evidence for positive selection acting upon the CCR5delta32 haplotype between 8,000 and 2,000 years BP in Western Eurasia and show that the presence of the haplotype in Latin America can be explained by post-Columbian genetic exchanges. Finally, we point to complex CCR5delta32 genotype-haplotype-phenotype relationships, which demand consideration when targeting the CCR5 receptor for therapeutic strategies.

Steppe Ancestry in Western Eurasia and the Spread of the Germanic Languages

Multivariate genome-wide association analysis of quantitative reading skill and dyslexia improves gene discovery

Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years

The multi-millennia-long history between dogs and humans has placed them at the forefront of archeological and genomic research. Despite ongoing efforts including the analysis of ancient dog and wolf genomes, many questions remain regarding their geographic and temporal origins, and the microevolutionary processes that led to the diversity of breeds today. Although ancient genomes provide valuable information, their use is hindered by low depth of coverage and post-mortem damage, which inhibit confident genotype calling. In the present study, we assess how genotype imputation of ancient dog and wolf genomes, utilising a large reference panel, can improve the resolution provided by ancient datasets. Imputation accuracy was evaluated by down-sampling high-coverage dog and wolf genomes to 0.05-2x coverage and comparing concordance between imputed and high-coverage genotypes. We measured the impact of imputation on principal component analyses and runs of homozygosity (ROH). Our findings show high (R² > 0.9) imputation accuracy for dogs with coverage as low as 0.5x and for wolves as low as 1.0x. We then imputed a dataset of 90 ancient dog and wolf genomes to assess changes in inbreeding during the last 10,000 years of dog evolution. Ancient dog and wolf populations generally exhibited lower inbreeding levels than present-day individuals. Interestingly, regions with low ROH density maintained across ancient and present-day samples were significantly associated with genes related to olfaction and immune response. Our study indicates that imputing ancient canine genomes is a viable strategy that allows for the use of analytical methods previously limited to high-quality genetic data.
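Inbreeding levels of the kind compared in the abstract above are often summarised from runs of homozygosity as F_ROH, the fraction of the genome covered by ROH segments. The sketch below illustrates that summary under stated assumptions; it is not taken from the study. The function name `f_roh`, the 500 kb minimum segment length and the genome length used in the example are all hypothetical placeholders.

```python
def f_roh(roh_segments, genome_length_bp, min_length_bp=500_000):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments     -- iterable of (start_bp, end_bp) tuples, assumed
                        non-overlapping, for one individual
    genome_length_bp -- length of the genome region that was scanned
    min_length_bp    -- shorter segments are ignored (assumed threshold)
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")

    # Sum the lengths of all segments that pass the length filter.
    total_roh = sum(end - start
                    for start, end in roh_segments
                    if end - start >= min_length_bp)
    return total_roh / genome_length_bp


# Hypothetical individual: one 1 Mb ROH is counted, one 200 kb ROH is
# dropped by the length filter, over a 10 Mb scanned region.
segments = [(0, 1_000_000), (5_000_000, 5_200_000)]
example_f_roh = f_roh(segments, 10_000_000)
```

Comparing this value between ancient and present-day individuals is one simple way to express the difference in inbreeding that the abstract describes; the choice of minimum segment length strongly affects the result and would need to match the ROH caller used.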

Publisher Correction: Population genomics of post-glacial western Eurasia.

Ancient DNA reveals evolutionary origins of autoimmune diseases.

The selection landscape and genetic legacy of ancient Eurasians.

The Holocene (beginning around 12,000 years ago) encompassed some of the most significant changes in human evolution, with far-reaching consequences for the dietary, physical and mental health of present-day populations. Using a dataset of more than 1,600 imputed ancient genomes1, we modelled the selection landscape during the transition from hunting and gathering to farming and pastoralism across West Eurasia. We identify key selection signals related to metabolism, including that selection at the FADS cluster began earlier than previously reported and that selection near the LCT locus predates the emergence of the lactase persistence allele by thousands of years. We also find strong selection in the HLA region, possibly due to increased exposure to pathogens during the Bronze Age. Using ancient individuals to infer local ancestry tracts in over 400,000 samples from the UK Biobank, we identify widespread differences in the distribution of Mesolithic, Neolithic and Bronze Age ancestries across Eurasia. By calculating ancestry-specific polygenic risk scores, we show that height differences between Northern and Southern Europe are associated with differential Steppe ancestry, rather than selection, and that risk alleles for mood-related phenotypes are enriched for Neolithic farmer ancestry, whereas risk alleles for diabetes and Alzheimer's disease are enriched for Western hunter-gatherer ancestry. Our results indicate that ancient selection and migration were large contributors to the distribution of phenotypic diversity in present-day Europeans.

100 ancient genomes show repeated population turnovers in Neolithic Denmark.

Major migration events in Holocene Eurasia have been characterized genetically at broad regional scales1-4. However, insights into the population dynamics in the contact zones are hampered by a lack of ancient genomic data sampled at high spatiotemporal resolution5-7. Here, to address this, we analysed shotgun-sequenced genomes from 100 skeletons spanning 7,300 years of the Mesolithic period, Neolithic period and Early Bronze Age in Denmark and integrated these with proxies for diet (13C and 15N content), mobility (87Sr/86Sr ratio) and vegetation cover (pollen). We observe that Danish Mesolithic individuals of the Maglemose, Kongemose and Ertebølle cultures form a distinct genetic cluster related to other Western European hunter-gatherers. Despite shifts in material culture, they displayed genetic homogeneity from around 10,500 to 5,900 calibrated years before present, when Neolithic farmers with Anatolian-derived ancestry arrived. Although the Neolithic transition was delayed by more than a millennium relative to Central Europe, it was very abrupt and resulted in a population turnover with limited genetic contribution from local hunter-gatherers. The succeeding Neolithic population, associated with the Funnel Beaker culture, persisted for only about 1,000 years before immigrants with eastern Steppe-derived ancestry arrived. This second and equally rapid population replacement gave rise to the Single Grave culture with an ancestry profile more similar to present-day Danes. In our multiproxy dataset, these major demographic events are manifested as parallel shifts in genotype, phenotype, diet and land use.

Population genomics of post-glacial western Eurasia.

Western Eurasia witnessed several large-scale human migrations during the Holocene1-5. Here, to investigate the cross-continental effects of these migrations, we shotgun-sequenced 317 genomes, mainly from the Mesolithic and Neolithic periods, from across northern and western Eurasia. These were imputed alongside published data to obtain diploid genotypes from more than 1,600 ancient humans. Our analyses revealed a 'great divide' genomic boundary extending from the Black Sea to the Baltic. Mesolithic hunter-gatherers were highly genetically differentiated east and west of this zone, and the effect of the neolithization was equally disparate. Large-scale ancestry shifts occurred in the west as farming was introduced, including near-total replacement of hunter-gatherers in many areas, whereas no substantial ancestry shifts happened east of the zone during the same period. Similarly, relatedness decreased in the west from the Neolithic transition onwards, whereas, east of the Urals, relatedness remained high until around 4,000 BP, consistent with the persistence of localized groups of hunter-gatherers. The boundary dissolved when Yamnaya-related ancestry spread across western Eurasia around 5,000 BP, resulting in a second major turnover that reached most parts of Europe within a 1,000-year span. The genetic origin and fate of the Yamnaya have remained elusive, but we show that hunter-gatherers from the Middle Don region contributed ancestry to them. Yamnaya groups later admixed with individuals associated with the Globular Amphora culture before expanding into Europe. Similar turnovers occurred in western Siberia, where we report new genomic data from a 'Neolithic steppe' cline spanning the Siberian forest steppe to Lake Baikal. These prehistoric migrations had profound and lasting effects on the genetic diversity of Eurasian populations.

Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations.

Multiple sclerosis (MS) is a neuro-inflammatory and neurodegenerative disease that is most prevalent in Northern Europe. Although it is known that inherited risk for MS is located within or in close proximity to immune-related genes, it is unknown when, where and how this genetic risk originated1. Here, by using a large ancient genome dataset from the Mesolithic period to the Bronze Age2, along with new Medieval and post-Medieval genomes, we show that the genetic risk for MS rose among pastoralists from the Pontic steppe and was brought into Europe by the Yamnaya-related migration approximately 5,000 years ago. We further show that these MS-associated immunogenetic variants underwent positive selection both within the steppe population and later in Europe, probably driven by pathogenic challenges coinciding with changes in diet, lifestyle and population density. This study highlights the critical importance of the Neolithic period and Bronze Age as determinants of modern immune responses and their subsequent effect on the risk of developing MS in a changing environment.

The spatiotemporal distribution of human pathogens in ancient Eurasia and the emergence of zoonotic diseases

Tracing the evolutionary path of the CCR5delta32 deletion via ancient and modern genomes

The Selection Landscape and Genetic Legacy of Ancient Eurasians

Population genomics of postglacial western Eurasia

Elevated genetic risk for multiple sclerosis originated in Steppe Pastoralist populations

Reply to Peng et al.: Chicken tessellation requires more pieces.

HAYSTAC: A Bayesian framework for robust and rapid species identification in high-throughput sequencing data.

Identification of specific species in metagenomic samples is critical for several key applications, yet many tools available require large computational power and are often prone to false positive identifications. Here we describe High-AccuracY and Scalable Taxonomic Assignment of MetagenomiC data (HAYSTAC), which can estimate the probability that a specific taxon is present in a metagenome. HAYSTAC provides a user-friendly tool to construct databases, based on publicly available genomes, that are used for competitive read mapping. It then uses a novel Bayesian framework to infer the abundance and statistical support for each species identification and provide per-read species classification. Unlike other methods, HAYSTAC is specifically designed to efficiently handle both ancient and modern DNA data, as well as incomplete reference databases, making it possible to run highly accurate hypothesis-driven analyses (i.e., assessing the presence of a specific species) on variably sized reference databases while dramatically improving processing speeds. We tested the performance and accuracy of HAYSTAC using simulated Illumina libraries, both with and without ancient DNA damage, and compared the results to other currently available methods (i.e., Kraken2/Bracken, KrakenUniq, MALT/HOPS, and Sigma). HAYSTAC identified fewer false positives than Kraken2/Bracken, KrakenUniq and MALT in all simulations, and fewer than Sigma in simulations of ancient data. It uses less memory than Kraken2/Bracken, KrakenUniq and MALT during both database construction and sample analysis. Lastly, we used HAYSTAC to search for specific pathogens in two published ancient metagenomic datasets, demonstrating how it can be applied to empirical datasets. HAYSTAC is available from https://github.com/antonisdim/HAYSTAC.
