Integrating common and rare genetic variation in diverse human populations.
Altshuler DM., Gibbs RA., Peltonen L., Altshuler DM., Gibbs RA., Peltonen L., Dermitzakis E., Schaffner SF., Yu F., Peltonen L., Dermitzakis E., Bonnen PE., Altshuler DM., Gibbs RA., de Bakker PIW., Deloukas P., Gabriel SB., Gwilliam R., Hunt S., Inouye M., Jia X., Palotie A., Parkin M., Whittaker P., Yu F., Chang K., Hawes A., Lewis LR., Ren Y., Wheeler D., Gibbs RA., Muzny DM., Barnes C., Darvishi K., Hurles M., Korn JM., Kristiansson K., Lee C., McCarrol SA., Nemesh J., Dermitzakis E., Keinan A., Montgomery SB., Pollack S., Price AL., Soranzo N., Bonnen PE., Gibbs RA., Gonzaga-Jauregui C., Keinan A., Price AL., Yu F., Anttila V., Brodeur W., Daly MJ., Leslie S., McVean G., Moutsianas L., Nguyen H., Schaffner SF., Zhang Q., Ghori MJR., McGinnis R., McLaren W., Pollack S., Price AL., Schaffner SF., Takeuchi F., Grossman SR., Shlyakhter I., Hostetter EB., Sabeti PC., Adebamowo CA., Foster MW., Gordon DR., Licinio J., Manca MC., Marshall PA., Matsuda I., Ngare D., Wang VO., Reddy D., Rotimi CN., Royal CD., Sharp RR., Zeng C., Brooks LD., McEwen JE.
Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.