Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium.
Grove ML., Yu B., Cochran BJ., Haritunians T., Bis JC., Taylor KD., Hansen M., Borecki IB., Cupples LA., Fornage M., Gudnason V., Harris TB., Kathiresan S., Kraaij R., Launer LJ., Levy D., Liu Y., Mosley T., Peloso GM., Psaty BM., Rich SS., Rivadeneira F., Siscovick DS., Smith AV., Uitterlinden A., van Duijn CM., Wilson JG., O'Donnell CJ., Rotter JI., Boerwinkle E.
Genotyping arrays are a cost-effective approach for typing previously identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays for rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty that automated clustering algorithms have in accurately detecting and assigning genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, we analyzed the concordance of genotypes in a subset of individuals having both exome chip and exome sequence data. After exclusion of low-performing SNPs on the exome chip and of SNPs that did not overlap with those derived from sequence data, genotypes of 185,119 variants (of which 11,356 were monomorphic) were compared in 530 individuals with whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested: 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling enables accurate genotyping of rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip.
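
The concordance comparison described above can be illustrated with a minimal sketch. It is not the consortium's pipeline: the genotype encoding (0, 1, 2 copies of the non-reference allele, `None` for a missing call), the rule that a pair counts as missing when either platform lacks a call, and the function name `concordance_summary` are all assumptions made for illustration. The last lines check that 185,119 variants compared in 530 individuals give the 98,113,070 genotype pairs reported.

```python
from typing import Dict, Optional, Sequence

# Assumed encoding: 0, 1, 2 = copies of the non-reference allele; None = no call.
Genotype = Optional[int]


def concordance_summary(
    chip: Sequence[Sequence[Genotype]],
    seq: Sequence[Sequence[Genotype]],
) -> Dict[str, int]:
    """Count concordant, discordant, and missing genotype pairs.

    Both inputs are variant-by-individual matrices with identical shape.
    A pair is counted as missing if either platform has no call
    (an assumption; the abstract does not define this).
    """
    if len(chip) != len(seq):
        raise ValueError("chip and seq must contain the same variants")
    counts = {"concordant": 0, "discordant": 0, "missing": 0}
    for chip_row, seq_row in zip(chip, seq):
        if len(chip_row) != len(seq_row):
            raise ValueError("each variant must cover the same individuals")
        for chip_call, seq_call in zip(chip_row, seq_row):
            if chip_call is None or seq_call is None:
                counts["missing"] += 1
            elif chip_call == seq_call:
                counts["concordant"] += 1
            else:
                counts["discordant"] += 1
    counts["total"] = sum(counts.values())
    return counts


# Toy data (invented): 3 variants x 4 individuals.
chip = [[0, 0, 1, None], [2, 2, 2, 2], [0, 1, 0, 0]]
seq = [[0, 0, 1, 0], [2, 2, 1, 2], [0, 1, 0, 0]]
summary = concordance_summary(chip, seq)
for key in ("concordant", "discordant", "missing"):
    print(f"{key}: {summary[key]} ({100 * summary[key] / summary['total']:.2f}%)")

# Number of genotype pairs implied by the abstract's figures.
n_pairs = 185_119 * 530
print(n_pairs)
```

In the toy data, 10 of the 12 pairs agree, one differs, and one has a missing chip call. The same three-way split applied to real data would yield percentages comparable to those reported.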