With the emergence of biobanks alongside large-scale genome-wide association studies (GWAS), we will soon be in the enviable position of having precise estimates of population allele frequencies for the SNPs that make up the panels of standard genotyping arrays, such as those produced by Illumina and Affymetrix. For disease association studies it is well known that, for rare diseases with known population minor allele frequencies (pMAFs), a case-only design is most powerful. That is, for a fixed budget the optimal procedure is to genotype only cases (affecteds). Such tests look for a divergence of the allele distribution in cases from the known pMAF in order to test the null hypothesis of no association between disease status and allele frequency. However, the utility of controls (known unaffecteds), when available, has not previously been characterized. In this study we consider frequentist and Bayesian statistical methods for testing for SNP genotype association when pMAFs are known and both cases and controls are available. We demonstrate that for rare diseases the most powerful frequentist design is, somewhat counterintuitively, to actively discard the controls even though they contain information on the association. In contrast, we develop a Bayesian test that uses all available information (cases and controls) and appears to exhibit uniformly greater power than all the frequentist methods we considered.
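To make the case-only idea concrete, the following is a minimal sketch of one way to compare the minor-allele frequency observed in cases against a known pMAF. It is not the test developed in the article: it swaps in a plain two-sided one-sample z-test for a proportion, and it assumes that the two alleles of each case can be treated as independent draws. The function name `case_only_z_test`, its parameters, and the example counts are all hypothetical.

```python
import math


def case_only_z_test(minor_alleles_in_cases, n_cases, pmaf):
    """Two-sided z-test of the case minor-allele frequency against a known pMAF.

    Illustrative assumption: alleles are independent draws, so the minor-allele
    count in cases is binomial with 2 * n_cases trials under the null.
    """
    n_alleles = 2 * n_cases  # each diploid case contributes two alleles
    observed = minor_alleles_in_cases / n_alleles
    # Standard error under the null uses the known pMAF, not the observed value.
    se = math.sqrt(pmaf * (1.0 - pmaf) / n_alleles)
    z = (observed - pmaf) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return observed, z, p_value


# Hypothetical counts: 500 cases carrying 260 minor alleles, known pMAF of 0.20.
observed, z, p_value = case_only_z_test(260, 500, 0.20)
print(f"observed MAF in cases = {observed:.3f}, z = {z:.2f}, p = {p_value:.2e}")
```

Because the null frequency is taken as known, no controls enter the calculation; every genotyped sample is a case, which is the budget argument the abstract makes for rare diseases.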

Original publication




Journal article


Genet Epidemiol

Publication Date





371 - 378


Algorithms; Alleles; Bayes Theorem; Case-Control Studies; Gene Frequency; Genetic Predisposition to Disease; Genome-Wide Association Study; Genotype; Homozygote; Humans; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Regression Analysis; Research Design