Abstract

Despite the “completion” of the human reference genome in 2004, substantial proportions of the reference assembly remain undefined at the sequence level and inaccessible to interrogation. Not much is known about the extent and potential functional impact of genetic variation in these regions; this includes, surprisingly, the genes encoding the ribosomal RNA, which account for ~50% of all RNA synthesis. By combining transformation-associated cloning with long-read sequencing and de novo assembly, we show that human ribosomal DNA genes vary in sequence: of 101 variants detected in ~0.82 Mb of the nucleolar organizer region of chromosome 21, at least 47 are expressed. Structural modeling suggests that some of the observed variants might modulate local configurations, opening up the possibility of variations in ribosome dynamics. Finally, we discuss algorithmic and experimental strategies that might enable complex-region assembly and reconstruction of truly complete human genomes without the aid of targeted cloning.