Background

Pathogen whole-genome sequencing has great potential as a tool for understanding infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging.

Methods

We describe a refinement to core-genome multi-locus sequence typing (cgMLST) in which the allele at each gene is reproducibly converted to a unique hash, i.e. a short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralised database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared with mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals.

Results

Hash-cgMLST provided the same results as standard cgMLST with minimal performance penalty. Among 272 pairs of replicate sequences, reference-based mapping found 0, 1 or 2 SNPs between 262 (96%), 5 (2%) and 1 (<1%) pairs respectively. Using hash-cgMLST or standard cgMLST, 197 (72%) replicate pairs had zero gene differences, while 37 (14%), 8 (3%) and 30 (11%) pairs had 1, 2 and >2 differences respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies. Of 413 pairs of infections within ≤2 SNPs of each other, i.e. consistent with recent transmission, 266 (64%) had ≤2 gene differences and 50 (12%) had ≥5 differences. Comparing a genome to 100,000 others took <1 minute using hash-cgMLST.

Conclusion

Hash-cgMLST is an effective surveillance tool that can rapidly identify clusters of related genomes. However, cgMLST/hash-cgMLST potentially generates more false variants than mapping-based analysis. Refined mapping-based variant calling is likely required to precisely define close genetic relationships.
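The hashing step described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of SHA-256, the 12-character truncation, the function names `allele_hash`, `hash_profile` and `gene_differences`, and the decision to ignore genes that are missing from either profile are all assumptions made for illustration. It shows only the general idea that the same allele sequence always yields the same hash, so two laboratories can compare profiles without sharing a numbered allele database.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical type: gene name -> allele hash (None if the gene was not assembled).
Profile = Dict[str, Optional[str]]


def allele_hash(sequence: str, length: int = 12) -> str:
    """Deterministically map an allele's nucleotide sequence to a short hex string.

    Assumption: SHA-256 truncated to `length` characters; the abstract does not
    say which hash function or length is used.
    """
    normalised = sequence.strip().upper()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()[:length]


def hash_profile(alleles: Dict[str, Optional[str]]) -> Profile:
    """Convert a mapping of gene -> allele sequence into gene -> allele hash."""
    return {
        gene: (allele_hash(seq) if seq else None)
        for gene, seq in alleles.items()
    }


def gene_differences(a: Profile, b: Profile) -> int:
    """Count genes whose hashes differ between two profiles.

    Assumption: genes missing from either profile are skipped, not counted.
    """
    shared = a.keys() & b.keys()
    return sum(
        1
        for gene in shared
        if a[gene] is not None and b[gene] is not None and a[gene] != b[gene]
    )


# Toy data: gene2 differs by one base, gene3 is missing from the first isolate.
isolate_a = hash_profile({"gene1": "ATGAAA", "gene2": "ATGCCC", "gene3": None})
isolate_b = hash_profile({"gene1": "atgaaa", "gene2": "ATGCCT", "gene3": "ATGGGG"})
differences = gene_differences(isolate_a, isolate_b)
```

With these toy profiles, `differences` is 1: `gene1` matches after case normalisation, `gene2` differs, and `gene3` is skipped because it is absent from one isolate. Because each comparison is a lookup of short strings, scanning one profile against many stored profiles is a simple loop over `gene_differences`.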

Original publication

DOI

10.1101/686212

Type

Journal article

Publication Date

28/06/2019