Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia.
Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom.
PLoS Comput Biol. 2022 Jan 24;18(1):e1009802. doi: 10.1371/journal.pcbi.1009802. eCollection 2022 Jan.
Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher that uses all-per-read alignments (every possible alignment location for each read, rather than a single placement) to repair errors in repeat sequences that other polishers cannot fix. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.
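To illustrate why all-per-read alignments matter in repeats, the toy sketch below compares per-base read depth when each read is placed at only one location versus at every matching location. It is not Polypolish's algorithm: the function names (`find_all`, `depth`), the sequences, and the use of exact substring matching in place of a real aligner are all assumptions made for illustration, and the "one placement" mode arbitrarily keeps the first hit.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every start position where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def depth(reference: str, reads: list[str], all_per_read: bool) -> list[int]:
    """Per-base depth, using all hits per read or only the first hit."""
    cov = [0] * len(reference)
    for read in reads:
        hits = find_all(reference, read)
        if not all_per_read:
            hits = hits[:1]  # assumption: a single arbitrary placement per read
        for pos in hits:
            for i in range(pos, pos + len(read)):
                cov[i] += 1
    return cov


# Hypothetical draft assembly containing two identical copies of a repeat.
repeat = "ACGTTGCA"
reference = "GG" + repeat + "TTCC" + repeat + "AA"
reads = ["ACGTT", "GTTGC", "TGCA"]  # short reads that fall inside the repeat

one_hit = depth(reference, reads, all_per_read=False)
all_hits = depth(reference, reads, all_per_read=True)

# The second repeat copy occupies positions 14..21 of the reference.
print(one_hit[14:22])   # [0, 0, 0, 0, 0, 0, 0, 0]
print(all_hits[14:22])  # [1, 1, 2, 2, 3, 2, 2, 1]
```

With a single placement per read, the second repeat copy receives no read support, so an error there could not be detected or corrected; with all placements, both copies are covered.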