Teixeira Marco, Souque Celia, Worby Colin J, Shea Terrance, Commins Nicoletta, Smith Joshua T, Miklos Arjun M, Abeel Thomas, Earl Ashlee M, Manson Abigail L
Infectious Disease and Microbiome Program, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, 02142, USA.
Delft Bioinformatics Lab, Department of Intelligent Systems, Delft University of Technology, Delft, 2628XE, The Netherlands.
bioRxiv. 2025 Aug 5:2025.07.28.667252. doi: 10.1101/2025.07.28.667252.
The ability to detect and reconstruct plasmids from genome assemblies is crucial for studying the evolution and spread of antimicrobial resistance and virulence in bacteria. Though long-read sequencing technologies have made reconstructing plasmids easier, most (97%) of the bacterial genome assemblies in the public domain are generated from short-read data. Work to compare plasmid reconstruction tools has focused primarily on , leaving gaps in our understanding of how well these tools perform on other, less well-characterized taxa. Using high-quality assemblies as ground truth, we benchmarked 12 plasmid detection tools (which identify plasmid contigs in assemblies) and four plasmid reconstruction tools (which group contigs from the same plasmid together). We tested their ability to characterize diverse plasmids from short-read assemblies representing a wide range of Enterobacterales and Enterococcus species, including newly discovered and poorly characterized species collected from non-human hosts. Plasmer, PlasmidEC, PlaScope, and gplas2 were the highest-scoring plasmid detection tools, performing well for both Enterobacterales and enterococci. The two major determinants of accurate plasmid detection were representation in plasmid databases (Enterobacterales plasmids were more easily detected than those from enterococci) and assembly contiguity, which was also key for successful plasmid reconstruction. Gplas2 performed best for plasmid reconstruction; however, fewer than half of the plasmids were perfectly reconstructed, suggesting that substantial room for improvement remains in this class of tools.
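As an illustration of how a plasmid detection tool could be scored against ground truth, the sketch below computes per-contig precision, recall, and F1 for the "plasmid" label. The abstract does not state which metric the authors used, so the metric choice, the function name `detection_scores`, the label strings, and the toy contig names are all assumptions made for this example.

```python
from typing import Dict


def detection_scores(truth: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Precision, recall and F1 for the 'plasmid' label over the contigs in `truth`.

    Hypothetical helper: labels are assumed to be 'plasmid' or 'chromosome'.
    """
    tp = fp = fn = 0
    for contig, true_label in truth.items():
        # Assumption: a contig the tool did not label is treated as chromosomal.
        pred_label = predicted.get(contig, "chromosome")
        if pred_label == "plasmid" and true_label == "plasmid":
            tp += 1
        elif pred_label == "plasmid":
            fp += 1
        elif true_label == "plasmid":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (invented): ground-truth labels from a high-quality assembly
# versus the labels a detection tool assigned to short-read contigs.
truth = {"c1": "plasmid", "c2": "plasmid", "c3": "chromosome", "c4": "chromosome"}
predicted = {"c1": "plasmid", "c2": "chromosome", "c3": "plasmid", "c4": "chromosome"}
scores = detection_scores(truth, predicted)
```

Counting contigs equally is the simplest choice; weighting each contig by its length would be a reasonable alternative when assemblies are highly fragmented.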