Zielezinski Andrzej, Gudyś Adam, Barylski Jakub, Siminski Krzysztof, Rozwalak Piotr, Dutilh Bas E, Deorowicz Sebastian
Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University, Poznan, Poland.
Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland.
Nat Methods. 2025 May 15. doi: 10.1038/s41592-025-02701-7.
Viromics produces millions of viral genomes and fragments annually, overwhelming traditional sequence comparison methods. Here we introduce Vclust, an approach that determines average nucleotide identity by Lempel-Ziv parsing and clusters viral genomes with thresholds endorsed by authoritative viral genomics and taxonomy consortia. Vclust demonstrates superior accuracy and efficiency compared to existing tools, clustering millions of genomes in a few hours on a mid-range workstation.
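As a rough illustration of the two ideas named in the abstract (an identity estimate derived from Lempel-Ziv-style parsing, followed by threshold-based clustering), the following is a minimal toy sketch and not Vclust's actual algorithm. The function names `lz_coverage` and `cluster`, the `min_len` parameter, the use of single-linkage clustering, and the 0.9 threshold in the usage note are all assumptions made for illustration. The abstract does not state the thresholds or the clustering procedure. The score computed here is only a crude proxy: the fraction of query bases covered by long substrings copied from the reference.

```python
from itertools import combinations


def lz_coverage(query: str, reference: str, min_len: int = 4) -> float:
    """Greedy LZ-style parse of `query` against `reference` (toy version).

    At each position, take the longest substring of `query` that also occurs
    in `reference`. Phrases of at least `min_len` bases count as covered;
    shorter ones are treated as a single-base literal and skipped.
    Returns covered bases / len(query), a crude similarity proxy (assumption),
    not a true average nucleotide identity.
    """
    n = len(query)
    i = covered = 0
    while i < n:
        length = 0
        # Grow the phrase while it still occurs somewhere in the reference.
        while i + length < n and query[i:i + length + 1] in reference:
            length += 1
        if length >= min_len:
            covered += length
            i += length
        else:
            i += 1
    return covered / n if n else 0.0


def cluster(genomes: dict[str, str], threshold: float,
            min_len: int = 4) -> list[set[str]]:
    """Single-linkage clustering via union-find (hypothetical choice).

    Two genomes are linked when the smaller of their two directional
    coverage scores reaches `threshold`.
    """
    parent = {name: name for name in genomes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        score = min(lz_coverage(genomes[a], genomes[b], min_len),
                    lz_coverage(genomes[b], genomes[a], min_len))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For example, calling `cluster({"a": "ACGTTGCAAGCTTAGGCTA", "b": "ACGTTGCAAGGTTAGGCTA", "c": "TTTTTTTTTTCCCCCCCCC"}, threshold=0.9)` groups the two near-identical toy sequences together and leaves the unrelated one on its own. The naive substring search is quadratic or worse, so this sketch is suitable only for tiny inputs.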