Milani Christian, Lugli Gabriele Andrea, Fontana Federico, Mancabelli Leonardo, Alessandri Giulia, Longhi Giulia, Anzalone Rosaria, Viappiani Alice, Turroni Francesca, van Sinderen Douwe, Ventura Marco
Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy.
Microbiome Research Hub, University of Parma, Parma, Italy.
mSystems. 2021 Jun 29;6(3):e0058321. doi: 10.1128/mSystems.00583-21.
The use of bioinformatic tools for read-based taxonomic and functional analyses of metagenomic data sets, including their assembly and management, is rather fragmentary due to the absence of an accepted gold standard. Moreover, most currently available software tools need input of millions of reads and rely on approximations in data analysis in order to reduce computing times. These issues lead to suboptimal accuracy, sensitivity, and specificity, whether the tools are used to reconstruct taxonomic or functional profiles through read analysis or to analyze genomes reconstructed by metagenomic assembly. Furthermore, novel DNA sequencing technologies that generate long reads, such as Nanopore and PacBio, represent a valuable data resource that still suffers from a lack of dedicated tools to perform integrated hybrid analysis alongside short-read data. In order to overcome these limitations, here we describe a comprehensive bioinformatic platform, METAnnotatorX2, aimed at providing an optimized, user-friendly resource that maximizes output quality while also allowing user-specific adaptation of the pipeline and straightforward integrated analysis of both short- and long-read data. To further improve the quality and accuracy of taxonomic assignment of reads and contigs, custom preprocessed and taxonomically revised genomic databases for viruses, prokaryotes, and various eukaryotes were developed. The performance of METAnnotatorX2 was tested by analysis of artificial data sets encompassing viral, archaeal, bacterial, and eukaryotic (fungal) sequence reads that simulate different biological matrices. Moreover, real biological samples were employed to validate results.
We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets. Through extensive tests performed on Illumina and Nanopore artificial data sets, we demonstrated the high performance of the software to not only extract taxonomic and functional information from sequence reads but also to assemble and process genomes from metagenomic data. The robustness of these functionalities was validated using "real-life" data sets obtained from Illumina and Nanopore sequencing of biological samples. Furthermore, the performance of METAnnotatorX2 was compared to other available software tools for analysis of shotgun metagenomics data.
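As a minimal illustration of how sensitivity and specificity can be computed when a taxonomic profiler is benchmarked against an artificial data set of known composition, the sketch below compares the set of taxa a tool reports with the set of taxa actually present. It uses the standard presence/absence definitions and is not the evaluation procedure of METAnnotatorX2; the function name, the placeholder taxon labels, and the notion of a fixed "universe" of candidate taxa (for example, all taxa in the reference database) are assumptions made for the example.

```python
def presence_absence_metrics(expected, predicted, universe):
    """Sensitivity and specificity of a taxon list against a known composition.

    expected  -- taxa truly present in the artificial data set
    predicted -- taxa reported by the profiler
    universe  -- all taxa the profiler could have reported (assumed here to
                 be the taxa in the reference database)
    """
    expected = set(expected)
    predicted = set(predicted)
    # Make sure every expected or predicted taxon is part of the universe.
    universe = set(universe) | expected | predicted

    tp = len(expected & predicted)             # present and reported
    fn = len(expected - predicted)             # present but missed
    fp = len(predicted - expected)             # reported but absent
    tn = len(universe - expected - predicted)  # absent and not reported

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


if __name__ == "__main__":
    # Hypothetical example: 10 candidate taxa, 4 truly present.
    universe = [f"taxon_{c}" for c in "ABCDEFGHIJ"]
    expected = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
    predicted = ["taxon_A", "taxon_B", "taxon_C", "taxon_E"]
    print(presence_absence_metrics(expected, predicted, universe))
```

In this hypothetical case the tool recovers three of the four taxa present and reports one absent taxon, so sensitivity is 3/4 and specificity is 5/6. Note that with a large reference database the number of true negatives dominates and specificity approaches 1, which is why benchmarks often report precision alongside it.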