Kumar Animesh, Robertsen Espen M, Willassen Nils P, Fu Juan, Hjerde Erik
Center for Bioinformatics, Department of Chemistry, UiT The Arctic University of Norway, Tromsø, 9037, Norway.
Faculty of Biosciences, Department of Livestock and Aquaculture Science, Norwegian University of Life Sciences, Ås 1433, Norway.
Genomics Inform. 2023 Dec;21(4):e49. doi: 10.5808/gi.23072. Epub 2023 Dec 29.
Recent advances in sequencing technologies have made it possible to generate metagenomic sequences on a range of sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected along the Norwegian coast. The metagenomic DNA sequences from both platforms were normalized to an equal number of bases and then evaluated with different taxonomic classifiers, reference databases, and assemblers. After preprocessing, normalized BGISEQ-500 sequences retained more reads and bases, whereas a slightly higher fraction of HiSeq3000 sequences was taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. In the majority of HiSeq3000 samples, assembly with MEGAHIT produced longer assemblies and a higher total contig count than assembly with metaSPAdes, but the assembly statistics improved notably with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides valuable insights to help researchers select an appropriate sequencing platform and bioinformatics pipeline for their metagenomics studies.
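The abstract does not say which tool performed the base-count normalization or how the classification rates were tallied. The Python sketch below illustrates one plausible approach under stated assumptions: reads are held in memory as hypothetical (id, sequence) pairs, both read sets are randomly downsampled without replacement to the smaller of the two total base counts, and the classified fraction is computed from the C/U flag in the first tab-separated column of Kraken2- or Kaiju-style per-read output. All function names are illustrative, not the authors' code.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical read representation: (read_id, sequence); quality strings omitted.
Read = Tuple[str, str]


def total_bases(reads: Iterable[Read]) -> int:
    """Sum of sequence lengths across all reads."""
    return sum(len(seq) for _, seq in reads)


def subsample_to_bases(reads: List[Read], target_bases: int, seed: int = 0) -> List[Read]:
    """Draw whole reads at random, without replacement, until target_bases is reached.

    The result can exceed the target by less than one read length.
    """
    rng = random.Random(seed)
    shuffled = reads[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    picked: List[Read] = []
    collected = 0
    for read in shuffled:
        if collected >= target_bases:
            break
        picked.append(read)
        collected += len(read[1])
    return picked


def normalize_pair(a: List[Read], b: List[Read], seed: int = 0) -> Tuple[List[Read], List[Read]]:
    """Downsample two read sets (e.g. one per platform) to the same base budget."""
    target = min(total_bases(a), total_bases(b))
    return subsample_to_bases(a, target, seed), subsample_to_bases(b, target, seed)


def classified_fraction(lines: Iterable[str]) -> float:
    """Fraction of reads flagged 'C' (classified) versus 'U' in per-read output lines."""
    total = classified = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        if line.split("\t", 1)[0] == "C":
            classified += 1
    return classified / total if total else 0.0
```

Because whole reads are drawn, the larger set can overshoot the shared base budget by less than one read length; a real pipeline would more likely stream FASTQ files through a dedicated subsampling tool rather than hold reads in memory.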