Gong Wentao, Pan Xiangchun, Xu Dantong, Ji Guanyu, Wang Yifei, Tian Yuhan, Cai Jiali, Li Jiaqi, Zhang Zhe, Yuan Xiaolong
Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
Shenzhen Gendo Health Technology Co., Ltd., Shenzhen 518122, China.
Comput Struct Biotechnol J. 2022 Aug 27;20:4704-4716. doi: 10.1016/j.csbj.2022.08.051. eCollection 2022.
Whole genome bisulfite sequencing (WGBS) is an essential technique for methylome studies. Although a series of tools have been developed to overcome the mapping challenges caused by bisulfite treatment, the latest available tools have not been evaluated for read-mapping performance or for their effect on biological insights across multiple mammals. Here, using real and simulated WGBS data totaling 14.77 billion reads, we ran 936 mappings to benchmark 14 widely used alignment algorithms, from read mapping through to biological interpretation, in humans, cattle and pigs: Bwa-meth, BSBolt, BSMAP, Walt, Abismal, Batmeth2, Hisat_3n, Hisat_3n_repeat, Bismark-bwt2-e2e, Bismark-his2, BSSeeker2-bwt, BSSeeker2-soap2, BSSeeker2-bwt2-e2e and BSSeeker2-bwt2-local. Specifically, Bwa-meth, BSBolt, BSMAP, Bismark-bwt2-e2e and Walt yielded more uniquely mapped reads and higher mapping precision, recall and F1 scores than the other nine alignment algorithms. The influence of the alignment algorithm on the methylome varied considerably in the number and methylation levels of CpG sites and in the calling of differentially methylated CpGs (DMCs) and regions (DMRs). Moreover, BSMAP showed the highest accuracy in detecting CpG coordinates and methylation levels and in calling DMCs, DMRs, DMR-related genes and signaling pathways. These results suggest that algorithms for profiling genome-wide DNA methylation must be selected carefully, and our work provides investigators with useful information on choosing alignment algorithms to improve the accuracy of DNA methylation detection in mammals.
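The abstract compares aligners by mapping precision, recall and F1 score but does not define them here. As a minimal sketch, the Python below assumes the definitions commonly used with simulated reads: precision is the fraction of mapped reads placed at their true origin, recall is the fraction of all simulated reads placed at their true origin, and F1 is their harmonic mean. The function name `mapping_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def mapping_metrics(total_reads, mapped_reads, correctly_mapped):
    """Return (precision, recall, f1) for one aligner run on simulated reads.

    Assumed definitions (not confirmed by the abstract):
      precision = correctly mapped reads / mapped reads
      recall    = correctly mapped reads / all simulated reads
      f1        = harmonic mean of precision and recall
    """
    if not 0 <= correctly_mapped <= mapped_reads <= total_reads:
        raise ValueError(
            "expected 0 <= correctly_mapped <= mapped_reads <= total_reads"
        )
    # Guard against division by zero when nothing was mapped or simulated.
    precision = correctly_mapped / mapped_reads if mapped_reads else 0.0
    recall = correctly_mapped / total_reads if total_reads else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = mapping_metrics(
        total_reads=1000, mapped_reads=900, correctly_mapped=810
    )
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under these assumed definitions, an aligner that maps few reads but places them accurately scores high on precision and low on recall, which is why F1 is a useful single summary when ranking tools.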