Assessing the impact of exact reads on reducing the error rate of read mapping.

Author information

Mathematics and Computer Science Department, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran.

School of Biological Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box: 19395-5746, Tehran, Iran.

Publication information

BMC Bioinformatics. 2018 Nov 6;19(1):406. doi: 10.1186/s12859-018-2432-7.

Abstract

BACKGROUND

Given the growing availability of high-quality genome sequences, there is a strong need for accurate and efficient reference-based assembly methods. Many algorithms have been designed to map reads onto a genome sequence with the aim of improving the accuracy of the reconstructed genome. One challenge in this problem arises when reads align to multiple locations because of repetitive regions in the genome.

RESULTS

In this paper, our goal is to decrease the error rate of reconstructed genomes by resolving multi-mapping reads. To this end, we reduce the search space for reads that can be aligned against the genome with mismatches, insertions, or deletions, which lowers the probability of incorrect read mapping. We propose a pipeline divided into three steps, ExactMapping, InExactMapping, and MergingContigs, in which exact and inexact reads are aligned in two separate phases. We test the pipeline on simulated and real data sets using several read mappers. The results show that two-step mapping of reads onto the contigs generated by a mapper such as Bowtie2, BWA, or Yara is effective in reducing the error rate of the contigs.
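The two-phase idea can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the Hamming-distance matching (mismatches only, no insertions or deletions), and the rule used to shrink the search space (skipping windows already fully covered by uniquely placed exact reads) are all assumptions made for illustration.

```python
def find_exact(reference, read):
    """Return every start position where `read` occurs verbatim in `reference`."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def hamming_hits(reference, read, max_mismatches, candidates):
    """Return the candidate starts where `read` matches with few mismatches."""
    hits = []
    for start in candidates:
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # window runs off the end of the reference
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def two_phase_map(reference, reads, max_mismatches=1):
    """Toy two-phase mapping: exact reads first, then inexact reads.

    Returns a dict from read index to its start position. Reads that
    remain ambiguous or unmapped are left out of the result.
    """
    placed = {}
    covered = [False] * len(reference)
    leftovers = []

    # Phase 1: place reads that have exactly one verbatim occurrence.
    for index, read in enumerate(reads):
        hits = find_exact(reference, read)
        if len(hits) == 1:
            placed[index] = hits[0]
            for i in range(hits[0], hits[0] + len(read)):
                covered[i] = True
        else:
            leftovers.append(index)

    # Phase 2: align the remaining reads with mismatches, but only against
    # windows not already fully covered by phase 1 (assumed reduction rule).
    for index in leftovers:
        read = reads[index]
        candidates = [
            start
            for start in range(len(reference) - len(read) + 1)
            if not all(covered[start:start + len(read)])
        ]
        hits = hamming_hits(reference, read, max_mismatches, candidates)
        if len(hits) == 1:
            placed[index] = hits[0]

    return placed


reference = "GATTACAGGGGATTTCA"
reads = ["GATTACAG", "GATTGCA"]
print(two_phase_map(reference, reads))
```

In this example the second read matches two locations with one mismatch when the whole reference is searched, so it would be a multi-mapping read. After the first read is placed exactly, one of those locations drops out of the candidate set and the second read maps to a single position.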

CONCLUSIONS

The assessment of our pipeline suggests that reducing the error rate of read mapping can not only improve genomes reconstructed by reference-based assembly within a reasonable running time, but also help improve genomes generated by de novo assembly. By decreasing the number of multi-mapping reads, our pipeline produces genomes comparable to those of MMR, a tool for resolving multi-mapping reads. Consequently, we introduce EIM as a post-processing step for genomes reconstructed by mappers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c846/6220446/c27b0c488bcb/12859_2018_2432_Fig1_HTML.jpg