proovread: large-scale high-accuracy PacBio correction through iterative short read consensus.

Affiliations

Department for Molecular Plant Physiology and Biophysics, University of Würzburg, Julius-von-Sachs-Platz 2, 97082 Würzburg, Germany and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany.

Publication information

Bioinformatics. 2014 Nov 1;30(21):3004-11. doi: 10.1093/bioinformatics/btu392. Epub 2014 Jul 10.

Abstract

MOTIVATION

Today, the base sequence of DNA is mostly determined through sequencing by synthesis, as provided by Illumina sequencers. Although highly accurate, the resulting reads are short, which makes their analysis challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. However, its broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations place great demands on hardware, work only in well-defined computing infrastructures and reject a substantial number of reads. This limits their usability considerably, especially for large sequencing projects.

RESULTS

Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted to existing hardware and infrastructure, from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies of up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing.

AVAILABILITY AND IMPLEMENTATION

proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de.
