fastp: an ultra-fast all-in-one FASTQ preprocessor.

Affiliations

Department of Bioinformatics, HaploX Biotechnology, Shenzhen, China.

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Publication information

Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560.

Abstract

MOTIVATION

Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient.

RESULTS

We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools.
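
The single-scan idea can be illustrated with a minimal Python sketch. It is not fastp's implementation (fastp is written in C++ and does far more); it only shows how quality-control statistics and read filtering can share one pass over the data instead of re-reading the file for each step. The function names, the mean-quality threshold of 20, the Q30 counter and the Phred+33 offset are assumptions chosen for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line) stops the scan
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def single_pass(handle: TextIO, min_mean_q: float = 20.0, offset: int = 33):
    """Collect QC statistics and filter reads in one scan of the data.

    Hypothetical helper: the threshold and the Phred+33 offset are
    illustrative assumptions, not fastp defaults.
    """
    stats = {"reads_in": 0, "reads_out": 0, "bases_in": 0, "q30_bases": 0}
    kept = []
    for name, seq, qual in parse_fastq(handle):
        scores = [ord(c) - offset for c in qual]
        # QC statistics are updated for every read, before filtering.
        stats["reads_in"] += 1
        stats["bases_in"] += len(seq)
        stats["q30_bases"] += sum(1 for s in scores if s >= 30)
        # Quality filtering happens in the same loop iteration.
        if scores and sum(scores) / len(scores) >= min_mean_q:
            kept.append((name, seq, qual))
            stats["reads_out"] += 1
    return stats, kept


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
stats, kept = single_pass(io.StringIO(EXAMPLE))
```

In this toy input the second read is dropped by the filter, yet its bases still count toward the pre-filtering statistics, which is the property that makes a second read of the file unnecessary.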

AVAILABILITY AND IMPLEMENTATION

The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b92/6129281/3536fba4922e/bty560f1.jpg
