Portable BLAST-like algorithm library and its implementations for command line, Python, and R.

Affiliations

hakuna AG, Zürich, Switzerland.

Department of Biology, University of Turku, Turku, Finland.

Publication information

PLoS One. 2023 Nov 30;18(11):e0289693. doi: 10.1371/journal.pone.0289693. eCollection 2023.

Abstract

The Basic Local Alignment Search Tool (BLAST) is a versatile and commonly used sequence analysis tool in bioinformatics. BLAST permits fast and flexible sequence similarity searches across nucleotide and amino acid sequences, leading to diverse applications such as protein domain identification, orthology searches, and phylogenetic annotation. Most BLAST implementations are command line tools that write their output as comma-separated values files. However, a portable, modular, and embeddable implementation of a BLAST-like algorithm is still missing from our toolbox. Here we present nsearch, a command line tool and C++11 library that provides BLAST-like functionality and can easily be embedded in any application. As an example of this portability, we present Blaster, which leverages nsearch to provide native BLAST-like functionality for the R programming language, and npysearch, which provides similar functionality for Python. These packages permit embedding BLAST-like functionality into larger frameworks such as Shiny or Django applications. Benchmarks show that nsearch, npysearch, and Blaster are comparable in speed and accuracy to other commonly used modern BLAST implementations such as VSEARCH and BLAST+. We envision similar implementations of nsearch for other languages commonly used in data science, such as Julia, to facilitate sequence similarity comparisons. nsearch, Blaster, and npysearch are free to use under the BSD 3.0 license and are available on GitHub, Conda, CRAN (Blaster), and PyPI (npysearch).
