Ultrafast and ultralarge multiple sequence alignments using TWILIGHT.

Author information

Tseng Yu-Hsiang, Walia Sumit, Turakhia Yatish

Affiliation

Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, United States.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i332-i341. doi: 10.1093/bioinformatics/btaf212.

Abstract

MOTIVATION

Multiple sequence alignment (MSA) is a fundamental operation in bioinformatics, yet existing MSA tools are struggling to keep up with the speed and volume of incoming data. This is because the runtimes and memory requirements of current MSA tools become untenable when processing large numbers of long input sequences, and they also fail to fully harness the parallelism provided by modern CPUs and GPUs.

RESULTS

We present Tall and Wide Alignments at High Throughput (TWILIGHT), a novel MSA tool optimized for speed, accuracy, scalability, and memory constraints, with both CPU and GPU support. TWILIGHT incorporates innovative parallelization and memory-efficiency strategies that enable it to build ultralarge alignments at high speed even on memory-constrained devices. On challenging datasets, TWILIGHT outperformed all other tools in speed and accuracy. It scaled beyond the limits of existing tools and performed an alignment of 1 million RNASim sequences within 30 min while utilizing <16 GB of memory. TWILIGHT is the first tool to align over 8 million publicly available SARS-CoV-2 sequences, setting a new standard for large-scale genomic alignment and data analysis.

AVAILABILITY AND IMPLEMENTATION

TWILIGHT's code is freely available under the MIT license at https://github.com/TurakhiaLab/TWILIGHT. The test datasets and experimental results, including our alignment of 8 million SARS-CoV-2 sequences, are available at https://zenodo.org/records/14722035.
