Rapid and accurate identification of ribosomal RNA sequences via deep learning.

Affiliations

Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany.

Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.

Publication information

Nucleic Acids Res. 2022 Jun 10;50(10):e60. doi: 10.1093/nar/gkac112.

Abstract

Advances in transcriptomic and translatomic techniques enable in-depth studies of RNA activity profiles and RNA-based regulatory mechanisms. Ribosomal RNA (rRNA) sequences are highly abundant among cellular RNA, but if the target sequences do not include polyadenylation, these cannot be easily removed in library preparation, requiring their post-hoc removal with computational techniques to accelerate and improve downstream analyses. Here, we describe RiboDetector, a novel software based on a Bi-directional Long Short-Term Memory (BiLSTM) neural network, which rapidly and accurately identifies rRNA reads from transcriptomic, metagenomic, metatranscriptomic, noncoding RNA, and ribosome profiling sequence data. Compared with state-of-the-art approaches, RiboDetector produced at least six times fewer misclassifications on the benchmark datasets. Importantly, the few false positives of RiboDetector were not enriched in certain Gene Ontology (GO) terms, suggesting a low bias for downstream functional profiling. RiboDetector also demonstrated a remarkable generalizability for detecting novel rRNA sequences that are divergent from the training data with sequence identities of <90%. On a personal computer, RiboDetector processed 40M reads in less than 6 min, which was ∼50 times faster in GPU mode and ∼15 times in CPU mode than other methods. RiboDetector is available under a GPL v3.0 license at https://github.com/hzi-bifo/RiboDetector.
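
To put the reported runtime in perspective, the figures quoted above (40M reads in under 6 minutes) imply a lower bound on throughput. The short Python sketch below only restates that arithmetic; the function name `min_throughput` is hypothetical and is not part of RiboDetector.

```python
# Figures quoted in the abstract: 40M reads processed in less than 6 minutes.
READS = 40_000_000
MAX_SECONDS = 6 * 60  # upper bound on the runtime, in seconds


def min_throughput(reads: int, max_seconds: float) -> float:
    """Lower bound on reads per second, given an upper bound on runtime."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return reads / max_seconds


rate = min_throughput(READS, MAX_SECONDS)
print(f"at least {rate:,.0f} reads/s")  # at least 111,111 reads/s
```

Because the abstract gives only an upper bound on the runtime, the result is a minimum rate, not a measured one.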

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c033/9177968/c868f6857a54/gkac112fig1.jpg
