Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing.

Authors

Chang Jessie J-Y, Yang Xuan, Teng Haotian, Zhang Jianshu, Reames Benjamin, Zhang Shuxin, Corbin Vincent, Coin Lachlan J M

Affiliations

Department of Microbiology and Immunology, University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, 3000, Australia.

Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States.

Publication information

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf098.

Abstract

Polyadenylation is a dynamic process that is important in cellular physiology, with implications for messenger RNA decay rates, translation efficiency, and isoform-specific regulation. Oxford Nanopore Technologies direct RNA sequencing provides a strategy for sequencing the full-length RNA molecule and analyzing the transcriptome. Several tools are currently available for poly(A) tail length estimation, including well-established methods like tailfindr and nanopolish, as well as more recent deep learning models like Dorado. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this article, we present our novel deep learning poly(A) estimation tool, BoostNano, and compare it with 3 existing tools: tailfindr, nanopolish, and Dorado. We evaluate the 4 poly(A) estimation tools using 2 sets of synthetic in vitro transcribed RNA standards with known poly(A) tail lengths: Sequin (30 or 60 nucleotides) and enhanced green fluorescent protein (10-150 nucleotides) RNA. Analyzing datasets with known ground-truth values is a valuable approach to measuring the accuracy of poly(A) length estimation. The tools demonstrated length- and sample-dependent performance, and accuracy was enhanced by averaging over multiple reads via estimation of the peak of the density distribution. Overall, Dorado is recommended as the preferred approach due to its relatively fast runtimes, low mean error, and ease of use through integration with base-calling. These results provide a reference for poly(A) tail length estimation analysis, aiding in improving our understanding of the transcriptome and the relationship between poly(A) tail length and other transcriptional mechanisms, including transcript stability or quantification.
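
The abstract notes that accuracy improved when per-read estimates were summarized by the peak of their density distribution. It does not say which density estimator was used, so the following is only a minimal sketch of the general idea, assuming a Gaussian kernel density estimate with a normal-reference bandwidth. The function name `density_peak`, its parameters, and the simulated reads are hypothetical illustrations and are not taken from the paper or from any of the 4 tools.

```python
import math
import random
import statistics


def density_peak(lengths, bandwidth=None, grid_step=0.5):
    """Return the mode of a Gaussian kernel density estimate of per-read tail lengths.

    Illustrative helper only: the estimator and bandwidth rule are assumptions.
    """
    n = len(lengths)
    if n == 0:
        raise ValueError("need at least one per-read estimate")
    if n == 1:
        return float(lengths[0])
    if bandwidth is None:
        # Normal-reference rule of thumb for the kernel bandwidth (assumed choice).
        sd = statistics.stdev(lengths)
        bandwidth = 1.06 * sd * n ** (-1 / 5) if sd > 0 else 1.0
    lo, hi = min(lengths), max(lengths)
    steps = int((hi - lo) / grid_step) + 1
    best_x, best_density = lo, -1.0
    # Evaluate the unnormalized density on a regular grid and keep the highest point.
    for i in range(steps):
        x = lo + i * grid_step
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in lengths)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Simulated per-read estimates for a hypothetical standard with a 60-nt tail:
# most reads scatter around 60 nt, while a minority are badly overestimated.
rng = random.Random(0)
reads = [rng.gauss(60, 5) for _ in range(300)]
reads += [rng.uniform(100, 250) for _ in range(60)]

peak = density_peak(reads)
mean = statistics.fmean(reads)
print(f"density peak: {peak:.1f} nt, arithmetic mean: {mean:.1f} nt")
```

In this simulated case the arithmetic mean is pulled upward by the overestimated reads, whereas the density peak stays close to the bulk of the reads. Real per-read error distributions will differ between tools and samples.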

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa5a/12406214/390627ba8df0/giaf098fig1.jpg
