Enhancing R-loop prediction with high-throughput sequencing data.

Author information

Vanhaeren Thomas, Cataneo Ludovica, Divina Federico, Martínez-García Pedro Manuel

Affiliations

Division of Computer Science, Universidad Pablo de Olavide, 41013 Seville, Spain.

Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, via Belmeloro 6, 40126 Bologna, Italy.

Publication information

NAR Genom Bioinform. 2025 Jun 11;7(2):lqaf077. doi: 10.1093/nargab/lqaf077. eCollection 2025 Jun.

Abstract

R-loops are three-stranded structures, formed by an RNA:DNA hybrid and the displaced single-stranded DNA, that occur frequently in the genome and play important roles in a variety of cellular processes from bacteria to mammals. Sequencing methods that profile R-loops genome-wide have revealed that they can form co-transcriptionally at cell type-specific genes and associate with specific chromatin states during cell differentiation and reprogramming. However, current computational methods for R-loop prediction rely solely on DNA sequence properties, so they cannot capture differences in R-loop formation across cell types, tissues or developmental stages. Here, we develop a machine learning approach that predicts mammalian cell type-specific R-loops from sequence information and high-throughput sequencing signals. Our predictive models, trained on human samples, achieve highly accurate predictions. The most informative datasets are transcriptomics, DNA features, chromatin accessibility and H3K36me3, an epigenomic mark of active gene bodies. We generate virtual R-loop maps that show high concordance with experimental ones and capture cell type specificity. Our approach compares favorably with sequence-based methods and can be generalized to mouse datasets. Based on this, we generate virtual R-loop maps for 51 mammalian systems that are freely accessible to the scientific community.
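The abstract states that predictions combine sequence information with high-throughput sequencing signals. The sketch below shows one way such inputs could be assembled into a single feature vector per genomic bin. It is a minimal illustration under stated assumptions, not the authors' pipeline:

- The genome is assumed to be split into fixed-size bins. The abstract does not give the binning scheme.
- GC content and GC skew are used only as example sequence features. GC skew has been linked to R-loop formation in earlier work, but the abstract does not list the exact DNA features used.
- The track names (`rna_seq`, `atac`, `h3k36me3`) and all function names are hypothetical.

```python
"""Hypothetical per-bin feature construction for R-loop prediction.

Assumptions (not from the source): the genome is split into fixed-size bins,
and each bin is described by sequence features plus the mean value of each
high-throughput sequencing track over that bin.
"""


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the bin (other symbols count toward length)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); positive values mean this strand is G-rich."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def mean(values: list[float]) -> float:
    """Average signal over the bin; an empty track contributes 0.0."""
    return sum(values) / len(values) if values else 0.0


def bin_features(seq: str, tracks: dict[str, list[float]]) -> list[float]:
    """Concatenate sequence features with the mean signal of each track.

    Track names are sorted so every bin yields features in the same order,
    which a downstream classifier requires.
    """
    features = [gc_content(seq), gc_skew(seq)]
    for name in sorted(tracks):
        features.append(mean(tracks[name]))
    return features


if __name__ == "__main__":
    # Hypothetical bin: a short G-rich sequence and per-position signals
    # from three illustrative tracks (values are made up for the example).
    example_tracks = {
        "rna_seq": [4.0, 6.0],
        "atac": [1.0, 3.0],
        "h3k36me3": [2.0, 2.0],
    }
    print(bin_features("GGGAGGCTGG", example_tracks))
    # → [0.8, 0.75, 2.0, 2.0, 5.0]
```

In a setup like this, vectors for many bins would be paired with labels taken from experimental R-loop maps and used to train a classifier. The abstract does not name the learning algorithm, so that step is left out rather than guessed.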

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e9c/12153340/36a828c7ef4e/lqaf077fig1.jpg