A multi-modal transformer for cell type-agnostic regulatory predictions.

Authors

Javed Nauman, Weingarten Thomas, Sehanobish Arijit, Roberts Adam, Dubey Avinava, Choromanski Krzysztof, Bernstein Bradley E

Affiliations

The Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

Google, Mountain View, CA 94043, USA.

Publication information

Cell Genom. 2025 Feb 12;5(2):100762. doi: 10.1016/j.xgen.2025.100762. Epub 2025 Jan 29.

Abstract

Sequence-based deep learning models have emerged as powerful tools for deciphering the cis-regulatory grammar of the human genome but cannot generalize to unobserved cellular contexts. Here, we present EpiBERT, a multi-modal transformer that learns generalizable representations of genomic sequence and cell type-specific chromatin accessibility through a masked accessibility-based pre-training objective. Following pre-training, EpiBERT can be fine-tuned for gene expression prediction, achieving accuracy comparable to the sequence-only Enformer model, while also being able to generalize to unobserved cell states. The learned representations are interpretable and useful for predicting chromatin accessibility quantitative trait loci (caQTLs), regulatory motifs, and enhancer-gene links. Our work represents a step toward improving the generalization of sequence-based deep neural networks in regulatory genomics.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfd/11872434/6ca55eba5acc/fx1.jpg
