Error modelled gene expression analysis (EMOGEA) provides a superior overview of time course RNA-seq measurements and low count gene expression.

Affiliations

Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada.

Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae233.

Abstract

Temporal RNA-sequencing (RNA-seq) studies of bulk samples provide an opportunity for improved understanding of gene regulation during dynamic phenomena such as development, tumor progression, or response to an incremental dose of a pharmacotherapeutic. Moreover, single-cell RNA-seq (scRNA-seq) data implicitly exhibit temporal characteristics because gene expression values recapitulate dynamic processes such as cellular transitions. Unfortunately, temporal RNA-seq data continue to be analyzed by methods that ignore this ordinal structure and yield results that are often difficult to interpret. Here, we present Error Modelled Gene Expression Analysis (EMOGEA), a framework for analyzing RNA-seq data that incorporates measurement uncertainty and introduces a special formulation for data acquired to monitor dynamic phenomena. This method is particularly suited to RNA-seq studies in which low-count transcripts with small fold changes lead to significant biological effects. Such transcripts include genes involved in signaling and non-coding RNAs, which inherently exhibit low levels of expression. Using simulation studies, we show that this framework down-weights samples that exhibit extreme responses such as batch effects, which allows them to be modeled with the rest of the samples and preserves the degrees of freedom originally envisioned for a study. Using temporal experimental data, we demonstrate the framework by extracting a cascade of gene expression waves from a well-designed RNA-seq study of zebrafish embryogenesis and an scRNA-seq study of mouse pre-implantation, and we provide unique biological insights into the regulation of genes in each wave. For non-ordinal measurements, we show that EMOGEA has a much higher rate of true positive calls and a vanishingly small rate of false negative discoveries compared to common approaches. Finally, we provide two self-contained, easy-to-use packages in Python and R, including test data.
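
The general idea of letting measurement uncertainty down-weight an extreme sample can be illustrated with a textbook inverse-variance weighted mean. This is only a minimal sketch of that generic principle, not the EMOGEA algorithm or the API of its packages; the function name `inverse_variance_mean` and the expression values and uncertainties below are hypothetical.

```python
import math


def inverse_variance_mean(values, sigmas):
    """Return the inverse-variance weighted mean and its standard error.

    Each value is weighted by 1 / sigma^2, so measurements with a large
    uncertainty contribute little to the estimate.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical log-expression values for one gene in four replicates.
# The last replicate is an extreme response with a large uncertainty.
values = [2.0, 2.1, 1.9, 5.0]
sigmas = [0.1, 0.1, 0.1, 2.0]

plain_mean = sum(values) / len(values)
weighted_mean, std_error = inverse_variance_mean(values, sigmas)

print(f"unweighted mean: {plain_mean:.3f}")
print(f"weighted mean:   {weighted_mean:.3f} +/- {std_error:.3f}")
```

In this toy case the unweighted mean is pulled toward the extreme replicate, while the weighted mean stays close to the three consistent replicates. All four samples remain in the calculation, so none has to be discarded.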

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d960/11106635/31c6113bdf25/bbae233ga1.jpg
