Differential expression of single-cell RNA-seq data using Tweedie models.

Affiliations

Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA.

Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

Stat Med. 2022 Aug 15;41(18):3492-3510. doi: 10.1002/sim.9430. Epub 2022 Jun 2.

Abstract

The performance of computational methods and software for identifying differentially expressed features in single-cell RNA sequencing (scRNA-seq) data has been shown to depend on several factors, including the choice of normalization method and the choice of experimental platform (or library preparation protocol) used to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method from more than 100 DE tools available to date, each relying on its own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, we propose using Tweedie generalized linear models, which can flexibly capture the wide dynamic range of scRNA-seq expression profiles observed across experimental platforms, arising from platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows the probability mass at zero to exceed that of a standard Tweedie distribution, in order to model zero-inflated scRNA-seq data with excess zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
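To make the zero-mass idea concrete, the sketch below assumes the standard Tweedie parameterization: Var(Y) = phi * mu^p, and for 1 < p < 2, Y is a compound Poisson-gamma variable whose Poisson count has rate lambda = mu^(2-p) / (phi * (2-p)). Y equals zero exactly when that count is zero, so P(Y = 0) = exp(-lambda). A zero-inflated variant with mixing weight pi adds structural zeros, giving P(Y = 0) = pi + (1 - pi) * exp(-lambda), which can only be at least as large as the standard Tweedie value. The function names and this particular mixture form are illustrative assumptions, not the Tweedieverse API and not necessarily the exact model fitted in the paper.

```python
import math


def tweedie_variance(mu: float, phi: float, p: float) -> float:
    """Tweedie variance function (assumed standard form): Var(Y) = phi * mu**p."""
    return phi * mu ** p


def tweedie_zero_prob(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a Tweedie variable with 1 < p < 2 (compound Poisson-gamma).

    The number of gamma summands is Poisson with rate
    lam = mu**(2 - p) / (phi * (2 - p)); Y is zero only when that count is zero.
    """
    if not 1 < p < 2:
        raise ValueError("a point mass at zero requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))
    return math.exp(-lam)


def zi_tweedie_zero_prob(mu: float, phi: float, p: float, pi: float) -> float:
    """Hypothetical zero-inflated Tweedie: structural zeros with probability pi,
    otherwise a standard Tweedie draw, so P(Y = 0) >= tweedie_zero_prob(...)."""
    if not 0 <= pi < 1:
        raise ValueError("pi must lie in [0, 1)")
    return pi + (1 - pi) * tweedie_zero_prob(mu, phi, p)


if __name__ == "__main__":
    # Compare zero probabilities for a lowly expressed gene (mu = 0.5).
    for pi in (0.0, 0.2, 0.5):
        print(pi, zi_tweedie_zero_prob(mu=0.5, phi=1.0, p=1.5, pi=pi))
```

Setting pi = 0 recovers the ordinary Tweedie zero probability, which is why the zero-inflated model can be viewed as a strict generalization that only adds zero mass.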
