Comparison and evaluation of statistical error models for scRNA-seq.

Author information

Choudhary Saket, Satija Rahul

Affiliations

New York Genome Center, 101 Avenue of the Americas, New York, 10013, USA.

Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003, USA.

Publication information

Genome Biol. 2022 Jan 18;23(1):27. doi: 10.1186/s13059-021-02584-9.

DOI:10.1186/s13059-021-02584-9

PMID:35042561

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8764781/

Abstract

BACKGROUND

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate.

RESULTS

Here, we analyze 59 scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. We find that while a Poisson error model appears appropriate for sparse datasets, there is clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, we find that the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach to parameter estimation.
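To make the Poisson versus negative binomial distinction concrete, here is a minimal simulation sketch. It is not the authors' analysis pipeline. It assumes the common parameterization in which a negative binomial count with mean mu and inverse overdispersion theta has variance mu + mu^2/theta, while a Poisson count has variance equal to its mean. The function names, the values of mu and theta, and the number of cells are illustrative assumptions.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson count with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb(mu, theta, rng):
    """Draw one negative binomial count as a gamma-Poisson mixture.

    The gamma rate has shape theta and scale mu / theta, so the count has
    mean mu and variance mu + mu**2 / theta.
    """
    lam = rng.gammavariate(theta, mu / theta)
    return sample_poisson(lam, rng)


def moment_theta(counts):
    """Method-of-moments estimate of theta; infinite if no overdispersion."""
    m = statistics.fmean(counts)
    v = statistics.variance(counts)
    excess = v - m
    return math.inf if excess <= 0 else m * m / excess


# Hypothetical settings for a single simulated gene (assumptions).
rng = random.Random(0)
n_cells, mu, theta = 5000, 5.0, 2.0

poisson_counts = [sample_poisson(mu, rng) for _ in range(n_cells)]
nb_counts = [sample_nb(mu, theta, rng) for _ in range(n_cells)]

# Variance-to-mean ratio: about 1 under Poisson, about 1 + mu/theta under NB.
pois_ratio = statistics.variance(poisson_counts) / statistics.fmean(poisson_counts)
nb_ratio = statistics.variance(nb_counts) / statistics.fmean(nb_counts)
theta_hat = moment_theta(nb_counts)

print(f"Poisson variance/mean: {pois_ratio:.2f}")
print(f"NB variance/mean:      {nb_ratio:.2f}")
print(f"Estimated theta:       {theta_hat:.2f}")
```

A variance-to-mean ratio well above 1 is the signature of overdispersion described in the abstract. Estimating theta from the data, rather than fixing it in advance, is one simple form of the data-driven approach the authors argue for.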

CONCLUSIONS

Based on these analyses, we provide a set of recommendations for modeling variation in scRNA-seq data, particularly when using generalized linear models or likelihood-based approaches for preprocessing and downstream analysis.
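As one illustration of a likelihood-based comparison, the sketch below fits a Poisson model and a negative binomial model to a single gene's counts and compares them by AIC. It is a minimal example under stated assumptions, not the procedure used in the paper. The count vector is made-up toy data, theta is chosen by a coarse grid search, and the helper names are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson counts with mean mu."""
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in counts)


def nb_loglik(counts, mu, theta):
    """Log-likelihood of i.i.d. negative binomial counts (mean mu, size theta)."""
    return sum(
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
        for x in counts
    )


# Toy counts for one hypothetical gene across 20 cells (made-up data).
counts = [0, 0, 1, 0, 3, 12, 0, 2, 7, 0, 1, 0, 25, 4, 0, 0, 9, 1, 0, 3]

# The sample mean maximizes the likelihood in mu for both models.
mu_hat = sum(counts) / len(counts)

# Coarse log-spaced grid for theta, from 0.01 to 100 (an assumption).
theta_grid = [10 ** (e / 20) for e in range(-40, 41)]
theta_hat = max(theta_grid, key=lambda t: nb_loglik(counts, mu_hat, t))

# AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
aic_poisson = 2 * 1 - 2 * poisson_loglik(counts, mu_hat)
aic_nb = 2 * 2 - 2 * nb_loglik(counts, mu_hat, theta_hat)

print(f"mu_hat = {mu_hat:.2f}, theta_hat = {theta_hat:.2f}")
print(f"AIC Poisson = {aic_poisson:.1f}, AIC NB = {aic_nb:.1f}")
```

For counts this dispersed the negative binomial model gets the lower AIC. For a sparse gene whose variance is close to its mean, the extra parameter would buy little, which matches the abstract's observation that a Poisson model appears adequate for sparse data.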

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4142/8764781/0218d1f757de/13059_2021_2584_Fig1_HTML.jpg

Similar articles

Benchmarking UMI-based single-cell RNA-seq preprocessing workflows.

Genome Biol. 2021 Dec 14;22(1):339. doi: 10.1186/s13059-021-02552-3.

Detection of high variability in gene expression from single-cell RNA-seq profiling.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):508. doi: 10.1186/s12864-016-2897-6.

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression.

Genome Biol. 2019 Dec 23;20(1):296. doi: 10.1186/s13059-019-1874-1.

A Comprehensive Survey of Statistical Approaches for Differential Expression Analysis in Single-Cell RNA Sequencing Studies.

Genes (Basel). 2021 Dec 2;12(12):1947. doi: 10.3390/genes12121947.

scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.

BMC Genomics. 2019 May 8;20(1):347. doi: 10.1186/s12864-019-5747-5.

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data.

Genome Biol. 2021 Sep 6;22(1):258. doi: 10.1186/s13059-021-02451-7.

A statistical simulator scDesign for rational scRNA-seq experimental design.

Bioinformatics. 2019 Jul 15;35(14):i41-i50. doi: 10.1093/bioinformatics/btz321.

Bayesian gamma-negative binomial modeling of single-cell RNA sequencing data.

BMC Genomics. 2020 Sep 9;21(Suppl 9):585. doi: 10.1186/s12864-020-06938-8.

Missing data and technical variability in single-cell RNA-sequencing experiments.

Biostatistics. 2018 Oct 1;19(4):562-578. doi: 10.1093/biostatistics/kxx053.

Cited by

Inflammatory, Functional, and Compositional Changes of the Uterine Immune Microenvironment in a Lymphangioleiomyomatosis Mouse Model.

J Cell Immunol. 2025;7(3):74-97. doi: 10.33696/immunology.7.227.

Transcriptome-wide root causal inference.

PLoS Comput Biol. 2025 Sep 2;21(9):e1013461. doi: 10.1371/journal.pcbi.1013461. eCollection 2025 Sep.

MEF2C controls segment-specific gene regulatory networks that direct heart tube morphogenesis.

Genes Dev. 2025 Aug 29. doi: 10.1101/gad.352889.125.

Plasma exosomes from individuals with type 2 diabetes drive breast cancer aggression in patient-derived organoids.

Commun Biol. 2025 Aug 26;8(1):1276. doi: 10.1038/s42003-025-08663-y.

A spatial single-cell atlas of the claustro-insular region uncovers key regulators of neuronal identity and excitability.

Nat Commun. 2025 Aug 22;16(1):7830. doi: 10.1038/s41467-025-63138-2.

A single-cell, spatial transcriptomic atlas of the Arabidopsis life cycle.

Nat Plants. 2025 Aug 19. doi: 10.1038/s41477-025-02072-z.

Establishment of chromatin architecture interplays with embryo hypertranscription.

Nature. 2025 Aug 13. doi: 10.1038/s41586-025-09400-5.

Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxaf024.

Adrenergic signaling coordinates distant and local responses to amputation in axolotl.

bioRxiv. 2025 Jul 24:2021.12.29.474455. doi: 10.1101/2021.12.29.474455.

Focal Adhesion Kinase Drives Rho/ROCK and mTOR Signaling to Protect and Augment Aortic Dissections.

JACC Basic Transl Sci. 2025 Aug 2;10(9):101353. doi: 10.1016/j.jacbts.2025.101353.

References

PsiNorm: a scalable normalization for single-cell RNA-seq data.

Bioinformatics. 2021 Dec 22;38(1):164-172. doi: 10.1093/bioinformatics/btab641.

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data.

Genome Biol. 2021 Sep 6;22(1):258. doi: 10.1186/s13059-021-02451-7.

Molecular architecture of the developing mouse brain.

Nature. 2021 Aug;596(7870):92-96. doi: 10.1038/s41586-021-03775-x. Epub 2021 Jul 28.

Integrated analysis of multimodal single-cell data.

Cell. 2021 Jun 24;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048. Epub 2021 May 31.

NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data.

Commun Biol. 2021 May 26;4(1):629. doi: 10.1038/s42003-021-02146-6.

Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis.

Nat Genet. 2021 Jun;53(6):770-777. doi: 10.1038/s41588-021-00873-4. Epub 2021 May 24.

Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression.

Nat Biotechnol. 2021 Jul;39(7):819-824. doi: 10.1038/s41587-021-00865-z. Epub 2021 Apr 12.

Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues.

Nat Biotechnol. 2021 Jul;39(7):825-835. doi: 10.1038/s41587-021-00869-9. Epub 2021 Apr 12.

Cellular transcriptomics reveals evolutionary identities of songbird vocal circuits.

Science. 2021 Feb 12;371(6530). doi: 10.1126/science.abd9704.

glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data.

Bioinformatics. 2021 Apr 5;36(24):5701-5702. doi: 10.1093/bioinformatics/btaa1009.
