Missing data and technical variability in single-cell RNA-sequencing experiments.

Affiliations

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.

Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.

Publication information

Biostatistics. 2018 Oct 1;19(4):562-578. doi: 10.1093/biostatistics/kxx053.

Abstract

Until recently, high-throughput gene expression technologies, such as RNA-Sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on data produced with it. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros. These zeros could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a level sufficient to be detected by the sequencing technology). Another difference is that the proportion of genes with a reported expression level of zero varies substantially more across single cells than across RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of the reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that the technical variation itself varies from cell to cell. We then show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem.
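
The two quantities the abstract relies on, the proportion of zeros in each cell and the number of zeros relative to what would be expected, can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not the authors' assessment experiment or data. The function names, the gamma-distributed expression levels, the uniform cell-specific capture efficiency, and the single-rate Poisson baseline are all assumptions made for illustration only.

```python
import numpy as np


def zero_fraction_per_cell(counts):
    """Fraction of genes with a zero count in each cell (rows = genes, columns = cells)."""
    return (counts == 0).mean(axis=0)


def poisson_expected_zero_fraction(counts):
    """Per-gene zero fraction expected if all cells shared one Poisson rate
    equal to that gene's observed mean count (an assumed baseline)."""
    return np.exp(-counts.mean(axis=1))


def simulate_counts(n_genes=500, n_cells=200, seed=0):
    """Toy data: every cell has the same true expression, but a different
    (hypothetical) capture efficiency, so all cell-to-cell variation is technical."""
    rng = np.random.default_rng(seed)
    true_expr = rng.gamma(shape=0.5, scale=10.0, size=n_genes)  # assumed molecules per gene
    efficiency = rng.uniform(0.02, 0.5, size=n_cells)           # assumed per-cell efficiency
    rates = np.outer(true_expr, efficiency)                      # genes x cells Poisson rates
    return rng.poisson(rates), efficiency


counts, efficiency = simulate_counts()

# Proportion of zeros in each cell: varies although the biology is identical.
per_cell_zeros = zero_fraction_per_cell(counts)

# Observed versus baseline-expected zero fraction for each gene.
observed = (counts == 0).mean(axis=1)
expected = poisson_expected_zero_fraction(counts)
excess = observed - expected

print("zero fraction per cell: min %.2f, max %.2f" % (per_cell_zeros.min(), per_cell_zeros.max()))
print("mean excess zero fraction per gene: %.3f" % excess.mean())
```

In this toy setting the cells with lower simulated efficiency report more zeros, and the average zero fraction exceeds the single-rate Poisson baseline even though no biological difference between cells was simulated.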

Similar articles

3
Detection of high variability in gene expression from single-cell RNA-seq profiling.
BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):508. doi: 10.1186/s12864-016-2897-6.
4
Normalization of Single-Cell RNA-Seq Data.
Methods Mol Biol. 2021;2284:303-329. doi: 10.1007/978-1-0716-1307-8_17.
5
Microfluidic single-cell whole-transcriptome sequencing.
Proc Natl Acad Sci U S A. 2014 May 13;111(19):7048-53. doi: 10.1073/pnas.1402030111. Epub 2014 Apr 29.
6
Analysis of Technical and Biological Variability in Single-Cell RNA Sequencing.
Methods Mol Biol. 2019;1935:25-43. doi: 10.1007/978-1-4939-9057-3_3.
8
SCnorm: robust normalization of single-cell RNA-seq data.
Nat Methods. 2017 Jun;14(6):584-586. doi: 10.1038/nmeth.4263. Epub 2017 Apr 17.
9
Quality Control of Single-Cell RNA-seq.
Methods Mol Biol. 2019;1935:1-9. doi: 10.1007/978-1-4939-9057-3_1.
10
A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.
PLoS Comput Biol. 2018 Apr 9;14(4):e1006053. doi: 10.1371/journal.pcbi.1006053. eCollection 2018 Apr.

Cited by

1
A Benchmark of Semi-Supervised scRNA-seq Integration Methods in Real-World Scenarios.
bioRxiv. 2025 Aug 27:2025.08.23.671952. doi: 10.1101/2025.08.23.671952.
2
Missing data in single-cell transcriptomes reveals transcriptional shifts.
bioRxiv. 2025 Aug 21:2025.08.15.669765. doi: 10.1101/2025.08.15.669765.
4
BioLLM: A standardized framework for integrating and benchmarking single-cell foundation models.
Patterns (N Y). 2025 Jul 30;6(8):101326. doi: 10.1016/j.patter.2025.101326. eCollection 2025 Aug 8.
5
RESCUE: recovery of idiosyncratic expression patterns in spatial transcriptomics.
bioRxiv. 2025 Aug 15:2025.08.11.669542. doi: 10.1101/2025.08.11.669542.
6
Biomaterial-mediated Cell Atlas: an insight from single-cell and spatial transcriptomics.
Bioact Mater. 2025 Aug 8;54:1-33. doi: 10.1016/j.bioactmat.2025.07.047. eCollection 2025 Dec.
7
Simulating paired and longitudinal single-cell RNA sequencing data with rescueSim.
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf442.
8
Discordant effects of maternal age on the human MII oocyte transcriptome.
Mol Hum Reprod. 2025 Jul 3;31(3). doi: 10.1093/molehr/gaaf038.
10
Critical gene network and signaling pathway analysis of the extracellular signal-regulated kinase (ERK) pathway in ischemic stroke.
Front Mol Neurosci. 2025 Jun 25;18:1604670. doi: 10.3389/fnmol.2025.1604670. eCollection 2025.
