Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA.
Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
Nat Commun. 2021 Feb 2;12(1):738. doi: 10.1038/s41467-021-21038-1.
Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that pseudo-bulk aggregation methods are conservative and underpowered relative to mixed models. To compute differential expression within a specific cell type across treatment groups, we propose applying generalized linear mixed models with a random effect for individual, to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.
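The inflation of type 1 error under pseudoreplication can be illustrated with a small simulation. The sketch below is not the authors' method: it uses Gaussian values rather than zero-inflated counts, and it compares a naive cell-level test against a test on per-individual means rather than fitting a generalized linear mixed model. All function names, parameter values (10 individuals per group, 50 cells per individual, between-individual SD 0.5, within-individual SD 1.0) and the seed are assumptions chosen for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Unbiased sample variance.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_stat(a, b):
    # Two-sample statistic with unequal-variance standard error.
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def simulate_group(rng, n_ind, n_cells, sd_between, sd_within):
    # Each individual gets its own random shift shared by all of its cells,
    # which is what makes cells within an individual correlated.
    group = []
    for _ in range(n_ind):
        shift = rng.gauss(0.0, sd_between)
        group.append([shift + rng.gauss(0.0, sd_within) for _ in range(n_cells)])
    return group


def rejection_rates(n_sims=400, n_ind=10, n_cells=50,
                    sd_between=0.5, sd_within=1.0, seed=1):
    # Both groups are drawn from the same distribution (no treatment effect),
    # so every rejection is a false positive.
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _ in range(n_sims):
        g0 = simulate_group(rng, n_ind, n_cells, sd_between, sd_within)
        g1 = simulate_group(rng, n_ind, n_cells, sd_between, sd_within)

        # Naive analysis: treat every cell as an independent observation.
        cells0 = [x for ind in g0 for x in ind]
        cells1 = [x for ind in g1 for x in ind]
        if abs(welch_stat(cells0, cells1)) > 1.96:  # normal critical value
            naive_hits += 1

        # Aggregated analysis: one mean per individual, so n = n_ind per group.
        means0 = [mean(ind) for ind in g0]
        means1 = [mean(ind) for ind in g1]
        # 2.101 is the two-sided 5% t critical value for 18 degrees of
        # freedom, which matches the default n_ind=10 per group only.
        if abs(welch_stat(means0, means1)) > 2.101:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims


if __name__ == "__main__":
    naive, aggregated = rejection_rates()
    print(f"naive cell-level false positive rate: {naive:.3f}")
    print(f"per-individual-mean false positive rate: {aggregated:.3f}")
```

With these assumed settings the intraclass correlation is 0.25 / 1.25 = 0.2, so the design effect 1 + (50 - 1) × 0.2 = 10.8 means the naive test understates the variance of the group difference roughly tenfold, and its false positive rate should sit far above the nominal 5%, while the test on individual means should stay near it.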