IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2072-2079. doi: 10.1109/TCBB.2021.3094650. Epub 2021 Dec 8.
Analyzing single-cell sequencing data from large cohorts is challenging. Discrepancies across experiments and differences among participants often lead to omissions and false discoveries when identifying differentially expressed genes. We find that the Van Elteren test, a stratified version of the widely used Wilcoxon rank-sum test, elegantly mitigates the problem. We also modified the common language effect size to supplement this test, further improving its utility. On both simulated and real patient data, we show the ability of the Van Elteren test to control for false positives and false negatives. A comprehensive assessment using receiver operating characteristic (ROC) curves shows that the Van Elteren test achieves higher sensitivity and specificity on simulated datasets than nine state-of-the-art differential expression analysis methods. The effect size also estimates the differences between cell types more accurately.
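To make the idea of a stratified Wilcoxon rank-sum test concrete, the following is a minimal sketch of a two-sided Van Elteren test for a single gene, using the normal approximation. It is an illustration under stated assumptions, not the authors' implementation: the function names `midranks` and `van_elteren` are hypothetical, each stratum (for example a participant or a batch) is weighted by 1/(N_h + 1), which is the commonly used textbook weighting and may differ from the one used in the paper, and the modified common language effect size is not reproduced here because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def midranks(xs):
    """Return 1-based ranks of xs, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def van_elteren(values, groups, strata):
    """Two-sided Van Elteren test (normal approximation).

    values: expression values of one gene, one per cell
    groups: two-level label per cell (e.g. cell type or condition)
    strata: stratum label per cell (e.g. participant or batch)
    Assumption: strata are weighted by 1 / (N_h + 1).
    Returns (z, p_value); z < 0 means the first group (in sorted label
    order) tends to have lower values.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")

    by_stratum = defaultdict(list)
    for v, g, s in zip(values, groups, strata):
        by_stratum[s].append((v, g))

    stat = 0.0  # weighted sum of (rank sum - expected rank sum)
    var = 0.0   # variance of that sum under the null hypothesis
    for cells in by_stratum.values():
        xs = [v for v, _ in cells]
        first = [g == labels[0] for _, g in cells]
        n = sum(first)
        m = len(cells) - n
        if n == 0 or m == 0:
            continue  # a stratum with only one group carries no information
        N = n + m
        ranks = midranks(xs)  # ranks are computed within the stratum only
        w = 1.0 / (N + 1)
        rank_sum = sum(r for r, f in zip(ranks, first) if f)
        stat += w * (rank_sum - n * (N + 1) / 2.0)
        # Tie-corrected variance of the within-stratum rank sum.
        tie_term = sum(t**3 - t for t in Counter(xs).values()) / (N * (N - 1))
        var += w * w * n * m / 12.0 * ((N + 1) - tie_term)

    if var == 0.0:
        return 0.0, 1.0
    z = stat / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy example: two batches with a large batch shift but the same
# within-batch ordering of groups "a" and "b".
values = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]
groups = ["a", "a", "a", "b", "b", "b"] * 2
strata = ["batch1"] * 6 + ["batch2"] * 6
z, p = van_elteren(values, groups, strata)
```

Because ranks are computed inside each stratum, the shift between the two batches in the toy data does not mask the consistent within-batch difference between groups, which is the property the abstract relies on for cohort data.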