Examining the practical limits of batch effect-correction algorithms: When should you care about batch effects?

Affiliations

School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, 300072, China.

School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore.

Publication information

J Genet Genomics. 2019 Sep 20;46(9):433-443. doi: 10.1016/j.jgg.2019.08.002.

Abstract

Batch effects are technical sources of variation and can confound analysis. While many performance ranking exercises have been conducted to establish the best batch effect-correction algorithm (BECA), we hold the viewpoint that the notion of best is context-dependent. Moreover, alternative questions beyond the simplistic notion of "best" are also interesting: are BECAs robust against various degrees of confounding and, if so, what is the limit? Using two different methods for simulating class (phenotype) and batch effects and taking various representative datasets across both genomics (RNA-Seq) and proteomics platforms, we demonstrate that under situations where sample classes and batch factors are moderately confounded, most BECAs are remarkably robust and only weakly affected by upstream normalization procedures. This observation is consistently supported across the multitude of test datasets. BECAs do have limits: when sample classes and batch factors are strongly confounded, BECA performance declines, with variable performance in precision, recall and also batch correction. We also report that while conventional normalization methods have minimal impact on batch effect correction, they do not impair downstream statistical feature selection, and in strongly confounded scenarios, may even outperform BECAs. In other words, removing batch effects is no guarantee of optimal functional analysis. Overall, this study suggests that simplistic performance ranking exercises are quite trivial, and all BECAs are compromises in some context or another.
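To make the notion of class-batch confounding concrete, the following is a minimal sketch and not the simulation procedure used in the study. It assumes Gaussian noise, additive class and batch effects, a Welch t-statistic threshold for feature selection, and naive per-batch mean-centering as a stand-in for a BECA. All function names, parameters and thresholds are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(frac, n_per_class=20, n_features=200, n_true=20,
             n_batch=50, class_effect=2.0, batch_effect=2.0, seed=0):
    """Simulate samples x features with additive class and batch effects.

    frac is the share of class 0 placed in batch 0 (class 1 mirrors it):
    0.5 is a balanced design, values near 1.0 are strongly confounded.
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = round(frac * n_per_class)
    classes = [0] * n_per_class + [1] * n_per_class
    batches = ([0] * k + [1] * (n_per_class - k)       # class 0
               + [1] * k + [0] * (n_per_class - k))    # class 1
    true_feats = set(range(n_true))
    batch_feats = set(rng.sample(range(n_features), n_batch))
    data = []
    for c, b in zip(classes, batches):
        row = []
        for j in range(n_features):
            value = rng.gauss(0.0, 1.0)
            if c == 1 and j in true_feats:
                value += class_effect    # genuine class signal
            if b == 1 and j in batch_feats:
                value += batch_effect    # technical batch shift
            row.append(value)
        data.append(row)
    return data, classes, batches, true_feats


def center_by_batch(data, batches):
    """Naive correction: subtract each batch's per-feature mean."""
    out = [row[:] for row in data]
    for b in set(batches):
        idx = [i for i, bb in enumerate(batches) if bb == b]
        for j in range(len(data[0])):
            mean = statistics.fmean(data[i][j] for i in idx)
            for i in idx:
                out[i][j] -= mean
    return out


def welch_t(x, y):
    """Welch's t-statistic for two independent samples."""
    se = (statistics.variance(x) / len(x)
          + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.fmean(y) - statistics.fmean(x)) / se


def select_features(data, classes, threshold=3.5):
    """Return indices of features whose |t| exceeds the threshold."""
    selected = set()
    for j in range(len(data[0])):
        x = [row[j] for row, c in zip(data, classes) if c == 0]
        y = [row[j] for row, c in zip(data, classes) if c == 1]
        if abs(welch_t(x, y)) > threshold:
            selected.add(j)
    return selected


def evaluate(frac, correct, seed=0):
    """Precision and recall of feature selection for one design."""
    data, classes, batches, truth = simulate(frac, seed=seed)
    if correct:
        data = center_by_batch(data, batches)
    selected = select_features(data, classes)
    tp = len(selected & truth)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(truth)
    return precision, recall


if __name__ == "__main__":
    for frac in (0.5, 0.7, 0.9):
        for correct in (False, True):
            p, r = evaluate(frac, correct)
            print(f"frac={frac} corrected={correct} "
                  f"precision={p:.2f} recall={r:.2f}")
```

In this toy setting, centering preserves the class signal when the design is balanced. As `frac` approaches 1, batch means absorb much of the class difference, so correction itself removes genuine signal. Real BECAs and real data are more complex than this sketch.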
