SenzaGen AB, Lund, Sweden.
Department of Immunotechnology, Lund University, Lund, Sweden.
PLoS One. 2019 Feb 22;14(2):e0212669. doi: 10.1371/journal.pone.0212669. eCollection 2019.
Many biological data acquisition platforms inadvertently introduce biologically irrelevant variance into the analyzed data, collectively termed batch effects. Batch effects complicate downstream analysis by lowering the power to detect biologically interesting differences, and in certain instances they lead to false discoveries. They are especially troublesome in predictive modelling, where the samples in training sets and test sets are often completely correlated with batch. In this article, we present BARA, a normalization method for adjusting batch effects in predictive modelling. BARA uses a few reference samples to adjust for batch effects in a compressed data space spanned by the training set. We evaluate BARA on a collection of publicly available datasets with three different prediction models, and compare its performance to that of existing methods developed for similar purposes. The results show that data normalized with BARA yield high and consistent prediction performance. Further, they suggest that BARA performs reliably regardless of which of the examined classifiers is used. We therefore conclude that BARA has great potential to facilitate the development of predictive assays where test sets and training sets are correlated with batch.
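The abstract does not spell out the adjustment procedure, so the following is only a rough sketch of the general idea of reference-based adjustment in a compressed space, not the authors' implementation. It assumes that the compressed space comes from a truncated SVD of the centered training set and that the adjustment is a per-component mean shift estimated from reference samples measured in both batches. The function names `fit_compression` and `adjust_batch`, the synthetic data, and the number of components are all hypothetical.

```python
import numpy as np

def fit_compression(train, n_components):
    """Center the training set and return (mean, basis) from a truncated SVD.

    Assumption: the 'compressed data space spanned by the training set'
    is modelled here as the leading right singular vectors.
    """
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T  # basis has shape (features, components)

def adjust_batch(test, test_ref_idx, train_ref_scores, mean, basis):
    """Project a test batch and shift it so its references match the training references.

    Assumption: the batch effect is treated as a constant offset per component.
    """
    scores = (test - mean) @ basis
    shift = train_ref_scores.mean(axis=0) - scores[test_ref_idx].mean(axis=0)
    return scores + shift

# Synthetic illustration: 50 features, a training batch and an offset test batch.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 50))
train_ref_idx = [0, 1, 2]          # rows of the training set that are reference samples
mean, basis = fit_compression(train, n_components=5)
train_scores = (train - mean) @ basis

test = rng.normal(size=(10, 50)) + 2.0   # simulated additive batch effect
test_ref_idx = [0, 1]                    # rows of the test batch that are reference samples
adjusted = adjust_batch(test, test_ref_idx, train_scores[train_ref_idx], mean, basis)
```

In this sketch a classifier would be trained on `train_scores` and applied to `adjusted`. Whether predictions should be made in the compressed space or after mapping back to the original features is left open here.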