Berglund Anders E, Welsh Eric A, Eschrich Steven A
Department of Biostatistics and Bioinformatics, Division of Population Sciences, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
Int J Genomics. 2017;2017:2354564. doi: 10.1155/2017/2354564. Epub 2017 Feb 6.
Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. Our hypothesis is that gene signatures can be validated when applied to new datasets, using inherent properties of PCA. This validation is based on four key concepts. Coherence: elements of a gene signature should be correlated beyond chance. Uniqueness: the general direction of the data being examined can drive most of the observed signal. Robustness: if a gene signature is designed to measure a single biological effect, then this signal should be sufficiently strong and distinct compared to other signals within the signature. Transferability: the derived PCA gene signature score should describe the same biology in the target dataset as it does in the training dataset. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those that the signatures were trained upon. Complex signatures, describing multiple independent biological components, are also easily identified.
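The idea of summarizing a signature as a single PCA score, and of checking coherence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pca_signature_score`, the simulated data, and the use of the fraction of variance explained by the first principal component as a rough coherence indicator are all assumptions made for illustration.

```python
import numpy as np

def pca_signature_score(expr):
    """Hypothetical helper: summarize a samples-by-genes matrix as a PC1 score.

    Returns the per-sample PC1 scores, the per-gene PC1 loadings, and the
    fraction of variance explained by each principal component.
    """
    X = np.asarray(expr, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]                      # projection onto PC1
    loadings = Vt[0]
    # PCA signs are arbitrary; orient PC1 so the loadings sum is positive.
    if loadings.sum() < 0:
        scores, loadings = -scores, -loadings
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained

# Simulated data (an assumption, not from the paper): 60 samples, 20 genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 20
latent = rng.normal(size=n_samples)              # one shared biological signal
weights = rng.uniform(0.5, 1.5, size=n_genes)
coherent = latent[:, None] * weights + rng.normal(scale=0.5, size=(n_samples, n_genes))
random_genes = rng.normal(size=(n_samples, n_genes))  # uncorrelated genes

scores, loadings, explained = pca_signature_score(coherent)
_, _, explained_random = pca_signature_score(random_genes)

print("PC1 variance fraction, coherent signature:", round(explained[0], 3))
print("PC1 variance fraction, random genes:      ", round(explained_random[0], 3))
```

In this simulation the coherent signature concentrates most of its variance in the first component, while the random gene set does not. A comparison against random gene sets of this kind is one plausible way to ask whether a signature is correlated beyond chance.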