Department of Mathematics, Statistics and Computer Science, Dordt College, Sioux Center, IA 51250, USA.
BMC Bioinformatics. 2012 Aug 8;13:193. doi: 10.1186/1471-2105-13-193.
Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed.
We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size.
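The abstract does not spell out the consistency metrics themselves. As a purely illustrative sketch, one plausible measure of how consistently a gene set behaves across arrays is the mean pairwise Pearson correlation of its members' expression profiles, compared against randomly drawn sets of the same size. The function names, the genes-by-arrays matrix layout, and the choice of Pearson correlation are all assumptions made for this example, not the authors' definitions.

```python
import numpy as np


def mean_pairwise_correlation(expr, gene_idx):
    """Hypothetical consistency score for one gene set.

    expr     : 2-D array, rows = genes, columns = arrays (assumed layout).
    gene_idx : row indices of the genes belonging to the set.
    Returns the average Pearson correlation over all distinct gene pairs.
    Note: a gene with constant expression yields NaN correlations.
    """
    sub = np.asarray(expr, dtype=float)[np.asarray(gene_idx)]
    if sub.shape[0] < 2:
        raise ValueError("a gene set needs at least two genes")
    corr = np.corrcoef(sub)                      # gene-by-gene correlation matrix
    upper = np.triu_indices(sub.shape[0], k=1)   # each pair once, no diagonal
    return float(corr[upper].mean())


def random_set_baseline(expr, set_size, n_draws=200, seed=0):
    """Mean score of randomly drawn gene sets of the same size.

    Comparing against same-sized random sets is one way to keep the
    comparison independent of gene set size (an assumption of this sketch).
    """
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    scores = [
        mean_pairwise_correlation(
            expr, rng.choice(expr.shape[0], size=set_size, replace=False)
        )
        for _ in range(n_draws)
    ]
    return float(np.mean(scores))
```

Under these assumptions, a gene set whose score clearly exceeds `random_set_baseline(expr, len(gene_idx))` would count as more consistent than chance, and scores for sets drawn from different resources could be compared on the same expression matrix.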
Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.