Improving clustering with metabolic pathway data.

Affiliation

Research Center for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, (3000) Santa Fe, Argentina.

Publication information

BMC Bioinformatics. 2014 Apr 10;15:101. doi: 10.1186/1471-2105-15-101.

Abstract

BACKGROUND

In bioinformatics, it is common practice to validate each group returned by a clustering algorithm through manual analysis based on a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. In this workflow, the knowledge is used only as a second step, after the data have been clustered according to their expression patterns alone. It would therefore be very useful to incorporate prior knowledge into cluster formation itself, in order to enhance the biological value of the clusters.

RESULTS

A novel training algorithm for clustering is presented that evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically inspired SOM (bSOM) training were tested on two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. In addition, a new measure that takes into account the biological connectivity of the clusters was applied. The bSOM results show important improvements in convergence and performance over standard SOM training, particularly from the application point of view.
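The abstract does not give the form of the pathway-based distance term, so the following is only a minimal sketch of the general idea, not the authors' bSOM formulation. It assumes, purely for illustration, that each data point and each neuron carries a set of pathway labels, that the biological term is a Jaccard dissimilarity between those sets, and that a weight `alpha` mixes it with the Euclidean distance. The names `combined_distance`, `best_matching_unit`, and `alpha` are hypothetical.

```python
import math


def combined_distance(x, centroid, x_pathways, neuron_pathways, alpha=0.5):
    """Hypothetical distance: Euclidean term plus a pathway-dissimilarity term.

    ASSUMPTION: the Jaccard term and the alpha weighting are illustrative
    choices; the abstract does not specify the actual bSOM term.
    """
    expression_term = math.dist(x, centroid)
    union = x_pathways | neuron_pathways
    if union:
        shared = len(x_pathways & neuron_pathways)
        pathway_term = 1.0 - shared / len(union)
    else:
        # No pathway information on either side: add no penalty (assumption).
        pathway_term = 0.0
    return (1.0 - alpha) * expression_term + alpha * pathway_term


def best_matching_unit(x, centroids, x_pathways, neuron_pathways, alpha=0.5):
    """Return the index of the neuron with the smallest combined distance."""
    distances = [
        combined_distance(x, c, x_pathways, p, alpha)
        for c, p in zip(centroids, neuron_pathways)
    ]
    return distances.index(min(distances))


# Toy example: two neurons with nearly identical centroids but different
# pathway annotations (the labels are made up for illustration).
centroids = [(0.0, 0.0), (0.1, 0.0)]
neuron_pathways = [{"tca_cycle"}, {"glycolysis"}]
x = (0.0, 0.0)
x_pathways = {"glycolysis"}

bmu_expression_only = best_matching_unit(x, centroids, x_pathways, neuron_pathways, alpha=0.0)
bmu_with_pathways = best_matching_unit(x, centroids, x_pathways, neuron_pathways, alpha=0.5)
```

With `alpha=0.0` the search reduces to the standard SOM best-matching-unit rule. With a nonzero `alpha`, a neuron that shares pathway annotations with the data point can win even when its centroid is slightly farther away in expression space, which is the kind of effect the paragraph above describes.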

CONCLUSIONS

Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
