CONICET and the Center for Research and Development of Information Systems-CIDISI, UTN-FRSF, Lavaise 610, Santa Fe 3000, Argentina.
IEEE/ACM Trans Comput Biol Bioinform. 2012 May-Jun;9(3):706-16. doi: 10.1109/TCBB.2012.10.
In the biological domain, clustering rests on the assumption that genes or metabolites involved in a common biological process are coexpressed or coaccumulated under the control of the same regulatory network. A detailed inspection of the grouped patterns to verify their membership in well-known metabolic pathways can therefore be very useful for evaluating clusters from a biological perspective. The aim of this work is to propose a novel approach for comparing clustering methods over metabolic data sets that incorporates prior biological knowledge about the relations among the elements that constitute the clusters. A measure of the biological significance of clustering solutions is proposed, based on how useful the clusters are for identifying patterns that change in coordination and belong to common pathways of metabolic regulation. The measure compactly summarizes the objective analysis of clustering methods with respect to cluster coherence and distribution. It also evaluates the internal biological connections of the clusters by considering common pathways. The proposed measure was tested on two biological databases using three clustering methods.