Plant Claudia, Böhm Christian, Tilg Bernhard, Baumgartner Christian
Research Group for Clinical Bioinformatics, Institute for Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria.
Bioinformatics. 2006 Apr 15;22(8):981-8. doi: 10.1093/bioinformatics/btl027. Epub 2006 Jan 27.
Classification is an important data mining task in biomedicine. In particular, classifying biomedical data often requires separating pathological from healthy samples with the highest possible discriminatory performance for diagnostic purposes. Even more important than overall accuracy is the balance of a classifier, particularly when the datasets examined have unbalanced class sizes.
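To illustrate why overall accuracy can mislead on unbalanced data, the following minimal sketch computes accuracy alongside sensitivity and specificity. Reading "balance" as the agreement between sensitivity and specificity is an assumption made here for illustration; the abstract does not define the term. The function name `balance_report` and the toy class sizes are hypothetical.

```python
def balance_report(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and balanced accuracy.

    Illustrative helper (not from the paper): `positive` marks the
    pathological class, every other label counts as healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy data: 95 healthy (0) and 5 pathological (1) samples,
# scored against a classifier that always predicts "healthy".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
report = balance_report(y_true, y_pred)
print(report)
```

In this toy case the always-healthy classifier reaches an accuracy of 0.95 while detecting none of the pathological samples (sensitivity 0.0), which is the kind of imbalance the paragraph above warns about.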
We present a novel instance-based classification technique that takes into account both the differing local densities of data objects and local cluster structures. Our method, which adopts the basic ideas of density-based outlier detection, determines the local point density in the neighborhood of an object to be classified and of all clusters in the corresponding region. A data object is assigned to the class in which it fits best into the local cluster structure. The experimental evaluation on biomedical data demonstrates that our approach outperforms most popular classification methods.
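The general idea of classifying by local density can be sketched as follows. This is not the authors' LCF algorithm, whose details the abstract does not give. It is a simplified illustration that assumes a LOF-style ratio: for each class, the density around the query object is compared with the density around its nearest neighbors in that class, and the object is assigned to the class where it looks least like an outlier. All function names, the Euclidean distance, and the parameter `k` are assumptions.

```python
import numpy as np


def local_density(point, data, k, exclude_self=False):
    """Inverse mean distance from `point` to its k nearest neighbors in `data`."""
    dists = np.sort(np.linalg.norm(data - point, axis=1))
    if exclude_self:
        dists = dists[1:]  # drop the zero distance of the point to itself
    return 1.0 / (dists[:k].mean() + 1e-12)


def class_outlier_factor(query, class_points, k):
    """LOF-style ratio: neighbors' mean density divided by the query's density.

    Values near or below 1 mean the query fits the local structure of the
    class; large values mean it looks like an outlier relative to the class.
    """
    dists = np.linalg.norm(class_points - query, axis=1)
    nn_idx = np.argsort(dists)[:k]
    query_density = 1.0 / (dists[nn_idx].mean() + 1e-12)
    neighbor_density = np.mean(
        [local_density(class_points[i], class_points, k, exclude_self=True)
         for i in nn_idx]
    )
    return neighbor_density / query_density


def classify(query, X, y, k=3):
    """Assign `query` to the class with the smallest outlier factor.

    Each class is assumed to contain more than k training objects.
    """
    scores = {
        label: class_outlier_factor(query, X[y == label], k)
        for label in np.unique(y).tolist()
    }
    return min(scores, key=scores.get), scores


# Toy data: a dense class 0 and a sparser class 1.
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
    [10, 10], [10, 12], [12, 10], [12, 12], [11, 11],
], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

label, scores = classify(np.array([0.4, 0.6]), X, y, k=3)
print(label, scores)
```

Because the factor is a ratio of densities rather than an absolute distance, a sparse class is not penalized merely for being sparse, which is one plausible reason a density-based criterion could help on classes of differing density.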
The LCF algorithm is available for testing at http://biomed.umit.at/upload/lcfx.zip.