Dimensionality reduction using singular vectors.

Affiliations

Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, Canada.

Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, NL, Canada.

Publication information

Sci Rep. 2021 Feb 15;11(1):3832. doi: 10.1038/s41598-021-83150-y.

Abstract

A common problem in machine learning and pattern recognition is identifying the most relevant features, particularly when dealing with high-dimensional datasets in bioinformatics. In this paper, we propose a new feature selection method, called Singular-Vectors Feature Selection (SVFS). Let [Formula: see text] be a labeled dataset, where [Formula: see text] is the class label and the features (attributes) are the columns of matrix A. We show that the signature matrix [Formula: see text] can be used to partition the columns of A into clusters so that the columns in a cluster correlate only with columns in the same cluster. In the first step, SVFS uses the signature matrix [Formula: see text] of D to find the cluster that contains [Formula: see text]. We reduce the size of A by discarding the features in the other clusters as irrelevant. In the next step, SVFS uses the signature matrix [Formula: see text] of the reduced A to partition the remaining features into clusters and chooses the most important features from each cluster. SVFS works perfectly on synthetic datasets, and comprehensive experiments on real-world benchmark and genomic datasets show that it exhibits overall superior performance compared with state-of-the-art feature selection methods in terms of accuracy, running time, and memory usage. A Python implementation of SVFS, along with the datasets used in this paper, is available at https://github.com/Majid1292/SVFS.
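The clustering idea in the first step can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). The abstract's formulas are elided, so the sketch assumes that the signature matrix of a matrix M is the orthogonal projector onto its null space, I − M⁺M, built from the right singular vectors. The function names, the tolerance `tol`, and the toy data are all hypothetical. Columns are placed in the same cluster when the corresponding entry of the signature matrix is non-zero, and only the features that share a cluster with the label column b are kept.

```python
import numpy as np

def signature_matrix(M, tol=1e-10):
    """Projector onto the null space of M, i.e. I - pinv(M) @ M.

    ASSUMPTION: this definition is not given in the abstract text;
    it is used here only for illustration.
    """
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))   # numerical rank
    V_r = Vt[:rank].T                    # right singular vectors of the row space
    return np.eye(M.shape[1]) - V_r @ V_r.T

def column_clusters(S, tol=1e-8):
    """Label columns by connected components of the non-zero pattern of S."""
    n = S.shape[0]
    linked = np.abs(S) > tol
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                     # depth-first search over linked columns
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data with exact linear dependencies (hypothetical, noise-free).
rng = np.random.default_rng(0)
x0, x1, x2, x4 = rng.normal(size=(4, 20))
x3 = 2.0 * x2                  # redundant with x2, unrelated to the label
b = x0 - 2.0 * x1              # label depends only on x0 and x1
D = np.column_stack([x0, x1, x2, x3, x4, b])   # D = [A | b]

labels = column_clusters(signature_matrix(D))
b_cluster = labels[-1]
relevant = [j for j in range(D.shape[1] - 1) if labels[j] == b_cluster]
print("cluster labels:", labels)
print("features in the cluster of b:", relevant)
```

On this toy matrix the dependencies are exact, so a fixed tolerance is enough to separate zero from non-zero entries. Real datasets are noisy, and choosing thresholds there is a separate question that this sketch does not address.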
