Dissimilarity-Based Sparse Subset Selection.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2016 Nov;38(11):2182-2197. doi: 10.1109/TPAMI.2015.2511748. Epub 2015 Dec 23.

Abstract

Finding an informative subset of a large collection of data points or models is at the center of many problems in computer vision, recommender systems, bio/health informatics as well as image and natural language processing. Given pairwise dissimilarities between the elements of a 'source set' and a 'target set,' we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, NP-hard, we consider a convex relaxation. The solution of our optimization finds representatives and the assignment of each element of the target set to each representative, hence, obtaining a clustering. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, our algorithm finds representatives from all groups and reveals clustering of the sets. In addition, we show that the proposed framework can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows the algorithm to be parallelized, hence further reducing the computational time. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.
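The abstract does not spell out the optimization, so the following is only a minimal sketch under stated assumptions. It assumes the convex relaxation has the form: minimize tr(DᵀZ) + λ Σᵢ ‖Zᵢ‖₂ over an M×N matrix Z whose entries are nonnegative and whose columns each sum to one, where D[i][j] is the dissimilarity between source element i and target element j and Zᵢ is row i of Z. The splitting Z = C, the penalty `rho`, the iteration count, and all function names are hypothetical choices made for illustration; this is a generic ADMM scheme, not necessarily the authors' exact updates. Each iteration touches every entry of an M×N matrix, and the per-row and per-column steps are independent of one another, which is consistent with the quadratic cost and the parallelism mentioned above.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:  # true for a prefix of the sorted values
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def shrink_row(row, threshold):
    """Group soft-thresholding: proximal operator of threshold * ||row||_2."""
    norm = math.sqrt(sum(x * x for x in row))
    if norm <= threshold:
        return [0.0] * len(row)
    scale = 1.0 - threshold / norm
    return [scale * x for x in row]


def select_representatives(D, lam, rho=1.0, iters=500):
    """Scaled-form ADMM for the assumed relaxation; returns the feasible C.

    D    : M x N list of lists of dissimilarities (source i, target j).
    lam  : regularization weight on the row norms (assumed name).
    rho  : ADMM penalty parameter (assumed value).
    """
    M, N = len(D), len(D[0])
    C = [[1.0 / M] * N for _ in range(M)]  # start from uniform assignments
    U = [[0.0] * N for _ in range(M)]      # scaled dual variable
    for _ in range(iters):
        # Z-update: row-wise shrinkage promotes entire rows being zero.
        Z = [shrink_row([C[i][j] - U[i][j] for j in range(N)], lam / rho)
             for i in range(M)]
        # C-update: each column is projected onto the probability simplex.
        for j in range(N):
            col = project_to_simplex(
                [Z[i][j] + U[i][j] - D[i][j] / rho for i in range(M)])
            for i in range(M):
                C[i][j] = col[i]
        # Dual update on the constraint Z = C.
        for i in range(M):
            for j in range(N):
                U[i][j] += Z[i][j] - C[i][j]
    return C


# Toy data (made up for illustration): two groups of points on a line,
# with the source set equal to the target set.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D = [[abs(a - b) for b in points] for a in points]
C = select_representatives(D, lam=0.5)
# Rows of C carrying noticeable mass correspond to candidate representatives;
# the largest entry in column j gives the assignment of target element j.
row_mass = [sum(row) for row in C]
print([round(m, 3) for m in row_mass])
```

Under these assumptions, setting `lam` to zero removes the sparsity pressure, so each target element is assigned to its own nearest source element; increasing `lam` pushes more rows toward zero and leaves fewer representatives.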
