Dissimilarity-Based Sparse Subset Selection.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2016 Nov;38(11):2182-2197. doi: 10.1109/TPAMI.2015.2511748. Epub 2015 Dec 23.

DOI:10.1109/TPAMI.2015.2511748

Abstract

Finding an informative subset of a large collection of data points or models is at the center of many problems in computer vision, recommender systems, bio/health informatics as well as image and natural language processing. Given pairwise dissimilarities between the elements of a 'source set' and a 'target set,' we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, NP-hard, we consider a convex relaxation. The solution of our optimization finds representatives and the assignment of each element of the target set to each representative, hence obtaining a clustering. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, our algorithm finds representatives from all groups and reveals clustering of the sets. In addition, we show that the proposed framework can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows the algorithm to be parallelized, further reducing the computational time. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.
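The selection problem described in the abstract can be illustrated on a toy example. The sketch below is not the authors' algorithm: the paper solves a convex relaxation with ADMM, whereas this sketch exhaustively searches a simplified discrete form of the objective, which is only feasible for a handful of elements. It assumes one reading of the formulation: the cost of a candidate subset is `lam` times the number of selected source elements plus the sum, over target elements, of the dissimilarity to the closest selected source element. The function name `select_representatives`, the parameter `lam`, and the toy data are hypothetical.

```python
from itertools import combinations


def select_representatives(D, lam):
    """Brute-force search over subsets of the source set (illustrative only).

    D[i][j] is the dissimilarity between source element i and target element j.
    Minimises lam * (number of representatives) + total encoding cost.
    Returns (representatives, assignment, cost).
    """
    n_source = len(D)
    n_target = len(D[0])
    best = None
    for k in range(1, n_source + 1):
        for subset in combinations(range(n_source), k):
            # Each target element is assigned to its closest chosen representative.
            assignment = [min(subset, key=lambda i: D[i][j]) for j in range(n_target)]
            encoding_cost = sum(D[assignment[j]][j] for j in range(n_target))
            cost = lam * k + encoding_cost
            if best is None or cost < best[2]:
                best = (subset, assignment, cost)
    return best


# Toy data (assumed): six 1-D points in two groups; source set = target set.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [[abs(p - q) for q in points] for p in points]

reps, assignment, cost = select_representatives(D, lam=3.0)
print(reps, assignment, cost)  # (1, 4) [1, 1, 1, 4, 4, 4] 10.0
```

On this toy data, a moderate `lam` picks one representative per group and the assignment recovers the two clusters. Raising `lam` (for example to 100) leaves a single representative, and lowering it (for example to 0.1) selects every element, which mirrors the abstract's point that the solution depends on the regularization parameter.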

Similar articles

Fast and accurate matrix completion via truncated nuclear norm regularization.

IEEE Trans Pattern Anal Mach Intell. 2013 Sep;35(9):2117-30. doi: 10.1109/TPAMI.2012.271.

Sparse subspace clustering: algorithm, theory, and applications.

IEEE Trans Pattern Anal Mach Intell. 2013 Nov;35(11):2765-81. doi: 10.1109/TPAMI.2013.57.

An alternating direction algorithm for total variation reconstruction of distributed parameters.

IEEE Trans Image Process. 2012 Jun;21(6):3004-16. doi: 10.1109/TIP.2012.2188033. Epub 2012 Feb 14.

Trace Norm Regularized CANDECOMP/PARAFAC Decomposition With Missing Data.

IEEE Trans Cybern. 2015 Nov;45(11):2437-48. doi: 10.1109/TCYB.2014.2374695.

A Fast Algorithm for Learning Overcomplete Dictionary for Sparse Representation Based on Proximal Operators.

Neural Comput. 2015 Sep;27(9):1951-82. doi: 10.1162/NECO_a_00763. Epub 2015 Jul 10.

Solving large-scale general phase retrieval problems via a sequence of convex relaxations.

J Opt Soc Am A Opt Image Sci Vis. 2018 Aug 1;35(8):1410-1419. doi: 10.1364/JOSAA.35.001410.

Undersampled Phase Retrieval with Outliers.

IEEE Trans Comput Imaging. 2015 Dec 1;1(4):247-258. doi: 10.1109/TCI.2015.2498402.

Face recognition using sparse approximated nearest points between image sets.

IEEE Trans Pattern Anal Mach Intell. 2012 Oct;34(10):1992-2004. doi: 10.1109/TPAMI.2011.283.

Robust fluence map optimization via alternating direction method of multipliers with empirical parameter optimization.

Phys Med Biol. 2016 Apr 7;61(7):2838-50. doi: 10.1088/0031-9155/61/7/2838. Epub 2016 Mar 17.

Cited by

Metabolites. 2023 Jan 9;13(1):105. doi: 10.3390/metabo13010105.

Sampling via the aggregation value for data-driven manufacturing.

Natl Sci Rev. 2022 Sep 24;9(11):nwac201. doi: 10.1093/nsr/nwac201. eCollection 2022 Nov.

A real-time rural domestic garbage detection algorithm with an improved YOLOv5s network model.

Sci Rep. 2022 Oct 7;12(1):16802. doi: 10.1038/s41598-022-20983-1.

Transfer Shape Modeling Towards High-throughput Microscopy Image Segmentation.

Med Image Comput Comput Assist Interv. 2016 Oct;9902:183-190. doi: 10.1007/978-3-319-46726-9_22. Epub 2016 Oct 2.
