LACE-UP: An ensemble machine-learning method for health subtype classification on multidimensional binary data.

Authors

Danning Rebecca, Hu Frank B, Lin Xihong

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215.

Department of Nutritional Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02215.

Publication information

Proc Natl Acad Sci U S A. 2025 Apr 29;122(17):e2423341122. doi: 10.1073/pnas.2423341122. Epub 2025 Apr 23.

Abstract

Disease and behavior subtype identification is of significant interest in biomedical research. However, in many settings, subtype discovery is limited by a lack of robust statistical clustering methods appropriate for binary data. Here, we introduce LACE-UP [latent class analysis ensembled with UMAP (uniform manifold approximation and projection) and PCA (principal components analysis)], an ensemble machine-learning method for clustering multidimensional binary data that does not require prespecifying the number of clusters and is robust to realistic data settings, such as the correlation of variables observed from the same individual and the inclusion of variables unrelated to the underlying subtype. The method ensembles latent class analysis, a model-based clustering method; principal components analysis, a spectral signal processing method; and UMAP, a cutting-edge model-free dimensionality reduction algorithm. In simulations, LACE-UP outperforms gold-standard techniques across a variety of realistic scenarios, including in the presence of correlated and extraneous data. We apply LACE-UP to dietary behavior data from the UK Biobank to demonstrate its power to uncover interpretable dietary subtypes that are associated with lipids and cardiovascular risk.
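
To make two of the named components concrete, the following is a minimal sketch, not the authors' implementation. It assumes latent class analysis can be written as a mixture of independent Bernoulli variables fit by EM, and it computes principal component scores by SVD. The UMAP step and the ensembling rule are omitted because the abstract does not specify them. The number of classes is fixed at 2 here purely for illustration (LACE-UP itself does not require prespecifying it), and all function names, the simulated item profiles, and the sample sizes are hypothetical.

```python
import numpy as np

def simulate_binary(n_per_class, profiles, rng):
    """Draw binary rows; each simulated class has its own item-probability profile."""
    X = np.vstack([rng.random((n_per_class, len(p))) < p for p in profiles])
    labels = np.repeat(np.arange(len(profiles)), n_per_class)
    return X.astype(float), labels

def fit_lca(X, k, n_iter=200, seed=0):
    """EM for a k-class latent class model (mixture of independent Bernoullis)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                      # class weights
    theta = rng.uniform(0.25, 0.75, size=(k, d))  # item probabilities per class
    for _ in range(n_iter):
        # E-step: posterior class membership, computed in log space for stability
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights and item probabilities (clipped away from 0 and 1)
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return resp.argmax(axis=1), pi, theta

def pca_scores(X, n_components=2):
    """Principal component scores of the column-centered data via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Hypothetical data: two well-separated classes observed on 20 binary items.
rng = np.random.default_rng(42)
profiles = [np.r_[np.full(10, 0.9), np.full(10, 0.1)],
            np.r_[np.full(10, 0.1), np.full(10, 0.9)]]
X, truth = simulate_binary(150, profiles, rng)

assigned, pi, theta = fit_lca(X, k=2)
scores = pca_scores(X)

# Class labels are arbitrary, so compare up to relabeling of the two classes.
agreement = max(np.mean(assigned == truth), np.mean(assigned != truth))
print(f"LCA agreement with simulated classes: {agreement:.2f}")
```

The simulated variables here are independent within each class and all of them carry subtype signal. The correlated and extraneous variables that the abstract highlights as the harder, more realistic settings are not represented in this sketch.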

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32b5/12054798/bf3fc8142e26/pnas.2423341122fig01.jpg
