High-dimensional biomarker identification for interpretable disease prediction via machine learning models.

Author information

Dai Yifan, Wu Di, Carroll Ian, Zou Fei, Zou Baiming

Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.

Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.

Publication information

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf266.

Abstract

MOTIVATION

Omics features, often measured by high-throughput technologies, combined with clinical features, significantly impact the understanding of many complex human diseases. Integrating key omics biomarkers with clinical risk factors is essential for elucidating disease mechanisms, advancing early diagnosis, and enhancing precision medicine. However, the high dimensionality and intricate associations between disease outcomes and omics profiles present substantial analytical challenges.

RESULTS

We propose a high-dimensional feature importance test (HiFIT) framework to address these challenges. Specifically, we develop an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test that employs machine learning methods to flexibly model the potentially complex associations between disease outcomes and molecular biomarkers. Through extensive numerical simulation studies and practical applications to microbiome-associated weight changes following bariatric surgery, as well as to gene-expression-associated kidney pan-cancer survival data, we demonstrate HiFIT's superior performance in both outcome prediction and feature importance identification.
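The two-stage idea described above (screen first, then test the surviving features by permutation) can be illustrated with a minimal Python sketch. This is not the HiFIT implementation and makes several assumptions: a single marginal Pearson-correlation filter stands in for the ensemble HFS step, a k-nearest-neighbour regressor stands in for the machine learning model, and the z-statistic built from per-sample loss differences is an illustrative choice. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def screen(X, y, keep):
    """Stage 1 (assumed stand-in for HFS): keep the features with largest |r|."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]


def knn_predict(X_train, y_train, x, cols, k=5):
    """Average the outcomes of the k training rows nearest to x on `cols`."""
    dist = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), yi)
        for row, yi in zip(X_train, y_train)
    )
    return statistics.fmean([yi for _, yi in dist[:k]])


def permutation_importance(X_train, y_train, X_test, y_test, cols,
                           n_perm=10, seed=0):
    """Stage 2: held-out loss increase when one feature column is shuffled.

    Returns {feature: (mean loss increase, rough z-statistic)}.
    """
    rng = random.Random(seed)

    def losses(X):
        return [(yi - knn_predict(X_train, y_train, x, cols)) ** 2
                for x, yi in zip(X, y_test)]

    base = losses(X_test)
    n = len(X_test)
    out = {}
    for j in cols:
        diff = [0.0] * n  # per-sample loss increase, averaged over permutations
        for _ in range(n_perm):
            col = [row[j] for row in X_test]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, col)]
            for i, loss in enumerate(losses(X_perm)):
                diff[i] += (loss - base[i]) / n_perm
        mean = statistics.fmean(diff)
        se = statistics.stdev(diff) / n ** 0.5
        out[j] = (mean, mean / se if se > 0 else 0.0)
    return out


# Simulated data (hypothetical): only features 0 and 1 drive the outcome.
data_rng = random.Random(1)
n, p = 160, 30
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + data_rng.gauss(0, 0.3) for r in X]

X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]
kept = screen(X_train, y_train, keep=4)  # screening uses the training half only
scores = permutation_importance(X_train, y_train, X_test, y_test, kept)
for j in kept:
    print(f"feature {j}: importance={scores[j][0]:.3f}, z={scores[j][1]:.2f}")
```

Screening on the training half and scoring on the held-out half keeps the importance estimate from reusing the data that selected the features. A marginal linear filter like the one here would miss purely nonlinear effects, which is one reason a screening step might combine several criteria.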

AVAILABILITY AND IMPLEMENTATION

An R package implementing the HiFIT algorithm is available on GitHub (https://github.com/BZou-lab/HiFIT).
