Monte Carlo Gradient Boosted Trees for Cancer Staging: A Machine Learning Approach.

Author information

Eley Audrey, Hlaing Thu Thu, Breininger Daniel, Helforoush Zarindokht, Kachouie Nezamoddin N

Affiliations

Department of Mathematics and Systems Engineering, Florida Institute of Technology, Melbourne, FL 32901, USA.

Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, FL 32901, USA.

Publication information

Cancers (Basel). 2025 Jul 24;17(15):2452. doi: 10.3390/cancers17152452.

Abstract

Machine learning algorithms are commonly employed to classify and interpret high-dimensional data. The classification task is often broken down into two separate procedures, with different methods applied at each stage to achieve accurate and interpretable results: first, an effective subset of the high-dimensional features is extracted, and then the selected subset is used to train a classifier. Gradient Boosted Trees (GBT) are an ensemble model that, owing to their robustness, their ability to model complex nonlinear interactions, and their feature interpretability, are well suited to complex applications. XGBoost (eXtreme Gradient Boosting) is a high-performance implementation of GBT that incorporates regularization, parallel computation, and efficient tree pruning, which make it an efficient, interpretable, and scalable classifier with potential applications in medical data analysis. In this study, a Monte Carlo Gradient Boosted Trees (MCGBT) model is proposed for both feature reduction and classification. The proposed MCGBT method was applied to a lung cancer dataset for feature identification and classification. The dataset contains 107 radiomic features, which are quantitative imaging biomarkers extracted from CT scans. A reduced set of 12 radiomic features was identified and used to classify patients into cancer stages. Across 100 independent runs, this reduced set achieved a cancer staging accuracy of 90.3%, on par with the accuracy obtained using the full set of 107 radiomic features, enabling lean and deployable classifiers.
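
The abstract describes the two-stage pipeline (select a feature subset with gradient-boosted trees, then classify with it) only at a high level. The sketch below is one plausible reading, not the authors' implementation: it assumes that "Monte Carlo" means repeated random train/test splits, with tree-based feature importances averaged across runs and the top-k features kept. To stay dependency-free it swaps XGBoost for a minimal pure-Python booster of depth-1 trees (stumps) under logistic loss, uses binary labels instead of multiple cancer stages, and runs on synthetic data. The function names `fit_stump`, `fit_gbt`, `predict`, `monte_carlo_select` and the parameters `n_runs`, `k`, `test_frac` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares depth-1 tree fitted to the residuals.
    Returns (feature, threshold, left_value, right_value, gain) or None."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    parent = total * total / n
    best = None
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(n - 1):
            left += residuals[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal values
            nl, nr = k + 1, n - k - 1
            right = total - left
            # Reduction in squared error achieved by this split
            gain = left * left / nl + right * right / nr - parent
            if best is None or gain > best[4]:
                best = (j, (a + b) / 2, left / nl, right / nr, gain)
    return best


def fit_gbt(X, y, n_rounds=30, lr=0.3):
    """Gradient boosting with logistic loss: each stump fits the negative
    gradient y - p; split gains are accumulated as feature importances."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial log-odds
    F = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, thr, lv, rv, gain = stump
        lv, rv = lr * lv, lr * rv  # shrinkage
        stumps.append((j, thr, lv, rv))
        importance[j] += gain
        F = [fi + (lv if x[j] <= thr else rv) for fi, x in zip(F, X)]
    return f0, stumps, importance


def predict(model, x):
    f0, stumps, _ = model
    score = f0 + sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps)
    return 1 if score > 0 else 0


def monte_carlo_select(X, y, n_runs=10, k=3, test_frac=0.3, seed=0):
    """Assumption: 'Monte Carlo' = repeated random splits (not the authors' code).
    Returns the top-k feature indices and the mean held-out accuracy."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    total_imp, accs = [0.0] * d, []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        n_test = int(n * test_frac)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_gbt([X[i] for i in train], [y[i] for i in train])
        s = sum(model[2]) or 1.0
        for j in range(d):
            total_imp[j] += model[2][j] / s  # each run contributes equally
        accs.append(sum(predict(model, X[i]) == y[i] for i in test) / n_test)
    ranked = sorted(range(d), key=lambda j: total_imp[j], reverse=True)
    return ranked[:k], sum(accs) / n_runs


# Hypothetical synthetic data: only features 0 and 1 carry signal.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(120)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]

selected, acc_full = monte_carlo_select(X, y)
X_reduced = [[x[j] for j in selected] for x in X]
_, acc_reduced = monte_carlo_select(X_reduced, y, k=len(selected))
print(sorted(selected), round(acc_full, 3), round(acc_reduced, 3))
```

Comparing `acc_reduced` with `acc_full` mirrors the abstract's comparison between the reduced and full radiomic feature sets. On real data the inner booster would more likely be a library implementation such as XGBoost's `XGBClassifier`, whose `feature_importances_` attribute would play the role of `importance` here.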
