Mylona Eugenia, Zaridis Dimitrios I, Kalantzopoulos Charalampos N, Tachos Nikolaos S, Regge Daniele, Papanikolaou Nikolaos, Tsiknakis Manolis, Marias Kostas, Fotiadis Dimitrios I
Biomedical Research Institute, FORTH, GR 45110, Ioannina, Greece.
Unit of Medical Technology Intelligent Information Systems, University of Ioannina, Ioannina, Greece.
Insights Imaging. 2024 Nov 4;15(1):265. doi: 10.1186/s13244-024-01783-9.
Radiomics-based analyses involve multiple steps, and it remains unclear which choices at each step most improve model performance. This study compares the effect of several feature selection methods, machine learning (ML) classifiers, and sources of radiomic features on model performance for the diagnosis of clinically significant prostate cancer (csPCa) from bi-parametric MRI.
Two multi-centric datasets, with 465 and 204 patients, respectively, were used to extract 1246 radiomic features per patient and MRI sequence. Ten feature selection methods (including Boruta, mRMRe, ReliefF, recursive feature elimination (RFE), random forest (RF) variable importance, and L1-lasso), four ML classifiers (SVM, RF, LASSO, and boosted generalized linear model (GLM)), and three sets of radiomic features (derived from T2w images, ADC maps, and their combination) were used to develop predictive models of csPCa. Model performance was evaluated in nested cross-validation and in external validation, using seven performance metrics.
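The nested cross-validation design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses scikit-learn, a synthetic feature matrix in place of real radiomic features, RFE as the single feature selection method, a linear SVM as the single classifier, and arbitrarily chosen fold counts and hyperparameter grid. The point it shows is that feature selection sits inside the pipeline, so it is refit on each training fold and never sees the held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a radiomic feature matrix (assumption: NOT the study data).
X, y = make_classification(
    n_samples=120, n_features=40, n_informative=6, random_state=0
)

# Scaling, feature selection, and classification are chained so that every
# step is fitted only on the training portion of each fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5)),
    ("clf", SVC(kernel="linear")),
])

# Hypothetical hyperparameter grid, tuned in the inner loop only.
param_grid = {"clf__C": [0.1, 1.0]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: model selection. Outer loop: performance estimate.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=inner_cv)
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

print("AUC per outer fold:", outer_auc)
print("Mean AUC:", outer_auc.mean())
```

Swapping the `select` or `clf` step for another method, or changing the columns of `X` to a different feature source, yields one further combination of the kind compared in the study.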
In total, 480 models were developed. In nested cross-validation, the best model combined Boruta with boosted GLM (AUC = 0.71, F1 = 0.76). In external validation, the best model combined L1-lasso with boosted GLM (AUC = 0.71, F1 = 0.47). Overall, Boruta, RFE, L1-lasso, and RF variable importance were the top-performing feature selection methods, while the choice of ML classifier did not significantly affect the results. The ADC-derived features showed the highest discriminatory power, the T2w-derived features were less informative, and combining the two did not improve performance.
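The best internal and external models report the same AUC (0.71) but very different F1 scores (0.76 vs. 0.47). The abstract does not explain this gap. One general property that may be relevant is that AUC depends only on how cases are ranked, whereas F1 also depends on the decision threshold and the class balance. The toy example below, with made-up labels and scores (not study data) and scikit-learn's metric functions, illustrates that a single set of scores has one AUC but a different F1 at each threshold.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Made-up labels and predicted scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

# AUC is computed from the ranking of the scores; no threshold is involved.
auc = roc_auc_score(y_true, scores)


def f1_at(threshold):
    """F1 after turning scores into class labels at the given threshold."""
    y_pred = [int(s >= threshold) for s in scores]
    return f1_score(y_true, y_pred)


print("AUC:", auc)
for threshold in (0.25, 0.42, 0.8):
    print("threshold", threshold, "F1:", f1_at(threshold))
```

For this reason, F1 values are best compared alongside the threshold and the prevalence in each cohort.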
The choice of feature selection method and the source of radiomic features have a profound effect on the models' performance for csPCa diagnosis.
This work may guide future radiomic research, paving the way for more effective and reliable radiomic models, both for advancing prostate cancer diagnostic strategies and for informing broader applications of radiomics in other medical contexts.
Radiomics is a growing field that can still be optimized. Feature selection method impacts radiomics models' performance more than ML algorithms. Best feature selection methods: RFE, LASSO, RF, and Boruta. ADC-derived radiomic features yield more robust models compared to T2w-derived radiomic features.