Centre de Recherche du CHU de Québec, Université Laval, Canada.
Department of Radiology and Imaging Sciences, Emory University, Atlanta, Georgia, USA.
Cancer Med. 2024 Oct;13(20):e70359. doi: 10.1002/cam4.70359.
Lung cancer remains the leading cause of cancer-related mortality worldwide, with most cases diagnosed at advanced stages. Hence, there is a need to develop effective predictive models for early detection. This study aims to investigate the impact of imaging parameters and delta radiomic features from temporal scans on lung cancer risk prediction.
In a nested case-control study within the National Lung Screening Trial (NLST) involving 462 positive screenings, radiomic features were extracted from temporal computed tomography (CT) scans and harmonized with the ComBat method to adjust for variation in slice thickness category (TC) and reconstruction kernel type (KT). Three feature sets were analyzed, each in harmonized and non-harmonized form: baseline (T0) features, delta features between T0 and the scan one year later (T1), and combined T0 and delta features. Features were first reduced with LASSO, then passed through five feature selection (FS) methods and nine machine learning (ML) models, each evaluated with 5-fold cross-validation repeated 10 times. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance in lung cancer risk prediction.
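The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. It rests on several assumptions: delta features are taken as the plain difference T1 - T0, the score is ROC AUC (the abstract does not name the metric), and `smote_like` is a simplified stand-in for SMOTE written for this example. ComBat harmonization and the LASSO reduction step are omitted, and the function names `delta_features`, `smote_like`, and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import NearestNeighbors


def delta_features(t0, t1):
    """Delta radiomics as the difference T1 - T0 (an assumed definition)."""
    return np.asarray(t1, dtype=float) - np.asarray(t0, dtype=float)


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative stand-in).

    Synthesizes minority samples by interpolating between a minority sample
    and one of its k nearest minority neighbours until both classes have the
    same size. Needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is normally the query point itself, so it is dropped.
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[pick] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def evaluate(X, y, n_splits=5, n_repeats=10, n_keep=10, seed=0):
    """Mean ROC AUC over repeated stratified k-fold cross-validation.

    Oversampling and F-test feature selection are fitted on the training
    fold only, so no information from the test fold leaks into the model.
    """
    cv = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    scores = []
    for train, test in cv.split(X, y):
        X_tr, y_tr = smote_like(X[train], y[train], seed=seed)
        selector = SelectKBest(f_classif, k=min(n_keep, X.shape[1]))
        selector.fit(X_tr, y_tr)
        model = GradientBoostingClassifier(random_state=seed)
        model.fit(selector.transform(X_tr), y_tr)
        prob = model.predict_proba(selector.transform(X[test]))[:, 1]
        scores.append(roc_auc_score(y[test], prob))
    return float(np.mean(scores))
```

Given feature matrices `t0` and `t1` of shape (patients, features) and binary labels `y`, the combined T0 + delta scenario would correspond to `evaluate(np.hstack([t0, delta_features(t0, t1)]), y)`. In practice, the `SMOTE` class from the imbalanced-learn package, placed inside that package's `Pipeline`, is the more usual way to keep oversampling confined to the training folds.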
Models using delta features outperformed those using baseline features, and SMOTE consistently boosted performance when a combination of baseline and delta features was used. TC-based harmonized features improved performance with SMOTE, but overall, harmonization did not significantly enhance model performance. The highest test score of 0.76 was achieved in three scenarios, all using SMOTE: delta features with a Gradient Boosting (GB) model (TC-based harmonization and MultiSurf FS); T0 + delta features with a Support Vector Classifier (SVC) model (KT-based harmonization and F-test FS); and T0 + delta features with an XGBoost (XGB) model (TC-based harmonization and Mutual Information (MI) FS).
This study underscores the importance of delta radiomic features and balanced datasets for improving lung cancer prediction. While our findings are based on a subsample of NLST data, they provide a valuable foundation for further exploration. Further research is needed to assess the impact of harmonization on imaging-derived models, and future investigations should explore advanced harmonization techniques and additional imaging parameters to develop robust radiomics-based biomarkers of lung cancer risk.