Does the SORG Machine-learning Algorithm for Extremity Metastases Generalize to a Contemporary Cohort of Patients? Temporal Validation From 2016 to 2020.

Affiliations

Massachusetts General Hospital, Boston, MA, USA.

University Medical Center Groningen, Groningen, the Netherlands.

Publication information

Clin Orthop Relat Res. 2023 Dec 1;481(12):2419-2430. doi: 10.1097/CORR.0000000000002698. Epub 2023 May 25.

Abstract

BACKGROUND

The ability to predict survival accurately in patients with osseous metastatic disease of the extremities is vital for patient counseling and guiding surgical intervention. We, the Skeletal Oncology Research Group (SORG), previously developed a machine-learning algorithm (MLA) based on data from 1999 to 2016 to predict 90-day and 1-year survival of surgically treated patients with extremity bone metastasis. As treatment regimens for oncology patients continue to evolve, this SORG MLA-driven probability calculator requires temporal reassessment of its accuracy.

QUESTION/PURPOSE

Does the SORG-MLA accurately predict 90-day and 1-year survival in patients who receive surgical treatment for a metastatic long-bone lesion in a more recent cohort of patients treated between 2016 and 2020?

METHODS

Between 2017 and 2021, we identified 674 patients 18 years and older through the ICD codes for secondary malignant neoplasm of bone and bone marrow and CPT codes for completed pathologic fractures or prophylactic treatment of an impending fracture. We excluded 40% (268 of 674) of patients, including 18% (118) who did not receive surgery; 11% (72) who had metastases in places other than the long bones of the extremities; 3% (23) who received treatment other than intramedullary nailing, endoprosthetic reconstruction, or dynamic hip screw; 3% (23) who underwent revision surgery; 3% (17) in whom there was no tumor; and 2% (15) who were lost to follow-up within 1 year. Temporal validation was performed using data on 406 patients treated surgically for bony metastatic disease of the extremities from 2016 to 2020 at the same two institutions where the MLA was developed. Variables used to predict survival in the SORG algorithm included perioperative laboratory values, tumor characteristics, and general demographics. To assess the models' discrimination, we computed the c-statistic, which for binary classification is commonly referred to as the area under the receiver operating characteristic curve (AUC). This value ranges from 0.5 (representing chance-level performance) to 1.0 (indicating excellent discrimination). Generally, an AUC of 0.75 is considered high enough for use in clinical practice. To evaluate the agreement between predicted and observed outcomes, a calibration plot was used, and the calibration slope and intercept were calculated. Perfect calibration would result in a slope of 1 and an intercept of 0. For overall performance, the Brier score and null-model Brier score were determined. The Brier score can range from 0 (representing perfect prediction) to 1 (indicating the poorest prediction). Proper interpretation of the Brier score requires a comparison with the null-model Brier score, which is the score of an algorithm that predicts, for every patient, a probability equal to the population prevalence of the outcome. Finally, a decision curve analysis was conducted to compare the potential net benefit of the algorithm with other decision-support methods, such as treating all or none of the patients. Overall, 90-day and 1-year mortality were lower in the temporal validation cohort than in the development cohort (90 day: 23% versus 28%; p < 0.001, and 1 year: 51% versus 59%; p < 0.001).
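The performance measures named in this section can be illustrated with a small sketch. It is not the study's code: the function names and the eight-patient toy data are hypothetical, and only the definitions follow the text (a pairwise c-statistic, the Brier score as a mean squared error, a prevalence-only null model, and the standard net-benefit formula used in decision curve analysis).

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event got the higher
    predicted risk; ties count as half. Equals the AUC for a binary outcome."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_model_brier(y_true: Sequence[int]) -> float:
    """Brier score of a model that predicts the prevalence for every patient."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone ("treat none" is always 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = died within the horizon, 0 = survived.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0.9, 0.2, 0.35, 0.4, 0.1, 0.7, 0.3, 0.5]

print("c-statistic:", c_statistic(outcomes, predicted))
print("Brier score:", brier_score(outcomes, predicted))
print("null-model Brier:", null_model_brier(outcomes))
print("net benefit at 0.5:", net_benefit(outcomes, predicted, 0.5))
print("treat-all net benefit at 0.5:", net_benefit_treat_all(outcomes, 0.5))
```

In this toy example the model's Brier score falls below the null-model score and its net benefit at a threshold of 0.5 exceeds both the treat-all and treat-none strategies, which is the pattern a useful model is expected to show.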

RESULTS

Mortality was lower in the validation cohort than in the cohort on which the model was trained: 23% versus 28% at 90 days and 51% versus 59% at 1 year. The AUC was 0.78 (95% CI 0.72 to 0.82) for 90-day survival and 0.75 (95% CI 0.70 to 0.79) for 1-year survival, indicating the model could distinguish the two outcomes reasonably well. For the 90-day model, the calibration slope was 0.71 (95% CI 0.53 to 0.89) and the intercept was -0.66 (95% CI -0.94 to -0.39), suggesting that the predicted risks were too extreme and that, in general, the risk of the observed outcome was overestimated. For the 1-year model, the calibration slope was 0.73 (95% CI 0.56 to 0.91) and the intercept was -0.67 (95% CI -0.90 to -0.43). With respect to overall performance, the Brier scores for the 90-day and 1-year models were 0.16 and 0.22, respectively. These scores were higher than the Brier scores from internal validation in the development study (0.13 and 0.14), indicating that the models' performance has declined over time.
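The abstract does not report the null-model Brier scores, but a rough reference value can be derived from the stated mortality. A model that predicts the prevalence p for every patient has a Brier score of p(1 - p). The sketch below assumes the outcome prevalence in the validation cohort equals the reported 90-day and 1-year mortality (23% and 51%); the resulting figures are an illustration, not values taken from the study.

```python
def null_brier_from_prevalence(prevalence: float) -> float:
    """Brier score of a prevalence-only model: p*(1-p)^2 + (1-p)*p^2 = p*(1-p)."""
    return prevalence * (1 - prevalence)


# (reported mortality in the validation cohort, reported model Brier score)
reported = {"90-day": (0.23, 0.16), "1-year": (0.51, 0.22)}

# Derived reference values; assumption: prevalence equals reported mortality.
derived_null = {label: null_brier_from_prevalence(mortality)
                for label, (mortality, _) in reported.items()}

for label, (mortality, model_brier) in reported.items():
    print(f"{label}: model Brier {model_brier:.2f}, "
          f"derived null-model Brier {derived_null[label]:.4f}")
```

Under that assumption, both reported Brier scores sit below the derived null-model values, so the models would still outperform a prevalence-only prediction even though they score worse than at internal validation.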

CONCLUSION

The SORG MLA to predict survival after surgical treatment of extremity metastatic disease showed decreased performance on temporal validation. Moreover, in patients undergoing innovative immunotherapy, mortality risk was overestimated to varying degrees. Clinicians should be aware of this overestimation and discount the prediction of the SORG MLA according to their own experience with this patient population. More generally, these results show that temporal reassessment of these MLA-driven probability calculators is of paramount importance because predictive performance may decline over time as treatment regimens evolve. The SORG-MLA is available as a freely accessible internet application at https://sorg-apps.shinyapps.io/extremitymetssurvival/.

Level of Evidence: Level III, prognostic study.
