Department of Orthopaedic Surgery, National Taiwan University Hospital, Taipei, Taiwan.
Department of Medical Education, National Taiwan University Hospital, Taipei, Taiwan.
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
BACKGROUND: Bone metastasis in advanced cancer is challenging because of pain, functional issues, and reduced life expectancy. Treatment planning is complex, with consideration of factors such as location, symptoms, and prognosis. Prognostic models help guide treatment choices, with Skeletal Oncology Research Group machine-learning algorithms (SORG-MLAs) showing promise in predicting survival for initial spinal metastases and extremity metastases treated with surgery or radiotherapy. Improved therapies extend patient lifespans, increasing the risk of subsequent skeletal-related events (SREs). Patients experiencing subsequent SREs often suffer from disease progression, indicating a deteriorating condition. For these patients, a thorough evaluation, including accurate survival prediction, is essential to determine the most appropriate treatment and avoid aggressive surgical treatment for patients with a poor survival likelihood. However, some variables in the SORG prediction model, such as tumor histology, visceral metastasis, and previous systemic therapies, might remain consistent between initial and subsequent SREs. Given the prognostic difference between patients with and without a subsequent SRE, the efficacy of established prognostic models, originally designed for individuals with an initial SRE, in addressing a subsequent SRE remains uncertain. Therefore, it is crucial to verify the model's utility for subsequent SREs.
QUESTION/PURPOSE: We aimed to evaluate the reliability of the SORG-MLAs for survival prediction in patients undergoing surgery or radiotherapy for a subsequent SRE for whom both the initial and subsequent SREs occurred in the spine or extremities.
METHODS: We retrospectively included 738 patients who were 20 years or older and who received surgery or radiotherapy for initial and subsequent SREs at a tertiary referral center and local hospital in Taiwan between 2010 and 2019. We excluded 74 patients whose initial SRE was in the spine and whose subsequent SRE occurred in the extremities, and 37 patients whose initial SRE was in the extremities and whose subsequent SRE was in the spine. The rationale was that different SORG-MLAs were exclusively designed for patients who had an initial spine metastasis and those who had an initial extremity metastasis, irrespective of whether they experienced metastatic events in other areas (for example, a patient experiencing an extremity SRE before his or her spinal SRE would also be regarded as a candidate for an initial spinal SRE). Because such patients were already validated in previous studies, we excluded them to avoid overestimating our results. Five patients with malignant primary bone tumors and 38 patients in whom the metastasis's origin could not be identified were also excluded, leaving 584 patients for analysis. The 584 included patients were categorized into two subgroups based on the location of the initial and subsequent SREs: the spine group (68% [399]) and the extremity group (32% [185]). No patients were lost to follow-up. Patient data at the time of presentation with a subsequent SRE were collected, and survival predictions at this timepoint were calculated using the SORG-MLAs. Multiple imputation with the missForest technique was conducted five times to impute missing values for each predictor. The performance of the SORG-MLAs was assessed with several statistical measures: discrimination (measured by the area under the receiver operating characteristic curve [AUC]), calibration, overall performance (Brier score), and decision curve analysis. Discrimination refers to the model's ability to differentiate between those with the event and those without the event. An AUC ranges from 0.5 to 1.0, with 0.5 indicating the worst discrimination (no better than chance) and 1.0 indicating perfect discrimination. An AUC of 0.7 is considered clinically acceptable discrimination. Calibration is the comparison between the frequency of observed events and the predicted probabilities. In an ideal calibration, the observed and predicted survival rates should be congruent. The logarithm of the observed-to-expected survival ratio [log(O:E)] offers insight into the model's overall calibration by considering the total number of observed (O) and expected (E) events. The Brier score measures the mean squared difference between the predicted probability of the outcome for each individual and the observed outcome, ranging from 0 to 1, with 0 indicating perfect overall performance and 1 indicating the worst performance. Moreover, the prevalence of the outcome should be considered, so a null-model Brier score was also calculated by assigning a probability equal to the prevalence of the outcome (in this case, the actual survival rate) to each patient. The benefit of the prediction model is determined by comparing its Brier score with that of the null model. If a prediction model's Brier score is lower than the null model's Brier score, the prediction model is deemed to have good performance. A decision curve analysis was performed to evaluate each model's "net benefit," which weighs true positives against false positives at a given "threshold probability," that is, the ratio of risk to benefit of an intervention as derived from a comprehensive clinical evaluation and a well-discussed shared decision-making process. A good predictive model should yield a higher net benefit than the default strategies (treating all patients and treating no patients) across a range of threshold probabilities.
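The performance measures described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the toy outcome vector `y`, and the toy predicted probabilities `p` are hypothetical, and the net-benefit formula used is the standard decision-curve definition (true positives per patient minus false positives per patient weighted by the odds of the threshold), which is assumed here rather than taken from the abstract.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def null_brier_score(y):
    """Brier score of a null model that assigns the outcome prevalence to everyone."""
    prevalence = sum(y) / len(y)
    return brier_score(y, [prevalence] * len(y))

def log_observed_to_expected(y, p):
    """Natural log of total observed events over total expected events (assumed base e)."""
    return math.log(sum(y) / sum(p))

def auc(y, p):
    """Probability that a random event case is ranked above a random non-event case."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y, p, threshold):
    """Standard decision-curve net benefit at a single threshold probability."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Hypothetical toy data: 1 = event observed, 0 = no event; p = predicted probability.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p = [0.9, 0.8, 0.7, 0.3, 0.4, 0.6, 0.2, 0.5]

model_brier = brier_score(y, p)
null_brier = null_brier_score(y)
model_nb = net_benefit(y, p, 0.5)
treat_all_nb = net_benefit(y, [1.0] * len(y), 0.5)  # "treat all" default strategy
treat_none_nb = 0.0                                  # "treat none" default strategy

print(f"AUC: {auc(y, p):.2f}")
print(f"Brier: {model_brier:.3f} (null model: {null_brier:.3f})")
print(f"log(O:E): {log_observed_to_expected(y, p):.3f}")
print(f"Net benefit at 0.5: model {model_nb:.3f}, treat all {treat_all_nb:.3f}")
```

In this toy example the model's Brier score falls below the null-model score and its net benefit exceeds both default strategies at the chosen threshold, which is the pattern the criteria above describe as good performance.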
RESULTS: For the spine group, the algorithms displayed acceptable AUC results (median AUCs of 0.69 to 0.72) for 42-day, 90-day, and 1-year survival predictions after treatment for a subsequent SRE. The extremity group showed median AUCs ranging from 0.65 to 0.73 for the corresponding survival periods. All Brier scores were lower than those of their null models, indicating the SORG-MLAs' good overall performance for both cohorts. The SORG-MLAs yielded a net benefit for both cohorts; however, they overestimated 1-year survival probabilities in patients with a subsequent SRE in the spine, with a median log(O:E) of -0.60 (95% confidence interval -0.77 to -0.42).
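As a rough reading aid for the calibration result, the log(O:E) value can be converted back to a ratio. The conversion below assumes the natural logarithm, which the abstract does not state explicitly.

```latex
\[
\frac{O}{E} = e^{-0.60} \approx 0.55,
\qquad
\text{95\% CI: } e^{-0.77} \approx 0.46 \ \text{to}\ e^{-0.42} \approx 0.66
\]
```

Under that assumption, observed 1-year survival in the spine group was roughly 55% of what the algorithms predicted, consistent with the reported overestimation.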
CONCLUSION: The SORG-MLAs maintain satisfactory discriminatory capacity and offer considerable net benefits on decision curve analysis, indicating their continued viability as prediction tools in this clinical context. However, the algorithms overestimate 1-year survival rates for patients with a subsequent SRE of the spine, warranting consideration of specific patient groups. Clinicians and surgeons should exercise caution when using the SORG-MLAs for survival prediction in these patients and remain aware of potential mispredictions when tailoring treatment plans, with a preference for less invasive treatments. Ultimately, this study emphasizes the importance of enhancing prognostic algorithms and developing innovative tools for patients with subsequent SREs, because life expectancy in patients with bone metastases continues to improve and healthcare providers will encounter these patients more often in daily practice.
LEVEL OF EVIDENCE: Level III, prognostic study.