Re-evaluation of the comparative effectiveness of bootstrap-based optimism correction methods in the development of multivariable clinical prediction models.

Affiliations

Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, Tokyo, Japan.

Office of Biostatistics, Department of Biometrics, Headquarters of Clinical Development, Otsuka Pharmaceutical Co., Ltd., Tokyo, Japan.

Publication information

BMC Med Res Methodol. 2021 Jan 7;21(1):9. doi: 10.1186/s12874-020-01201-w.

DOI: 10.1186/s12874-020-01201-w

PMID: 33413132

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7789544/

Abstract

BACKGROUND

Multivariable prediction models are important statistical tools that combine multiple patient characteristics into diagnostic and prognostic algorithms. Their apparent measures of predictive accuracy are usually biased upward (a bias known as 'optimism') relative to their actual performance in external populations. Existing statistical evidence and guidelines suggest that three bootstrap-based bias correction methods are preferable in practice: Harrell's bias correction and the .632 and .632+ estimators. Although Harrell's method has been widely adopted in clinical studies, simulation-based evidence indicates that the .632+ estimator may perform better than the other two methods. However, the actual comparative effectiveness of these methods remains unclear because numerical evidence is limited.
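As a rough illustration of how the three corrections named above combine their inputs, the following minimal sketch writes each estimator as a plain function of already-computed accuracy values. It is not the authors' implementation. The function names, the no-information value of 0.5 for the C-statistic, and the clipping of the relative overfitting rate to the interval [0, 1] are assumptions made for this example; the abstract does not state how edge cases are handled.

```python
from statistics import mean

# Assumed no-information value for the C-statistic (chance-level discrimination).
GAMMA_C = 0.5


def harrell_corrected(apparent, boot_apparent, boot_on_original):
    """Apparent value minus the mean bootstrap optimism.

    boot_apparent[b]    : accuracy of bootstrap model b on its own resample
    boot_on_original[b] : accuracy of bootstrap model b on the original data
    """
    optimism = mean(b - o for b, o in zip(boot_apparent, boot_on_original))
    return apparent - optimism


def estimator_632(apparent, out_of_bag):
    """Fixed-weight blend of apparent and out-of-bag accuracy."""
    return 0.368 * apparent + 0.632 * out_of_bag


def estimator_632_plus(apparent, out_of_bag, gamma=GAMMA_C):
    """Blend whose weight on the out-of-bag value grows with overfitting.

    r is the relative overfitting rate, written here for a measure where
    larger is better and clipped to [0, 1] (an assumption of this sketch).
    """
    if apparent <= gamma or out_of_bag >= apparent:
        r = 0.0
    else:
        r = min(1.0, (apparent - out_of_bag) / (apparent - gamma))
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent + w * out_of_bag
```

With no overfitting (r = 0) the .632+ weight reduces to 0.632 and the two blends coincide; as r approaches 1 the weight approaches 1 and the estimate moves to the out-of-bag value.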

METHODS

We conducted extensive simulation studies to compare the effectiveness of these three bootstrap methods under various model-building strategies: conventional logistic regression, stepwise variable selection, Firth's penalized likelihood method, and ridge, lasso, and elastic-net regression. We generated the simulation data based on the Western dataset of the Global Utilization of Streptokinase and Tissue plasminogen activator for Occluded coronary arteries (GUSTO-I) trial and considered how events per variable, event fraction, number of candidate predictors, and the regression coefficients of the predictors affected performance. The internal validity of the C-statistic was evaluated.
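To make the quantities in this paragraph concrete, the sketch below computes a C-statistic by pairwise concordance, an events-per-variable ratio, and a Harrell-style optimism-corrected C-statistic by resampling. It is only an illustration under stated assumptions: `fit_sign_model` is a hypothetical single-predictor stand-in for the model-building strategies listed above, events are taken to be the observations with y = 1, and resamples containing a single outcome class are redrawn. None of these choices is taken from the paper.

```python
import random


def c_statistic(y, score):
    """Share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [s for s, yi in zip(score, y) if yi == 1]
    nonevents = [s for s, yi in zip(score, y) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def events_per_variable(y, n_candidate_predictors):
    """Number of events (y == 1) divided by the number of candidate predictors."""
    return sum(y) / n_candidate_predictors


def fit_sign_model(x, y):
    """Hypothetical toy 'model': orient one predictor by its association with y."""
    ev = [xi for xi, yi in zip(x, y) if yi == 1]
    ne = [xi for xi, yi in zip(x, y) if yi == 0]
    sign = 1.0 if sum(ev) / len(ev) >= sum(ne) / len(ne) else -1.0
    return lambda v: sign * v


def harrell_bootstrap_c(x, y, fit, n_boot=200, seed=0):
    """Apparent C minus the mean of (C on resample - C on original data)."""
    if len(set(y)) < 2:
        raise ValueError("need at least one event and one non-event")
    rng = random.Random(seed)
    n = len(y)
    model = fit(x, y)
    apparent = c_statistic(y, [model(xi) for xi in x])
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # C is undefined for a single-class resample; redraw
        xb = [x[i] for i in idx]
        boot_model = fit(xb, yb)  # repeat the whole model-building step
        c_boot = c_statistic(yb, [boot_model(xi) for xi in xb])
        c_orig = c_statistic(y, [boot_model(xi) for xi in x])
        optimism.append(c_boot - c_orig)
    return apparent - sum(optimism) / n_boot
```

A call such as `harrell_bootstrap_c(x, y, fit_sign_model, n_boot=200)` takes `x` as a list of predictor values and `y` as a list of 0/1 outcomes. In practice `fit` would wrap the full modelling procedure, including any variable selection or penalty tuning, so that each resample repeats every data-driven step.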

RESULTS

Under relatively large sample settings (roughly, events per variable ≥ 10), the three bootstrap-based methods were comparable and performed well. However, all three methods were biased under small sample settings, and the directions and sizes of the biases were inconsistent. In general, Harrell's and the .632 methods had overestimation biases when the event fraction became larger. In addition, the .632+ method had a slight underestimation bias when the event fraction was very small. Although the bias of the .632+ estimator was relatively small, its root mean squared error (RMSE) was comparable to, or sometimes larger than, those of the other two methods, especially for the regularized estimation methods.
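The distinction drawn here between bias and RMSE can be sketched as follows. The helper names are illustrative, and treating a per-replicate reference value (for example, a model's C-statistic in a large external sample) as the target is an assumption of this example rather than a description of the paper's exact procedure.

```python
from math import sqrt
from statistics import mean


def bias(estimates, references):
    """Mean signed error of the estimates across simulation replicates."""
    return mean(e - r for e, r in zip(estimates, references))


def rmse(estimates, references):
    """Root mean squared error of the estimates across simulation replicates."""
    return sqrt(mean((e - r) ** 2 for e, r in zip(estimates, references)))
```

For two hypothetical replicates with estimates 0.80 and 0.70 against a reference of 0.75 in both, the signed errors cancel and the bias is essentially zero while the RMSE is about 0.05. An estimator can therefore show little bias and still have an RMSE as large as that of its competitors.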

CONCLUSIONS

In general, the three bootstrap estimators were comparable, but the .632+ estimator performed relatively well under small sample settings, except when the regularized estimation methods were adopted.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cbbb/7789544/4bc16ee6d5e4/12874_2020_1201_Fig1_HTML.jpg
