Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches.

Author information

Department of Population Medicine, Harvard Medical School & Harvard Pilgrim Health Care Institute, Boston, MA.

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA.

Publication information

J Am Heart Assoc. 2020 Oct 20;9(19):e016648. doi: 10.1161/JAHA.120.016648. Epub 2020 Sep 29.

Abstract

Background

Real-world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts, a crucial first step underpinning the validity of research results, remains a challenge. We developed and evaluated claims-based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state-of-the-art machine-learning approaches.

Methods and Results

We analyzed a linked electronic health record and Medicare database from two large academic tertiary care hospitals (years 2007-2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with PH (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules, and machine-learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The best-performing rule-based algorithm (≥3 PH-related healthcare encounters plus right heart catheterization) attained an area under the receiver operating characteristic curve (AUC) of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine-learning algorithms outperformed the best-performing rule-based algorithm (P<0.001). A model derived from the random forest algorithm achieved an AUC of 0.88 (sensitivity, 0.87; specificity, 0.70), and gradient boosting machine achieved comparable results (AUC, 0.85; sensitivity, 0.87; specificity, 0.70). Penalized lasso regression achieved an AUC of 0.73 (sensitivity, 0.70; specificity, 0.68).

Conclusions

Research-grade case identification algorithms for PH can be derived and rigorously validated using machine-learning algorithms. Simple decision rules commonly applied in published literature performed poorly; more complex rule-based algorithms may address the limitations of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.
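To make the rule-based approach concrete, the following minimal Python sketch applies the decision rule described in the abstract (≥3 PH-related encounters and a right heart catheterization) and scores it against chart-review labels. The `PatientClaims` record, its field names, and the eight toy patients are hypothetical and purely illustrative; they are not the study's data or code, and the resulting numbers are unrelated to the reported results.

```python
from dataclasses import dataclass


@dataclass
class PatientClaims:
    """Hypothetical per-patient claims summary (field names are assumptions)."""
    ph_encounters: int        # count of PH-related healthcare encounters
    had_rhc: bool             # any claim for right heart catheterization
    chart_confirmed_ph: bool  # gold-standard label from chart review


def rule_flags_ph(patient, min_encounters=3):
    """Decision rule from the abstract: >=3 PH-related encounters AND an RHC."""
    return patient.ph_encounters >= min_encounters and patient.had_rhc


def sensitivity_specificity(patients, classify):
    """Compare a classifier's flags with the gold-standard labels."""
    tp = fn = tn = fp = 0
    for p in patients:
        flagged = classify(p)
        if p.chart_confirmed_ph:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Toy cohort, invented for illustration only.
toy_cohort = [
    PatientClaims(5, True, True),
    PatientClaims(3, True, True),
    PatientClaims(4, False, True),    # missed by the rule: no RHC claim
    PatientClaims(1, False, True),    # missed by the rule: too few encounters
    PatientClaims(3, True, False),    # false positive
    PatientClaims(0, False, False),
    PatientClaims(2, True, False),
    PatientClaims(6, False, False),
]

sens, spec = sensitivity_specificity(toy_cohort, rule_flags_ph)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because a rule like this gives a single yes/no answer per patient, it has one fixed operating point. The machine-learning models instead output a score whose threshold can be moved to trade sensitivity against specificity.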


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/49bd/7792386/7eb96138c108/JAH3-9-e016648-g001.jpg
