Mansour Omar, Paik Julie M, Wyss Richard, Mastrorilli Julianna M, Bessette Lily Gui, Lu Zhigang, Tsacogianis Theodore, Lin Kueiyu Joshua
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Renal Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Clin Epidemiol. 2023 Mar 8;15:299-307. doi: 10.2147/CLEP.S397020. eCollection 2023.
Because chronic kidney disease (CKD) is often under-coded as a diagnosis in claims data, we aimed to develop claims-based prediction models for CKD phenotypes determined by laboratory results in electronic health records (EHRs).
We linked EHRs from two networks (used as the training and validation cohorts, respectively) with Medicare claims data. The study cohort included individuals aged ≥65 years with a valid serum creatinine result in the EHR from 2007 to 2017, excluding those with end-stage kidney disease or on dialysis. We used LASSO regression to select among 134 predictors for predicting continuous estimated glomerular filtration rate (eGFR). We assessed model performance in predicting the eGFR categories <60, <45, and <30 mL/min/1.73 m² in terms of the area under the receiver operating characteristic curve (AUC).
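The modeling step (LASSO regression of a continuous eGFR outcome on many candidate predictors) can be illustrated with a minimal sketch. It assumes a plain penalized least-squares objective fitted by cyclic coordinate descent in pure Python. The abstract does not state the fitting software, penalty value, or tuning procedure, so the function names, the penalty `lam=1.0`, and the simulated data below are hypothetical and are not the study's predictors or results.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200, tol=1e-8):
    """Fit a LASSO linear model by cyclic coordinate descent.

    Minimizes (1/(2n)) * sum((y - b0 - X beta)^2) + lam * sum(|beta_j|).
    X is a list of rows (lists of floats); y is a list of floats.
    Returns (intercept, coefficients); zero coefficients are "not selected".
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and outcome so the intercept drops out of the updates.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = yc[:]  # residuals yc - Xc @ beta, starting from beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: leave its coefficient at zero
            # Correlation of column j with the residual, adding back its own effect.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta


def predict(intercept, beta, X):
    """Linear prediction (here standing in for predicted eGFR) for each row of X."""
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


# Hypothetical toy data: 5 simulated predictors, only the first two truly
# affect the outcome; y stands in for a continuous eGFR value.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(500)]
y = [70.0 - 8.0 * row[0] + 5.0 * row[1] + rng.gauss(0.0, 5.0) for row in X]

intercept, beta = lasso_fit(X, y, lam=1.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(round(intercept, 1), [round(b, 2) for b in beta], selected)
```

The L1 penalty shrinks the informative coefficients toward zero and sets weak ones exactly to zero, which is how LASSO performs variable selection from a large candidate set. In practice the penalty would be chosen by cross-validation rather than fixed by hand.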
The model training cohort included 117,476 patients (mean age 74.8 years, 58.2% female) and the validation cohort included 56,744 patients (mean age 73.8 years, 59.6% female). In the validation cohort, the AUCs of the primary model (113 predictors; adjusted R² of 0.35) for predicting the eGFR <60, <45, and <30 mL/min/1.73 m² categories were 0.81, 0.88, and 0.92, respectively, and the corresponding positive predictive values for these 3 phenotypes were 0.80 (95% confidence interval: 0.79, 0.81), 0.79 (0.75, 0.84), and 0.38 (0.30, 0.45), respectively.
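One way to turn a continuous eGFR prediction into the category-level metrics reported above is sketched below. It assumes that predicted eGFR is used as the ranking score for the AUC (lower predicted eGFR means more likely below the cutoff) and that a patient is flagged as phenotype-positive when predicted eGFR falls below the same cutoff used for the true category. The abstract does not state how the 95% confidence intervals were computed; the Wald interval here is an illustrative assumption, and the function names and toy values are hypothetical, not study data.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_with_wald_ci(pred_positive, labels, z=1.96):
    """PPV = TP / (TP + FP), with a normal-approximation (Wald) interval."""
    flagged = [l for p, l in zip(pred_positive, labels) if p]
    if not flagged:
        raise ValueError("no predicted positives")
    ppv = sum(flagged) / len(flagged)
    half = z * math.sqrt(ppv * (1.0 - ppv) / len(flagged))
    return ppv, max(0.0, ppv - half), min(1.0, ppv + half)


def evaluate_category(pred_egfr, true_egfr, cutoff):
    """Evaluate predicted eGFR for the phenotype 'true eGFR < cutoff'."""
    labels = [t < cutoff for t in true_egfr]
    # Lower predicted eGFR should rank a patient as more likely below the cutoff.
    scores = [-p for p in pred_egfr]
    pred_positive = [p < cutoff for p in pred_egfr]
    return auc(scores, labels), ppv_with_wald_ci(pred_positive, labels)


# Hypothetical toy values in mL/min/1.73 m², not study data.
true_egfr = [25, 40, 55, 58, 62, 70, 85, 90]
pred_egfr = [28, 50, 52, 66, 57, 72, 80, 88]
for cutoff in (60, 45, 30):
    print(cutoff, evaluate_category(pred_egfr, true_egfr, cutoff))
```

The pairwise AUC loop is quadratic in cohort size and is meant only to make the definition concrete; for tens of thousands of patients a rank-based or library implementation would be used instead. The example also shows why AUC and PPV can diverge: AUC measures ranking across all patients, while PPV depends on how many flagged patients are truly below the cutoff, which falls when the phenotype is rare.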
We developed a claims-based model to determine clinical phenotypes of CKD stages defined by eGFR values. Researchers without access to laboratory results can use the model-predicted phenotypes as a proxy clinical endpoint or confounder, and to enhance the assessment of subgroup effects.