Missing data imputation, prediction, and feature selection in diagnosis of vaginal prolapse.

Author information

Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, 519087, China.

Department of Gynecology and Obstetrics, West China Second University Hospital, Sichuan University, Chengdu, 610064, China.

Publication information

BMC Med Res Methodol. 2023 Nov 6;23(1):259. doi: 10.1186/s12874-023-02079-0.

Abstract

BACKGROUND

Data loss often occurs during the collection of clinical data. Discarding incomplete samples outright may lower the accuracy of medical diagnosis. A suitable data imputation method can help researchers make better use of valuable medical data.

METHODS

In this paper, five popular imputation methods, namely mean imputation, expectation-maximization (EM) imputation, K-nearest neighbors (KNN) imputation, denoising autoencoders (DAE) and generative adversarial imputation nets (GAIN), are applied to an incomplete clinical dataset of 28,274 cases for vaginal prolapse prediction. The performance of these methods is compared comprehensively using classification criteria. Prediction accuracy improves greatly when the imputed data are used, especially with GAIN. To identify the important risk factors for this disease among a large number of candidate features, three variable selection methods, the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD) and the broken adaptive ridge (BAR), are implemented in logistic regression for feature selection on the imputed datasets. Because our primary objective is accurate diagnosis, we used diagnostic accuracy (classification accuracy) as the key metric for assessing both the imputation and the feature selection techniques. The assessment covered seven classifiers: logistic regression (LR), random forest (RF), support vector classifier (SVC), extreme gradient boosting (XGBoost), LASSO, SCAD and Elastic Net.
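To make the two simplest imputation methods named above concrete, the following is a minimal sketch of mean imputation and KNN imputation in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mean_impute`, `knn_impute`), the distance rescaling for partially observed rows, and the toy values are all hypothetical. EM, DAE and GAIN are not shown.

```python
import math

def mean_impute(rows):
    """Replace each missing value (None) with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def _distance(a, b):
    """Euclidean distance over columns observed in both rows,
    rescaled by the fraction of shared columns (an assumed convention)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that column over the k nearest rows that observe it."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with column j observed.
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical toy table (three numeric features per case); None marks a missing value.
toy = [
    [30.0, 22.0, 1.0],
    [32.0, None, 1.0],
    [60.0, 30.0, 3.0],
    [62.0, 31.0, None],
]
mean_filled = mean_impute(toy)
knn_filled = knn_impute(toy, k=1)
```

On this toy table, mean imputation fills both gaps with the column average, while KNN imputation borrows the value from the most similar case, which is why neighbor-based methods can preserve structure that a global mean flattens. In practice one would more likely use a library implementation such as scikit-learn's `SimpleImputer` and `KNNImputer`.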

RESULTS

The proposed framework of imputation, variable selection and prediction is well suited to the collected vaginal prolapse datasets. The original dataset was first imputed well by GAIN; BAR then selected the 9 most significant features from the original 67 features in the GAIN-imputed dataset, with only negligible loss in prediction performance. BAR was superior to the other two variable selection methods in our tests.
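The idea of selecting a few features inside a logistic regression can be illustrated with a small sketch. The code below uses an L1 (LASSO-style) penalty fitted by proximal gradient descent, which is a simpler stand-in and not the BAR procedure that performed best here. The function names, step size, penalty strength and toy data are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Proximal gradient descent for mean logistic loss + lam * sum(|beta_j|).
    The intercept is not penalized."""
    n, p = len(X), len(X[0])
    beta, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label (0 or 1).
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - t
                 for row, t in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding, which produces exact zeros.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta, intercept

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [ 1.0,  0.2, -0.1],
    [ 0.8, -0.3,  0.4],
    [ 0.9,  0.1,  0.0],
    [ 0.7, -0.2, -0.3],
    [-1.0,  0.3,  0.2],
    [-0.8, -0.1, -0.4],
    [-0.9,  0.2,  0.1],
    [-0.7, -0.2,  0.1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

beta, intercept = l1_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # indices of retained features
beta_strong, _ = l1_logistic(X, y, lam=2.0)             # a very strong penalty removes every feature
```

The penalty strength `lam` controls how many coefficients are driven to exactly zero, so the retained set shrinks as `lam` grows. SCAD and BAR replace the L1 penalty with penalties that shrink large coefficients less, which this sketch does not attempt to reproduce.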

CONCLUSIONS

Overall, by combining imputation, classification and variable selection, we achieve good interpretability while maintaining high accuracy in computer-aided medical diagnosis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5527/10629145/95b79a71b72f/12874_2023_2079_Fig1_HTML.jpg
