Pairwise Correlation Analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset Reveals Significant Feature Correlation.

Author information

Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40536, USA.

Department of Biology, Brigham Young University, Provo, UT 84602, USA.

Publication information

Genes (Basel). 2021 Oct 21;12(11):1661. doi: 10.3390/genes12111661.

Abstract

The Alzheimer's Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements (e.g., magnetic resonance imaging [MRI], biometrics, and RNA expression) from Alzheimer's disease (AD) cases and controls that have recently been used by machine learning algorithms to evaluate AD onset and progression. While using a variety of biomarkers is essential to AD research, highly correlated input features can significantly decrease machine learning model generalizability and performance. Additionally, redundant features unnecessarily increase the computational time and resources needed to train predictive models. Therefore, we used 49,288 biomarkers and 793,600 extracted MRI features to assess feature correlation within the ADNI dataset and to determine the extent to which this issue might affect large-scale analyses using these data. We found that 93.457% of biomarkers, 92.549% of the gene expression values, and 100% of MRI features were strongly correlated with at least one other feature in ADNI based on our Bonferroni-corrected α (p-value ≤ 1.40754 × 10⁻¹³). We provide a comprehensive mapping of all ADNI biomarkers to highly correlated features within the dataset. Additionally, we show that significant correlation within the ADNI dataset should be resolved before performing bulk data analyses, and we provide recommendations to address these issues. We anticipate that these recommendations and resources will help guide researchers utilizing the ADNI dataset to increase model performance and reduce the cost and complexity of their analyses.
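The Bonferroni threshold quoted in the abstract can be approximately reproduced from the two feature counts it reports. The sketch below is illustrative only and rests on two assumptions the abstract does not state: a family-wise α of 0.05, and one test for every unordered pair of the 49,288 + 793,600 features. The function name `bonferroni_alpha` is hypothetical and is not taken from the authors' code.

```python
from math import comb

N_BIOMARKERS = 49_288      # feature count reported in the abstract
N_MRI_FEATURES = 793_600   # feature count reported in the abstract
FAMILYWISE_ALPHA = 0.05    # assumption: not stated in the abstract


def bonferroni_alpha(n_features: int, familywise_alpha: float = FAMILYWISE_ALPHA) -> float:
    """Per-test significance threshold when every unordered feature pair is tested once."""
    n_tests = comb(n_features, 2)  # number of unordered pairs, n * (n - 1) / 2
    return familywise_alpha / n_tests


n_features = N_BIOMARKERS + N_MRI_FEATURES
n_pairs = comb(n_features, 2)
alpha = bonferroni_alpha(n_features)

print(f"features: {n_features:,}")
print(f"pairwise tests: {n_pairs:,}")
print(f"Bonferroni-corrected alpha: {alpha:.5e}")
```

Under these assumptions the result has the same 1.40754 mantissa as the threshold in the abstract, which suggests (but does not confirm) that the correction was applied over all feature pairs.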

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26f9/8619902/5b695480c26b/genes-12-01661-g001.jpg
