Multimodal Data Analysis of Alzheimer's Disease Based on Clustering Evolutionary Random Forest.

Publication information

IEEE J Biomed Health Inform. 2020 Oct;24(10):2973-2983. doi: 10.1109/JBHI.2020.2973324. Epub 2020 Feb 11.

Abstract

Alzheimer's disease (AD) has become a severe medical challenge. Advances in technology have produced high-dimensional data of different modalities, including functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP) data. Understanding the complex association patterns among these heterogeneous and complementary data benefits the diagnosis and prevention of AD. In this paper, we apply an appropriate correlation analysis method to detect the relationships between brain regions and genes, and propose "brain region-gene pairs" as the multimodal features of each sample. In addition, we put forward a novel data analysis method, the cluster evolutionary random forest (CERF), which is suited to "brain region-gene pairs". The idea of clustering evolution is introduced to improve the generalization performance of a random forest, which is constructed by randomly selecting samples and sample features. Through hierarchical clustering of the decision trees in the random forest, trees with higher similarity are grouped into one cluster, and the best-performing trees are retained to enhance the diversity among decision trees. Furthermore, based on CERF, we integrate feature construction, feature selection, and sample classification to find the optimal combination of different methods, and design a comprehensive diagnostic framework for AD. The framework is validated on samples with both fMRI and SNP data from ADNI. The results show that this framework can effectively identify AD patients and discover brain regions and genes significantly associated with AD. These findings are conducive to the clinical treatment and prevention of AD.
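
The tree-pruning step described in the abstract (cluster similar decision trees, keep the best performer from each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the similarity measure, the linkage rule, or how performance is scored, so the sketch assumes prediction disagreement on a validation set as the distance, average linkage for the hierarchical clustering, and validation accuracy as the performance score. Each tree is represented only by its list of validation predictions, and the names `disagreement`, `cluster_trees`, and `prune_forest` are hypothetical. It shows a single clustering round, not the full evolutionary procedure.

```python
from itertools import combinations


def disagreement(p, q):
    """Fraction of validation samples on which two trees predict differently."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def accuracy(pred, y_true):
    """Fraction of validation samples a tree predicts correctly."""
    return sum(a == b for a, b in zip(pred, y_true)) / len(y_true)


def cluster_trees(preds, n_clusters):
    """Agglomerative (average-linkage) clustering of trees by disagreement.

    preds[i] is the list of predictions of tree i on the validation set.
    Returns a list of clusters, each a list of tree indices.
    """
    clusters = [[i] for i in range(len(preds))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of two clusters.
            total = sum(
                disagreement(preds[i], preds[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return clusters


def prune_forest(preds, y_val, n_clusters):
    """Keep the most accurate tree from each cluster of similar trees."""
    clusters = cluster_trees(preds, n_clusters)
    return sorted(
        max(cluster, key=lambda i: accuracy(preds[i], y_val))
        for cluster in clusters
    )


# Toy data (made up for illustration): four trees, six validation samples.
y_val = [1, 1, 0, 0, 1, 0]
tree_preds = [
    [1, 1, 0, 0, 1, 0],  # tree 0: all correct
    [1, 1, 0, 0, 1, 1],  # tree 1: near-duplicate of tree 0, one error
    [0, 0, 1, 0, 1, 0],  # tree 2: a different decision pattern
    [0, 0, 1, 1, 1, 0],  # tree 3: near-duplicate of tree 2, less accurate
]
kept = prune_forest(tree_preds, y_val, n_clusters=2)
print(kept)
```

On this toy data, trees 0 and 1 form one cluster and trees 2 and 3 the other, so one tree per cluster survives. Keeping a representative from each cluster, rather than the globally most accurate trees, is what preserves diversity in the reduced forest.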
