Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets.

Affiliations

Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy.

"Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy.

Publication information

Sci Data. 2024 Jan 23;11(1):115. doi: 10.1038/s41597-023-02421-7.

Abstract

Pooling publicly available MRI data from multiple sites makes it possible to assemble large groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonizing multicenter data is necessary to reduce the confounding effect of non-biological sources of variability. However, when harmonization is applied to the entire dataset before machine learning, it causes data leakage: information from outside the training set can influence model building and lead to overestimated performance. We propose 1) a measurement of the efficacy of data harmonization; and 2) a harmonizer transformer, i.e., an implementation of ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced. We also demonstrated the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage by design.
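
The leakage mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' harmonizer transformer and it is not ComBat: the hypothetical `SiteMeanHarmonizer` class below is a toy stand-in that only removes a per-site additive offset from a single feature, and the feature values and site labels are made up. It is meant only to show why harmonization parameters should be estimated in a `fit` step on the training set and then applied unchanged to held-out data.

```python
from collections import defaultdict
from statistics import fmean


class SiteMeanHarmonizer:
    """Toy stand-in for ComBat (assumption): removes a per-site additive offset."""

    def fit(self, values, sites):
        # Learn the grand mean and one mean per site from the data given here only.
        by_site = defaultdict(list)
        for value, site in zip(values, sites):
            by_site[site].append(value)
        self.grand_mean_ = fmean(values)
        self.site_means_ = {site: fmean(vals) for site, vals in by_site.items()}
        return self

    def transform(self, values, sites):
        # Apply the stored parameters; nothing is re-estimated at this stage.
        return [v - self.site_means_[s] + self.grand_mean_
                for v, s in zip(values, sites)]


# Made-up single-feature data from two hypothetical sites, "A" and "B".
train_x, train_sites = [10.0, 12.0, 20.0, 22.0], ["A", "A", "B", "B"]
test_x, test_sites = [14.0, 30.0], ["A", "B"]

# Leakage-free: parameters come from the training set only.
clean = SiteMeanHarmonizer().fit(train_x, train_sites).transform(test_x, test_sites)

# Leaky: parameters are estimated on training and test data together.
leaky = (SiteMeanHarmonizer()
         .fit(train_x + test_x, train_sites + test_sites)
         .transform(test_x, test_sites))

print("fit on training set only:", clean)
print("fit on the whole dataset:", leaky)
```

The two printed lists differ because, in the second case, the test subjects contributed to the estimated site means. Keeping the estimation inside `fit` mirrors the fit/transform convention used by preprocessing steps in machine learning pipelines, which is what lets a cross-validation loop re-estimate the parameters on each training fold.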

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0bb/10805868/65d5b1e93f66/41597_2023_2421_Fig1_HTML.jpg
