Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets.

Affiliations

Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy.

"Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy.

Publication information

Sci Data. 2024 Jan 23;11(1):115. doi: 10.1038/s41597-023-02421-7.

DOI:10.1038/s41597-023-02421-7

PMID:38263181

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10805868/

Abstract

Pooling publicly available MRI data from multiple sites makes it possible to assemble large groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonizing multicenter data is necessary to reduce the confounding effect of non-biological sources of variability. However, when harmonization is applied to the entire dataset before machine learning, it causes data leakage: information from outside the training set can influence model building and lead to overestimated performance. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced. We also demonstrated the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage by design.
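The idea of encapsulating harmonization inside the pipeline can be illustrated with a minimal sketch. It is not the authors' harmonizer transformer and it is not ComBat: the class `SiteLocationScaleHarmonizer` is hypothetical and performs only a simple per-site mean and standard deviation alignment, which, unlike ComBat, does not preserve covariate effects such as age. The convention of storing the site label in the last column of `X`, the synthetic data, and the Ridge regressor are likewise assumptions made for illustration. What the sketch does show is the leakage-free pattern: because the harmonizer is a scikit-learn transformer placed in a pipeline, cross-validation fits its site parameters on the training folds only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline


class SiteLocationScaleHarmonizer(BaseEstimator, TransformerMixin):
    """Hypothetical, simplified stand-in for ComBat (per-site mean/std alignment).

    Assumption: the LAST column of X holds an integer site label.
    """

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        feats, sites = X[:, :-1], X[:, -1].astype(int)
        # Pooled statistics that every site is mapped onto.
        self.grand_mean_ = feats.mean(axis=0)
        self.grand_std_ = feats.std(axis=0) + 1e-12
        # Per-site statistics, estimated from the data passed to fit() only.
        self.site_stats_ = {}
        for s in np.unique(sites):
            rows = feats[sites == s]
            self.site_stats_[s] = (rows.mean(axis=0), rows.std(axis=0) + 1e-12)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        feats, sites = X[:, :-1], X[:, -1].astype(int)
        out = np.empty_like(feats)
        for s in np.unique(sites):
            # A site unseen during fit falls back to pooled stats (left unchanged).
            mean, std = self.site_stats_.get(s, (self.grand_mean_, self.grand_std_))
            mask = sites == s
            out[mask] = (feats[mask] - mean) / std * self.grand_std_ + self.grand_mean_
        return out  # the site column is dropped


# Synthetic data (illustrative only): features depend on age plus a site offset.
rng = np.random.default_rng(0)
n_subjects, n_features, n_sites = 300, 5, 3
age = rng.uniform(20, 80, n_subjects)
site = rng.integers(0, n_sites, n_subjects)
site_shift = rng.normal(0, 5, (n_sites, n_features))
features = (
    age[:, None] * rng.normal(1, 0.2, n_features)
    + site_shift[site]
    + rng.normal(0, 1, (n_subjects, n_features))
)
X = np.column_stack([features, site])

# Harmonization is a pipeline step, so each CV split refits it on training folds.
pipeline = make_pipeline(SiteLocationScaleHarmonizer(), Ridge(alpha=1.0))
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, age, cv=cv, scoring="neg_mean_absolute_error")
mae = -scores.mean()
print(f"Cross-validated MAE (years): {mae:.2f}")
```

The leaky alternative would call `fit_transform` on all of `X` before splitting, which lets test-fold subjects contribute to the site parameters; keeping the step inside the pipeline rules that out structurally.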

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0bb/10805868/65d5b1e93f66/41597_2023_2421_Fig1_HTML.jpg

Similar articles

Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies.

Sci Rep. 2020 Jun 24;10(1):10248. doi: 10.1038/s41598-020-66110-w.

Effect of data harmonization of multicentric dataset in ASD/TD classification.

Brain Inform. 2023 Nov 25;10(1):32. doi: 10.1186/s40708-023-00210-x.

Comparison of traveling-subject and ComBat harmonization methods for assessing structural brain characteristics.

Hum Brain Mapp. 2021 Nov;42(16):5278-5287. doi: 10.1002/hbm.25615. Epub 2021 Aug 17.

A transfer learning approach to facilitate ComBat-based harmonization of multicentre radiomic features in new datasets.

PLoS One. 2021 Jul 1;16(7):e0253653. doi: 10.1371/journal.pone.0253653. eCollection 2021.

The impact of harmonization on radiomic features in Parkinson's disease and healthy controls: A multicenter study.

Front Neurosci. 2022 Oct 10;16:1012287. doi: 10.3389/fnins.2022.1012287. eCollection 2022.

Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.

J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.

ComBat Harmonization for MRI Radiomics: Impact on Nonbinary Tissue Classification by Machine Learning.

Invest Radiol. 2023 Sep 1;58(9):697-701. doi: 10.1097/RLI.0000000000000970.

Harmonization of diffusion MRI data sets with adaptive dictionary learning.

Hum Brain Mapp. 2020 Nov;41(16):4478-4499. doi: 10.1002/hbm.25117. Epub 2020 Aug 26.

A three-dimensional deep learning model for inter-site harmonization of structural MR images of the brain: Extensive validation with a multicenter dataset.

Heliyon. 2023 Nov 23;9(12):e22647. doi: 10.1016/j.heliyon.2023.e22647. eCollection 2023 Dec.

Cited by

Overcoming Site Variability in Multisite fMRI Studies: an Autoencoder Framework for Enhanced Generalizability of Machine Learning Models.

Neuroinformatics. 2025 Sep 2;23(3):46. doi: 10.1007/s12021-025-09746-1.

Unlocking the potential of radiomics in identifying fibrosing and inflammatory patterns in interstitial lung disease.

Radiol Med. 2025 Aug 22. doi: 10.1007/s11547-025-02067-y.

HeteroMRI: Robust white matter abnormality classification across multi-scanner MRI data.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf092.

Current challenges and future directions for brain age prediction in children and adolescents.

Nat Commun. 2025 Aug 20;16(1):7771. doi: 10.1038/s41467-025-63222-7.

An evaluation of image-based and statistical techniques for harmonizing brain volume measurements.

Imaging Neurosci (Camb). 2025 Jul 14;3. doi: 10.1162/IMAG.a.73. eCollection 2025.

Superpixel-ComBat modeling: A joint approach for harmonization and characterization of inter-scanner variability in T1-weighted images.

Imaging Neurosci (Camb). 2024 Oct 3;2. doi: 10.1162/imag_a_00306. eCollection 2024.

Age- and Sex-Specific Cerebral Blood Flow Atlases for Healthy Brain Across the Lifespan.

Sci Data. 2025 Jul 9;12(1):1169. doi: 10.1038/s41597-025-05406-w.

Lifespan reference curves for harmonizing multi-site regional brain white matter metrics from diffusion MRI.

Sci Data. 2025 May 6;12(1):748. doi: 10.1038/s41597-025-05028-2.

A critical assessment of artificial intelligence in magnetic resonance imaging of cancer.

Npj Imaging. 2025;3(1):15. doi: 10.1038/s44303-025-00076-0. Epub 2025 Apr 9.

Artificial Intelligence Is Brittle: We Need to Do Better.

Radiol Artif Intell. 2025 May;7(3):e250081. doi: 10.1148/ryai.250081.
