Department of Communication Sciences and Disorders, California State University, Fullerton, CA, USA.
Public Health Graduate Program, University of California Merced, Merced, CA, USA.
Neuroinformatics. 2020 Jan;18(1):59-70. doi: 10.1007/s12021-019-09426-x.
Regional morphological analysis is a crucial step in most neuroimaging studies. Results from brain segmentation techniques are intrinsically prone to some degree of variability, mainly as a result of suboptimal segmentation. To reduce this inherent variability, errors are often identified through visual inspection and then corrected (semi)manually. Identifying and correcting incorrect segmentations can be very expensive for large-scale studies. While incorrect results can be identified relatively quickly even with manual inspection, the correction step is extremely time-consuming, as it requires training staff to perform laborious manual corrections. Here we frame the correction phase of this problem as a missing data problem. Instead of manually adjusting the segmentation outputs, our computational approach aims to derive accurate morphological measures by machine learning imputation. Data imputation techniques may be used to replace missing or incorrect region average values with carefully chosen imputed values, all of which are computed from other available multivariate information. We examined our approach to correcting segmentation outputs on a cohort of 970 subjects whose segmentations had undergone extensive, time-consuming manual post-segmentation correction. A random forest imputation technique recovered the gold standard results with high accuracy (r = 0.93, p < 0.0001, when 30% of the segmentations were considered incorrect in a non-random fashion). The random forest technique proved most effective for large studies (N > 250).
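The missing-data framing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the imputation software, the iteration scheme, or how segmentations were flagged, so the missForest-style loop, the tiny pure-Python forest, the synthetic four-region data, the flagging rule (values below the regional median are flagged more often, about 30% overall), and every function name and hyperparameter below are assumptions made for illustration.

```python
import random

def sse(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def fit_tree(X, y, depth, rng):
    """Small regression tree: a float leaf or (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 8:
        return sum(y) / len(y)
    best = None
    for j in range(len(X[0])):
        # A few randomly drawn observed values serve as candidate thresholds.
        for thr in rng.sample([row[j] for row in X], 6):
            lo = [i for i, row in enumerate(X) if row[j] <= thr]
            hi = [i for i, row in enumerate(X) if row[j] > thr]
            if not lo or not hi:
                continue
            cost = sse([y[i] for i in lo]) + sse([y[i] for i in hi])
            if best is None or cost < best[0]:
                best = (cost, j, thr, lo, hi)
    if best is None:
        return sum(y) / len(y)
    _, j, thr, lo, hi = best
    left = fit_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, rng)
    right = fit_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, rng)
    return (j, thr, left, right)

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def fit_forest(X, y, rng, n_trees=10, depth=4):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng))
    return forest

def predict_forest(forest, row):
    return sum(predict_tree(t, row) for t in forest) / len(forest)

def drop(row, c):
    return row[:c] + row[c + 1:]

def rf_impute(data, rng, n_iter=2):
    """data: rows of regional measures; None marks a flagged (missing) value."""
    n_cols = len(data[0])
    filled = [row[:] for row in data]
    for c in range(n_cols):  # start from the mean of the unflagged values
        obs = [row[c] for row in data if row[c] is not None]
        for row in filled:
            if row[c] is None:
                row[c] = sum(obs) / len(obs)
    for _ in range(n_iter):  # refine each region from the other regions
        for c in range(n_cols):
            train = [i for i, row in enumerate(data) if row[c] is not None]
            forest = fit_forest([drop(filled[i], c) for i in train],
                                [data[i][c] for i in train], rng)
            for i, row in enumerate(data):
                if row[c] is None:
                    filled[i][c] = predict_forest(forest, drop(filled[i], c))
    return filled

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def demo(seed=0, n=150, n_regions=4):
    """Synthetic check: per-region Pearson r between imputed and gold values."""
    rng = random.Random(seed)
    gold = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared factor that makes regions correlated
        gold.append([z + rng.gauss(0, 0.3) for _ in range(n_regions)])
    flagged = [row[:] for row in gold]
    for c in range(n_regions):
        med = sorted(row[c] for row in gold)[n // 2]
        for g, f in zip(gold, flagged):
            # Non-random flagging (assumed rule): small values fail more often.
            if rng.random() < (0.45 if g[c] < med else 0.15):
                f[c] = None
    imputed = rf_impute(flagged, rng)
    rs = []
    for c in range(n_regions):
        idx = [i for i in range(n) if flagged[i][c] is None]
        rs.append(pearson([gold[i][c] for i in idx], [imputed[i][c] for i in idx]))
    return rs

if __name__ == "__main__":
    print(demo())
```

The correlations printed by `demo` describe only this synthetic setup and are not comparable to the r = 0.93 reported for the 970-subject cohort. In practice, an established random forest imputation library would likely replace the hand-written forest.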