Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. The Techna Institute for the Advancement of Technology for Health, Toronto, Ontario, Canada. Author to whom any correspondence should be addressed.
Phys Med Biol. 2020 Feb 5;65(3):035017. doi: 10.1088/1361-6560/ab63ba.
Quality assurance of data prior to use in automated pipelines and image analysis would assist in safeguarding against biases and incorrect interpretation of results. Automating quality assurance steps would further improve the robustness and efficiency of these methods, motivating widespread adoption of such techniques. Previous work by our group demonstrated the ability of convolutional neural networks (CNNs) to efficiently classify head and neck (H&N) computed tomography (CT) images for the presence of dental artifacts (DAs), which obscure visualization of structures and degrade the accuracy of Hounsfield units. In this work we demonstrate the generalizability of our previous methodology by validating CNNs on six external datasets, and we assess the potential benefits of transfer learning with fine-tuning on CNN performance. 2112 H&N CT images from seven institutions were scored as DA positive or negative. 1538 images from a single institution were used to train three CNNs with resampling grid sizes of 64, 128 and 256. The remaining six external datasets were used in five-fold cross-validation with a data split of 20% training/fine-tuning and 80% validation. The three pre-trained models were each validated using the five folds of the six external datasets. The pre-trained models also underwent transfer learning with fine-tuning using the 20% training/fine-tuning data, and were validated on the corresponding validation datasets. The highest micro-averaged AUC for our pre-trained models across all external datasets was achieved with a resampling grid of 256 (AUC = 0.91 ± 0.01). Transfer learning with fine-tuning at a resampling grid of 256 improved generalizability, yielding a micro-averaged AUC of 0.92 ± 0.01. Despite these promising results, transfer learning did not improve AUC when small resampling grids or small datasets were used. Our work demonstrates the potential of our previously developed automated quality assurance methods to generalize to external datasets. Additionally, we showed that transfer learning with fine-tuning using small portions of external datasets can improve model performance when large variations in images are present.
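The abstract does not specify the network architecture, framework, or preprocessing. As a minimal illustrative sketch only (assuming PyTorch, 2D axial slices, and placeholder layer sizes, none of which are stated by the authors), a binary DA classifier operating on CT images resampled to a 256 grid might look like the following.

# Minimal sketch of a DA classifier (assumptions: 2D input, PyTorch, illustrative layer sizes).
import torch
import torch.nn as nn

class DAClassifier(nn.Module):
    """Binary classifier for dental-artifact (DA) presence on resampled CT slices."""
    def __init__(self, grid_size: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 64 * (grid_size // 8) ** 2  # spatial size halves at each pooling step
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),  # single logit: DA positive vs. negative
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of 4 CT slices resampled to a 256 x 256 grid.
model = DAClassifier(grid_size=256)
logits = model(torch.randn(4, 1, 256, 256))
probs = torch.sigmoid(logits)  # DA-positive probabilities

The grid_size argument mirrors the three resampling grids (64, 128, 256) compared in the study; only the input resolution changes between the three models in this sketch.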
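The external-validation protocol (for each external dataset, five folds with 20% used for fine-tuning and 80% for validation, and performance pooled into a micro-averaged AUC) could be sketched as below. The stand-in model, scikit-learn utilities, optimizer settings, and epoch counts are illustrative assumptions, not the authors' settings; in practice, pre-trained weights from the single-institution training set would be loaded before fine-tuning.

# Sketch of the 5-fold protocol: 20% of each external dataset fine-tunes the model,
# the remaining 80% is used for validation, and predictions are pooled across all
# external datasets to compute a micro-averaged AUC. Hyperparameters are placeholders.
import numpy as np
import torch
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

def fine_tune(model, x, y, epochs=5, lr=1e-4):
    """Briefly update the (pre-trained) model on the 20% fine-tuning split."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(1), y)
        loss.backward()
        opt.step()
    return model

def predict(model, x):
    model.eval()
    with torch.no_grad():
        return torch.sigmoid(model(x)).squeeze(1).numpy()

# Placeholder external datasets: list of (images, labels), one per institution.
external_datasets = [(torch.randn(50, 1, 256, 256), torch.randint(0, 2, (50,)).float())
                     for _ in range(6)]

pooled_scores, pooled_labels = [], []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for images, labels in external_datasets:
    # KFold's held-out fold (~20%) serves as the fine-tuning split; the remaining
    # ~80% is used for validation, matching the split described in the abstract.
    for val_idx, tune_idx in kf.split(np.arange(len(images))):
        # Stand-in for loading the pre-trained CNN; a linear model keeps the sketch small.
        model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(256 * 256, 1))
        model = fine_tune(model, images[tune_idx], labels[tune_idx])
        pooled_scores.append(predict(model, images[val_idx]))
        pooled_labels.append(labels[val_idx].numpy())

micro_auc = roc_auc_score(np.concatenate(pooled_labels), np.concatenate(pooled_scores))
print(f"Micro-averaged AUC across external datasets: {micro_auc:.2f}")

Pooling predictions and labels before computing a single ROC AUC is what makes the metric micro-averaged: larger external datasets contribute proportionally more to the reported value than smaller ones.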