Sutton Reed T, Zaïane Osmar R, Goebel Randolph, Baumgart Daniel C
Division of Gastroenterology, University of Alberta, 130 University Campus, Edmonton, AB, T6G 2X8, Canada.
Department of Computing Science, University of Alberta, Edmonton, AB, Canada.
Sci Rep. 2022 Feb 17;12(1):2748. doi: 10.1038/s41598-022-06726-2.
Endoscopic evaluation to reliably grade disease activity, detect complications including cancer, and verify mucosal healing is paramount in the care of patients with ulcerative colitis (UC); however, this evaluation is hampered by substantial intra- and interobserver variability. Recently, artificial intelligence methodologies have been proposed to facilitate more objective, reproducible endoscopic assessment. In a first step, we compared how well several deep learning convolutional neural network architectures (CNNs), applied to a diverse subset of 8000 labeled endoscopic still images derived from HyperKvasir (the largest multi-class image and video dataset of the gastrointestinal tract available today, comprising 110,079 images and 374 videos), could (1) accurately distinguish UC from non-UC pathologies, and (2) inform the Mayo score of endoscopic disease severity. We grouped 851 UC images labeled with a Mayo score of 0-3 into an inactive/mild (236) versus moderate/severe (604) dichotomy. Weights were initialized with ImageNet, and grid search with fivefold cross-validation was used to identify the best hyperparameters. The best accuracy (87.50%) and area under the curve (AUC) (0.90) were achieved with the DenseNet121 architecture, compared with 72.02% and 0.50 for a 'no-skill' model that predicts the majority class. Finally, we used Gradient-weighted Class Activation Maps (Grad-CAM) to improve visual interpretation of the model and adopt an explainable artificial intelligence (XAI) approach.
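The abstract describes two technical ingredients: transfer learning from an ImageNet-initialized DenseNet121 re-headed for the binary inactive/mild versus moderate/severe task, and Grad-CAM visualization of the regions driving a prediction. The sketch below is a minimal illustration of both in PyTorch, not the authors' code; it assumes torchvision's DenseNet121 weights and a standard Grad-CAM formulation, and all identifiers are illustrative.

```python
# Hypothetical sketch (not the paper's implementation): ImageNet-pretrained
# DenseNet121 with a 2-class head, plus a minimal Grad-CAM over the last dense block.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Transfer learning: start from ImageNet weights and replace the classifier head
# for the binary task (inactive/mild vs. moderate/severe).
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 2)

# Grad-CAM: capture activations and gradients of the final dense block via hooks.
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()

target_layer = model.features.denseblock4
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor) -> torch.Tensor:
    """Return a heatmap (H, W) highlighting regions driving the predicted class.

    `image` is a single preprocessed tensor of shape (3, H, W).
    """
    model.eval()
    logits = model(image.unsqueeze(0))                    # shape (1, 2)
    score = logits[0, logits.argmax(dim=1).item()]        # logit of predicted class
    model.zero_grad()
    score.backward()
    # Weight each feature map by its spatially averaged gradient, ReLU, normalize.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    return cam.squeeze() / (cam.max() + 1e-8)
```

In practice the resulting heatmap would be overlaid on the endoscopic still image to check whether the network attends to mucosal features rather than artifacts, which is the role Grad-CAM plays in the study's XAI step.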