Liu Xiaoxuan, Reigle James, Prasath V B Surya, Dhaliwal Jasbir
Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA.
Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA; Cincinnati Children's Hospital Medical Center, Division of Gastroenterology, Hepatology and Nutrition, USA.
Comput Biol Med. 2024 Mar;171:108093. doi: 10.1016/j.compbiomed.2024.108093. Epub 2024 Feb 1.
There has been an increase in the development of both machine learning (ML) and deep learning (DL) prediction models in inflammatory bowel disease (IBD). In this systematic review, we aim to assess the methodological quality and risk of bias of ML and DL image-based prediction studies in IBD.
We searched three databases (PubMed, Scopus, and Embase) up to Dec 31, 2022, to identify ML and DL diagnostic or prognostic prediction models using imaging data in IBD. We restricted our search to studies that primarily used conventional imaging data, were undertaken in human participants, and were published in English. Two reviewers independently reviewed the abstracts. The methodological quality of the studies was determined, and risk of bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).
Forty studies were included, of which thirty-nine developed diagnostic models. Seven studies used ML approaches; six of these were retrospective, and none used multicenter data for model development. Thirty-three studies used DL approaches; ten were prospective and twelve were multicenter. Overall, all studies demonstrated high risk of bias. Among ML studies, high risk of bias was found in four domains: participants (6/7), predictors (1/7), outcome (3/7), and analysis (7/7). Among DL studies, high risk of bias was found in three domains: participants (24/33), outcome (10/33), and analysis (18/33). The majority of studies used colonoscopy images.
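For readers who prefer percentages, the per-domain counts reported above can be converted as in the following minimal sketch. The counts are taken directly from the results; the dictionary layout and the function name `high_risk_percentages` are illustrative assumptions and not part of the review's methods.

```python
# Counts of studies rated high risk of bias per PROBAST domain,
# copied from the results above. The data layout is an assumption
# made for illustration only.
HIGH_RISK_COUNTS = {
    "ML": {"total": 7, "participants": 6, "predictors": 1, "outcome": 3, "analysis": 7},
    "DL": {"total": 33, "participants": 24, "outcome": 10, "analysis": 18},
}


def high_risk_percentages(counts):
    """Return the percentage of studies rated high risk in each domain."""
    total = counts["total"]
    return {
        domain: round(100 * n / total, 1)
        for domain, n in counts.items()
        if domain != "total"
    }


if __name__ == "__main__":
    for group, counts in HIGH_RISK_COUNTS.items():
        for domain, pct in high_risk_percentages(counts).items():
            print(f"{group} {domain}: {counts[domain]}/{counts['total']} = {pct}%")
```

Under these counts, the analysis domain is the most frequent source of high risk of bias among ML studies (7/7), and the participants domain among DL studies (24/33).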
The risk of bias was high in AI image-based prediction models for IBD, owing to insufficient sample size, unreported missingness, and lack of an external validation cohort. Models with a high risk of bias are unlikely to be generalizable or suitable for clinical implementation.