Department of Pathology, Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, The Netherlands.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Nat Med. 2022 Jan;28(1):154-163. doi: 10.1038/s41591-021-01620-2. Epub 2022 Jan 13.
Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge (the largest histopathology competition to date, joined by 1,290 developers) to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840-0.884) and 0.868 (95% CI, 0.835-0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.
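For readers unfamiliar with the agreement metric quoted above, the following is a minimal sketch of how a quadratically weighted κ can be computed between two sets of ordinal grades. It is not the challenge's evaluation code. The function name `quadratic_weighted_kappa`, the integer encoding of grades as 0 to `num_classes - 1`, and the example values are all assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratically weighted Cohen's kappa for two lists of ordinal labels.

    Illustrative sketch: labels are assumed to be integers in
    0..num_classes-1 (a hypothetical encoding, not taken from the paper).
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")

    k = num_classes
    # Observed confusion matrix: rows index rater A, columns index rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal label counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Expected count if the two raters were statistically independent.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters assigned the same single label to every case.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for six cases, used only to exercise the function.
reference = [0, 1, 2, 3, 4, 5]
print(quadratic_weighted_kappa(reference, reference, num_classes=6))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty is quadratic, a prediction two grades away from the reference costs four times as much as a prediction one grade away.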