Jacob Christine, Brasier Noé, Laurenzi Emanuele, Heuss Sabina, Mougiakakou Stavroula-Georgia, Cöltekin Arzu, Peter Marc K
FHNW, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland.
Institute of Translational Medicine, Department of Health Science and Technology, ETH Zurich, Zurich, Switzerland.
J Med Internet Res. 2025 Feb 5;27:e67485. doi: 10.2196/67485.
Artificial intelligence (AI) has the potential to revolutionize health care by enhancing both clinical outcomes and operational efficiency. However, its clinical adoption has been slower than anticipated, largely due to the absence of comprehensive evaluation frameworks. Existing frameworks remain insufficient and tend to emphasize technical metrics such as accuracy and validation, while overlooking critical real-world factors such as clinical impact, integration, and economic sustainability. This narrow focus prevents AI tools from being effectively implemented, limiting their broader impact and long-term viability in clinical practice.
This study aimed to create a framework for assessing AI in health care, extending beyond technical metrics to incorporate social and organizational dimensions. The framework was developed by systematically reviewing, analyzing, and synthesizing the evaluation criteria necessary for successful implementation, focusing on the long-term real-world impact of AI in clinical practice.
A search was performed in July 2024 across the PubMed, Cochrane, Scopus, and IEEE Xplore databases to identify relevant studies published in English between January 2019 and mid-July 2024, yielding 3528 results, among which 44 studies met the inclusion criteria. The systematic review followed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines and the Cochrane Handbook for Systematic Reviews. Data were analyzed using NVivo through thematic analysis and narrative synthesis to identify key emergent themes in the studies.
By synthesizing the included studies, we developed a framework that goes beyond the traditional focus on technical metrics or study-level methodologies. It integrates clinical context and real-world implementation factors, offering a more comprehensive approach to evaluating AI tools. With our focus on assessing the long-term real-world impact of AI technologies in health care, we named the framework AI for IMPACTS. The criteria are organized into seven key clusters, each corresponding to a letter in the acronym: (1) I: integration, interoperability, and workflow; (2) M: monitoring, governance, and accountability; (3) P: performance and quality metrics; (4) A: acceptability, trust, and training; (5) C: cost and economic evaluation; (6) T: technological safety and transparency; and (7) S: scalability and impact. These are further broken down into 28 specific subcriteria.
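As an illustration only, the seven clusters named above could be held in a small data structure so that an assessment can be checked for coverage. This is a minimal sketch, not part of the published framework: the `Cluster` class and the `acronym` and `unassessed` helpers are hypothetical names introduced here, and the 28 subcriteria are omitted because the abstract does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One cluster of the framework (hypothetical container, not from the paper)."""
    letter: str
    name: str


# The seven clusters exactly as named in the abstract, in acronym order.
IMPACTS_CLUSTERS = (
    Cluster("I", "Integration, interoperability, and workflow"),
    Cluster("M", "Monitoring, governance, and accountability"),
    Cluster("P", "Performance and quality metrics"),
    Cluster("A", "Acceptability, trust, and training"),
    Cluster("C", "Cost and economic evaluation"),
    Cluster("T", "Technological safety and transparency"),
    Cluster("S", "Scalability and impact"),
)


def acronym(clusters):
    """Join the cluster letters in order to recover the framework acronym."""
    return "".join(c.letter for c in clusters)


def unassessed(clusters, assessed_letters):
    """Return the clusters whose letter is not in the set of assessed letters."""
    return [c for c in clusters if c.letter not in assessed_letters]


if __name__ == "__main__":
    print(acronym(IMPACTS_CLUSTERS))
    # Example: an evaluation that only covered performance and safety.
    for cluster in unassessed(IMPACTS_CLUSTERS, {"P", "T"}):
        print(f"Not yet assessed: {cluster.letter} ({cluster.name})")
```

A coverage check of this kind reflects the abstract's point that evaluations limited to technical metrics leave the remaining clusters unexamined.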
The AI for IMPACTS framework offers a holistic approach to evaluating the long-term real-world impact of AI tools in the heterogeneous and challenging health care context. It lays the groundwork for further validation through expert consensus and testing of the framework in real-world health care settings. Multidisciplinary expertise is essential for assessment, yet many assessors lack the necessary training. In addition, traditional evaluation methods struggle to keep pace with AI's rapid development. Successful AI integration therefore requires flexible, fast-tracked assessment processes and proper assessor training, so that rigorous standards are maintained while adapting to AI's dynamic evolution.
reviewregistry1859; https://tinyurl.com/ysn2d7sh.