Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
Med Image Anal. 2024 Feb;92:103059. doi: 10.1016/j.media.2023.103059. Epub 2023 Dec 7.
Artificial intelligence (AI) has a multitude of applications in cancer research and oncology. However, the training of AI systems is impeded by the limited availability of large datasets due to data protection requirements and other regulatory obstacles. Federated and swarm learning represent possible solutions to this problem by collaboratively training AI models without transferring the underlying data. However, in these decentralized methods, weight updates are still transferred to the aggregation server for merging the models. This leaves open the possibility of a breach of data privacy, for example through model-inversion or membership-inference attacks by untrusted servers. Somewhat-homomorphically-encrypted federated learning (SHEFL) addresses this problem because only encrypted weights are transferred, and model updates are performed in the encrypted space. Here, we demonstrate the first successful implementation of SHEFL in a range of clinically relevant tasks in cancer image analysis on multicentric datasets in radiology and histopathology. We show that SHEFL enables the training of AI models which outperform locally trained models and perform on par with centrally trained models. In the future, SHEFL can enable multiple institutions to co-train AI models without forsaking data governance and without ever transmitting any decryptable data to untrusted servers.
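The core mechanism the abstract describes is that an untrusted server can aggregate weight updates that it can never decrypt. The sketch below illustrates this idea with a toy Paillier cryptosystem (additively homomorphic) in pure Python; it is not the authors' implementation, and the key sizes, fixed-point scale, and example weight updates are illustrative assumptions. Real SHEFL deployments use a somewhat homomorphic scheme with cryptographically secure parameters, but the aggregation step shown here is structurally the same: multiplying ciphertexts adds the underlying plaintexts, so the server averages the model updates blindly.

```python
import math
import random

# --- Toy Paillier cryptosystem (additively homomorphic), illustrative only ---

def keygen(p, q):
    """Generate a Paillier key pair from two primes (toy-sized here)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                       # standard simple choice of generator
    n2 = n * n
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)     # (public key, private key)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:                     # random r coprime to n for semantic security
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# --- Fixed-point encoding so float weight updates fit the integer scheme ---
SCALE = 10**6

def enc_weight(pub, w):
    return encrypt(pub, int(round(w * SCALE)) % pub[0])

def dec_weight(priv, c):
    n = priv[2]
    m = decrypt(priv, c)
    if m > n // 2:                  # map back to signed range
        m -= n
    return m / SCALE

# Clients (e.g. three hospitals) encrypt their local weight updates.
pub, priv = keygen(1_000_003, 1_000_033)       # toy primes, NOT secure sizes
updates = [0.25, -0.10, 0.40]                  # hypothetical per-site updates
ciphers = [enc_weight(pub, u) for u in updates]

# Untrusted server: multiplying ciphertexts adds plaintexts; it never decrypts.
n2 = pub[0] ** 2
agg = 1
for c in ciphers:
    agg = (agg * c) % n2

# Only key holders (the clients) can recover the aggregated update.
total = dec_weight(priv, agg)                  # sum of updates = 0.55
avg = total / len(updates)                     # federated average
```

In this sketch the private key stays with the clients, so the server sees only ciphertexts: the division by the number of sites is done after decryption, which mirrors why the scheme only needs additive (somewhat) homomorphism rather than fully homomorphic encryption.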