Collaborative analysis for drug discovery by federated learning on non-IID data.

Affiliations

Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.

Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.

Publication details

Methods. 2023 Nov;219:1-7. doi: 10.1016/j.ymeth.2023.09.001. Epub 2023 Sep 9.

Abstract

With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis, which typically concentrates data on a central server for training, faces challenges related to data privacy and security. Distributed analysis, such as federated learning, offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data on local devices are non-independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need to exchange their raw data. Experimental results on 15 benchmark datasets demonstrate that the proposed method achieves predictive accuracy competitive with that of centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.
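The abstract describes the approach only at a high level. Institutions train locally, a server aggregates their models, and each institution's local data is supplemented by a small subset of data shared with everyone. The sketch below illustrates that idea under assumptions of our own, not the authors' implementation:

* A one-feature logistic-regression model stands in for a QSAR predictor.
* Synthetic, label-skewed clients stand in for non-IID institutional datasets.
* FedAvg-style size-weighted averaging is used for aggregation.

All function names (`local_train`, `fed_avg`, `federated_round`, `sample`, `make_client`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def local_train(w, data, lr=0.1, epochs=5):
    """Plain SGD on logistic loss, starting from the current global weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fed_avg(weight_list, sizes):
    """Average client weights, weighted by each client's training-set size (assumed FedAvg rule)."""
    total = sum(sizes)
    return [sum(w[j] * n for w, n in zip(weight_list, sizes)) / total
            for j in range(len(weight_list[0]))]

def federated_round(global_w, clients, shared):
    """One round: every client trains on its private data plus the shared subset."""
    updates, sizes = [], []
    for local in clients:
        data = local + shared  # only the small shared subset is common to all clients
        updates.append(local_train(global_w, data))
        sizes.append(len(data))
    return fed_avg(updates, sizes)

def sample(rng, label):
    # Hypothetical data: [bias term, one descriptor]; class means at +1 / -1.
    mu = 1.0 if label == 1 else -1.0
    return ([1.0, rng.gauss(mu, 0.5)], label)

def make_client(rng, n, frac_pos):
    return [sample(rng, 1 if rng.random() < frac_pos else 0) for _ in range(n)]

def accuracy(w, data):
    return sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data) / len(data)

rng = random.Random(0)
clients = [make_client(rng, 200, 0.9), make_client(rng, 200, 0.1)]  # label-skewed (non-IID)
shared = [sample(rng, i % 2) for i in range(20)]  # small, class-balanced shared subset
test = [sample(rng, i % 2) for i in range(400)]

w = [0.0, 0.0]
for _ in range(10):
    w = federated_round(w, clients, shared)
print(f"test accuracy: {accuracy(w, test):.3f}")
```

Setting `shared = []` turns the same loop into plain FedAvg on the label-skewed clients, so the two settings can be compared side by side. On a model this small, any difference may be negligible, so treat the comparison as illustrative only.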
