MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information.

Affiliations

Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium.

AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.

Publication information

J Chem Inf Model. 2024 Apr 8;64(7):2331-2344. doi: 10.1021/acs.jcim.3c00799. Epub 2023 Aug 29.

Abstract

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
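The abstract says the partners extended multitask learning across company boundaries. The toy sketch below illustrates that general idea under stated assumptions, and it is not the MELLODDY implementation. It assumes a shared linear "trunk" whose gradient contributions are averaged across partners, and one private, task-specific "head" per partner that never leaves that partner. All names (`federated_round`, `partner_a`, `partner_b`) and the synthetic data are hypothetical. The privacy and security controls of the audited platform are not modeled.

```python
import random

def matvec(W, x):
    """Multiply an h-by-d matrix (list of rows) by a length-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(W, head, data):
    """Mean squared error of one partner's private head on its private data."""
    return sum((dot(head, matvec(W, x)) - y) ** 2 for x, y in data) / len(data)

def local_gradients(W, head, data):
    """Gradients of the local MSE. In this toy setup only the trunk gradient
    would be shared; the head gradient, head, and data stay with the partner."""
    h, d = len(W), len(W[0])
    gW = [[0.0] * d for _ in range(h)]
    gv = [0.0] * h
    n = len(data)
    for x, y in data:
        z = matvec(W, x)        # shared (trunk) representation
        err = dot(head, z) - y  # error of the partner's private head
        for i in range(h):
            gv[i] += 2 * err * z[i] / n
            for j in range(d):
                gW[i][j] += 2 * err * head[i] * x[j] / n
    return gW, gv

def federated_round(W, heads, datasets, lr):
    """One synchronous round (hypothetical scheme): each partner computes
    gradients locally, updates its own head, and contributes a trunk gradient;
    the trunk is then updated with the average of those contributions."""
    trunk_grads = []
    for name, data in datasets.items():
        gW, gv = local_gradients(W, heads[name], data)
        trunk_grads.append(gW)
        heads[name] = [v - lr * g for v, g in zip(heads[name], gv)]
    k = len(trunk_grads)
    return [[W[i][j] - lr * sum(g[i][j] for g in trunk_grads) / k
             for j in range(len(W[0]))] for i in range(len(W))]

def make_data(fn, n, rng):
    """Synthetic stand-in for a partner's private assay data."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, fn(x)))
    return data

def total_loss(W, heads, datasets):
    return sum(loss(W, heads[name], d) for name, d in datasets.items())

rng = random.Random(0)
datasets = {  # each partner has a different task on the same input space
    "partner_a": make_data(lambda x: x[0] + x[1], 20, rng),
    "partner_b": make_data(lambda x: x[0] - x[1], 20, rng),
}
W = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
heads = {name: [rng.uniform(-0.5, 0.5) for _ in range(2)] for name in datasets}

initial = total_loss(W, heads, datasets)
for _ in range(300):
    W = federated_round(W, heads, datasets, lr=0.05)
final = total_loss(W, heads, datasets)
print(f"total loss: {initial:.4f} -> {final:.4f}")
```

The design choice illustrated is that the partners share only what is needed to improve the common representation (here, trunk gradients), while each task-specific output layer stays local. Real deployments add secure aggregation and other safeguards that this sketch omits.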

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d9c/11005050/5233afd4aab7/ci3c00799_0001.jpg