A knowledge-guided pre-training framework for improving molecular representation learning.

Affiliations

Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China.

Research Center for Biological Computation, Zhejiang Province, Zhejiang Laboratory, 311100, Hangzhou, China.

Publication information

Nat Commun. 2023 Nov 21;14(1):7568. doi: 10.1038/s41467-023-43214-1.

Abstract

Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df89/10663446/3d732899eae6/41467_2023_43214_Fig1_HTML.jpg
