Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model.

Affiliations

Department of Computer Science and Engineering, CUHK, Hong Kong SAR, China.

Department of Computer Science and Engineering, Washington University, St. Louis, MO, US.

Publication information

Nat Comput Sci. 2024 Jan;4(1):29-42. doi: 10.1038/s43588-023-00576-2. Epub 2023 Dec 13.

Abstract

Signal peptides (SPs) are essential to target and transfer transmembrane and secreted proteins to the correct positions. Many existing computational tools for predicting SPs disregard the extreme data imbalance problem and rely on additional group information of proteins. Here we introduce Unbiased Organism-agnostic Signal Peptide Network (USPNet), an SP classification and cleavage-site prediction deep learning method. Extensive experimental results show that USPNet substantially outperforms previous methods on classification performance by 10%. An SP-discovering pipeline with USPNet is designed to explore unprecedented SPs from metagenomic data. It reveals 347 SP candidates, with the lowest sequence identity between our candidates and the closest SP in the training dataset at only 13%. In addition, the template modeling scores between candidates and SPs in the training set are mostly above 0.8. The results showcase that USPNet has learnt the SP structure with raw amino acid sequences and the large protein language model, thereby enabling the discovery of unknown SPs.
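
The abstract reports the lowest sequence identity between a candidate and its closest SP in the training dataset. As a rough illustration of how such a nearest-neighbor identity could be computed, the sketch below aligns two sequences with a basic Needleman-Wunsch global alignment and reports the percentage of identical columns. The scoring values, the identity definition (matches divided by alignment length), the function names, and the toy sequences are all assumptions made for illustration; the abstract does not state which alignment tool or identity definition the authors used.

```python
from typing import Iterable


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a simple global alignment (assumed definition)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting identical columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * matches / length if length else 0.0


def closest_identity(candidate: str, training: Iterable[str]) -> float:
    """Identity (%) between a candidate and its most similar training sequence."""
    return max(global_identity(candidate, t) for t in training)


# Toy sequences, made up for illustration only (not real signal peptides).
toy_training = ["AAAT", "CCCC"]
toy_candidate = "AAAA"
nearest = closest_identity(toy_candidate, toy_training)
```

A low value of `closest_identity` for a predicted SP would indicate that it is distant in sequence from everything seen during training, which is the kind of comparison the 13% figure summarizes. Production pipelines typically use substitution matrices and affine gap penalties rather than the flat scores assumed here.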