dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains.

Affiliations

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA.

Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA.

Publication information

Nucleic Acids Res. 2021 Jul 21;49(13):e78. doi: 10.1093/nar/gkab356.

Abstract

Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533b/8287948/41d4425a8df8/gkab356fig1.jpg
