

Navigating the unstructured by evaluating AlphaFold's efficacy in predicting missing residues and structural disorder in proteins.

Author information

Zheng Sen

Affiliation

Bio-Electron Microscopy Facility, iHuman Institution, ShanghaiTech University, Shanghai, China.

Publication information

PLoS One. 2025 Mar 25;20(3):e0313812. doi: 10.1371/journal.pone.0313812. eCollection 2025.

Abstract

This study investigated regions with undefined structure, known as "missing" segments in X-ray crystallography and cryo-electron microscopy (cryo-EM) data, by assessing their predicted structural confidence and disorder scores. Using a comprehensive dataset from the Protein Data Bank (PDB), we categorized residues as "modeled", "hard missing", or "soft missing" based on their visibility in structural datasets. Key features were determined, including the predicted local distance difference test (pLDDT) confidence score from AlphaFold2, an advanced structure prediction tool, and a disorder score from IUPred, a traditional disorder prediction method. To enhance prediction performance for unstructured residues, we employed a Long Short-Term Memory (LSTM) model that integrates both scores with the amino acid sequence. We observed notable patterns in the composition, region lengths, and prediction scores of unstructured residues and regions identified through structural experiments over the period studied. Our findings also indicate that "hard missing" residues often align with low confidence scores, whereas "soft missing" residues exhibit dynamic behavior that can complicate predictions. Incorporating pLDDT, IUPred scores, and sequence data into the LSTM model improved the differentiation between structured and unstructured residues, particularly for shorter unstructured regions. This research elucidates the relationship between established computational predictions and experimental structural data, enhancing our ability to target structurally significant areas for research and guiding experimental designs toward functionally relevant regions.
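
To illustrate what integrating both scores with the amino acid sequence could look like as input to a sequence model, here is a minimal sketch. It is not the paper's implementation: the abstract does not state how features were encoded, so the function name `build_features`, the one-hot encoding over the 20 standard amino acids, and the rescaling of pLDDT from its 0-100 range to 0-1 are all assumptions made for illustration.

```python
# Hypothetical per-residue feature builder (illustration only, not the authors' code).
# Assumed layout per residue: 20 one-hot sequence values + scaled pLDDT + IUPred score.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_features(sequence, plddt, iupred):
    """Return one 22-value feature vector per residue.

    sequence: string of one-letter amino acid codes
    plddt:    per-residue AlphaFold2 confidence scores on the 0-100 scale
    iupred:   per-residue IUPred disorder scores on the 0-1 scale
    """
    if not (len(sequence) == len(plddt) == len(iupred)):
        raise ValueError("sequence, plddt and iupred must have the same length")

    features = []
    for aa, confidence, disorder in zip(sequence, plddt, iupred):
        one_hot = [0.0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(aa.upper())
        if index is not None:  # non-standard residues keep an all-zero encoding
            one_hot[index] = 1.0
        # Assumption: rescale pLDDT to 0-1 so both scores share a range.
        features.append(one_hot + [confidence / 100.0, disorder])
    return features


if __name__ == "__main__":
    rows = build_features("MKT", [92.5, 61.0, 33.4], [0.12, 0.48, 0.87])
    print(len(rows), len(rows[0]))
```

A matrix of this shape (sequence length by 22) is the kind of input a recurrent model such as an LSTM consumes one residue at a time. The residue labels ("modeled", "hard missing", "soft missing") would have to come from the experimental structures, following the definitions in the full paper.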


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0117/11936262/ec5a1b6fc2fe/pone.0313812.g001.jpg
