Highly accurate protein structure prediction for the human proteome.

Affiliations

DeepMind, London, UK.

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.

Publication information

Nature. 2021 Aug;596(7873):590-596. doi: 10.1038/s41586-021-03828-1. Epub 2021 Jul 22.

DOI:10.1038/s41586-021-03828-1

PMID:34293799

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8387240/

Abstract

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
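The coverage figures quoted in the abstract (58% of residues confident, 36% very high confidence) are fractions of residues whose per-residue confidence score exceeds a cutoff. The following is a minimal sketch of that kind of calculation, not the authors' pipeline. It assumes that the confidence score (pLDDT, on a 0-100 scale) is stored in the B-factor column of a PDB-format file, and that the cutoffs are 70 for "confident" and 90 for "very high"; neither the cutoffs nor the file layout are stated in the abstract. The helper `make_atom_line` and the sample scores are hypothetical and exist only to build a small, correctly aligned test input.

```python
# Assumed cutoffs on a 0-100 confidence scale (not stated in the abstract).
CONFIDENT = 70.0
VERY_HIGH = 90.0


def make_atom_line(serial: int, name: str, resname: str,
                   resseq: int, score: float) -> str:
    """Hypothetical helper: build one fixed-width PDB ATOM record with
    dummy coordinates and the confidence score in the B-factor field."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, resseq, 0.0, 0.0, 0.0, 1.0, score)


def ca_confidences(pdb_text: str) -> list:
    """Return one score per residue, read from the B-factor field
    (columns 61-66) of the C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; keep only C-alpha atoms.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def coverage(scores, threshold: float) -> float:
    """Fraction of residues whose score is strictly above the threshold."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)


# Made-up four-residue example; the N atom is skipped by ca_confidences.
demo = "\n".join([
    make_atom_line(1, "N", "MET", 1, 95.0),
    make_atom_line(2, "CA", "MET", 1, 95.0),
    make_atom_line(3, "CA", "ALA", 2, 80.0),
    make_atom_line(4, "CA", "GLY", 3, 92.5),
    make_atom_line(5, "CA", "SER", 4, 50.0),
])
scores = ca_confidences(demo)
print(coverage(scores, CONFIDENT), coverage(scores, VERY_HIGH))
```

Applied to every predicted model and pooled over all residues, the same two ratios would give proteome-level numbers comparable in kind to those in the abstract, provided the assumed cutoffs match the ones the authors used.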

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dbb/8387240/db3a50fcd91f/41586_2021_3828_Fig1_HTML.jpg

Similar articles

Highly accurate protein structure prediction for the human proteome.

Nature. 2021 Aug;596(7873):590-596. doi: 10.1038/s41586-021-03828-1. Epub 2021 Jul 22.

AlphaFold and Implications for Intrinsically Disordered Proteins.

J Mol Biol. 2021 Oct 1;433(20):167208. doi: 10.1016/j.jmb.2021.167208. Epub 2021 Aug 18.

Highly accurate protein structure prediction with AlphaFold.

Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.

The AlphaFold Database of Protein Structures: A Biologist's Guide.

J Mol Biol. 2022 Jan 30;434(2):167336. doi: 10.1016/j.jmb.2021.167336. Epub 2021 Oct 29.

Folding the human proteome using BioNeMo: A fused dataset of structural models for machine learning purposes.

Sci Data. 2024 Jun 6;11(1):591. doi: 10.1038/s41597-024-03403-z.

Applying and improving AlphaFold at CASP14.

Proteins. 2021 Dec;89(12):1711-1721. doi: 10.1002/prot.26257.

Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints.

Nat Commun. 2019 Sep 4;10(1):3977. doi: 10.1038/s41467-019-11994-0.

The structural coverage of the human proteome before and after AlphaFold.

PLoS Comput Biol. 2022 Jan 24;18(1):e1009818. doi: 10.1371/journal.pcbi.1009818. eCollection 2022 Jan.

Improved protein structure prediction using potentials from deep learning.

Nature. 2020 Jan;577(7792):706-710. doi: 10.1038/s41586-019-1923-7. Epub 2020 Jan 15.

Deep learning methods for 3D structural proteome and interactome modeling.

Curr Opin Struct Biol. 2022 Apr;73:102329. doi: 10.1016/j.sbi.2022.102329. Epub 2022 Feb 6.

Cited by

Phase separation of ERCC6L2-CtIP regulates the extent of DNA end resection.

Nat Cell Biol. 2025 Sep 5. doi: 10.1038/s41556-025-01760-4.

Progress and trends on machine learning in proteomics during 1997-2024: a bibliometric analysis.

Front Med (Lausanne). 2025 Aug 15;12:1594442. doi: 10.3389/fmed.2025.1594442. eCollection 2025.

Cell reprogramming in cancer: Interplay of genetic, epigenetic mechanisms, and the tumor microenvironment in carcinogenesis and metastasis.

World J Clin Oncol. 2025 Aug 24;16(8):106838. doi: 10.5306/wjco.v16.i8.106838.

Structural insights into proprotein convertase activation facilitate the engineering of highly specific furin inhibitors.

Nat Commun. 2025 Sep 2;16(1):8206. doi: 10.1038/s41467-025-63479-y.

Creating an atlas of variant effects to resolve variants of uncertain significance and guide cardiovascular medicine.

Nat Rev Cardiol. 2025 Sep 1. doi: 10.1038/s41569-025-01201-7.

Selective disruption of DNMT1/ELK1 interactions induces DGKI re-expression and promotes temozolomide sensitivity of MGMT/DGKI glioblastoma.

Clin Epigenetics. 2025 Aug 30;17(1):146. doi: 10.1186/s13148-025-01943-8.

Intra-Host Evolution During Relapsing Parvovirus B19 Infection in Immunocompromised Patients.

Viruses. 2025 Jul 23;17(8):1034. doi: 10.3390/v17081034.

A novel intracellular signaling pathway elicited by DM9CP-6 regulates immune responses in oysters.

Cell Commun Signal. 2025 Aug 26;23(1):383. doi: 10.1186/s12964-025-02389-4.

DCLK1 isoform (DCLK1-S) as a critical player in promoting inflammation, tissue remodeling, and EMT in mouse models of colitis.

PLoS Pathog. 2025 Aug 21;21(8):e1013360. doi: 10.1371/journal.ppat.1013360. eCollection 2025 Aug.

Assessing variant effect predictors and disease mechanisms in intrinsically disordered proteins.

PLoS Comput Biol. 2025 Aug 19;21(8):e1013400. doi: 10.1371/journal.pcbi.1013400. eCollection 2025 Aug.

References

Highly accurate protein structure prediction with AlphaFold.

Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.

High-accuracy protein structure prediction in CASP14.

Proteins. 2021 Dec;89(12):1687-1699. doi: 10.1002/prot.26171. Epub 2021 Jul 14.

Structure-based protein function prediction using graph convolutional networks.

Nat Commun. 2021 May 26;12(1):3168. doi: 10.1038/s41467-021-23303-9.

Critical assessment of protein intrinsic disorder prediction.

Nat Methods. 2021 May;18(5):472-481. doi: 10.1038/s41592-021-01117-3. Epub 2021 Apr 19.

Crystallographic molecular replacement using an in silico-generated search model of SARS-CoV-2 ORF8.

Protein Sci. 2021 Apr;30(4):728-734. doi: 10.1002/pro.4050. Epub 2021 Mar 4.

Structure and noncanonical Cdk8 activation mechanism within an Argonaute-containing Mediator kinase module.

Sci Adv. 2021 Jan 15;7(3). doi: 10.1126/sciadv.abd4484. Print 2021 Jan.

MobiDB-lite 3.0: fast consensus annotation of intrinsic disorder flavors in proteins.

Bioinformatics. 2021 Apr 1;36(22-23):5533-5534. doi: 10.1093/bioinformatics/btaa1045.

The Gene Ontology resource: enriching a GOld mine.

Nucleic Acids Res. 2021 Jan 8;49(D1):D325-D334. doi: 10.1093/nar/gkaa1113.

UniProt: the universal protein knowledgebase in 2021.

Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100.

Pfam: The protein families database in 2021.

Nucleic Acids Res. 2021 Jan 8;49(D1):D412-D419. doi: 10.1093/nar/gkaa913.
