PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology.

Author affiliations

Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.

Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.

Publication information

J Mol Biol. 2022 Jun 15;434(11):167599. doi: 10.1016/j.jmb.2022.167599. Epub 2022 Apr 20.

Abstract

PDBx/mmCIF, the Protein Data Bank Exchange (PDBx) macromolecular Crystallographic Information Framework (mmCIF), has become the data standard for structural biology. With its early roots in the domain of small-molecule crystallography, PDBx/mmCIF provides an extensible data representation that is used for deposition, archiving, remediation, and public dissemination of experimentally determined three-dimensional (3D) structures of biological macromolecules by the Worldwide Protein Data Bank (wwPDB, wwpdb.org). Extensions of PDBx/mmCIF are similarly used for computed structure models by ModelArchive (modelarchive.org), integrative/hybrid structures by PDB-Dev (pdb-dev.wwpdb.org), small angle scattering data by the Small Angle Scattering Biological Data Bank (SASBDB, sasbdb.org), and models generated with the AlphaFold 2.0 deep learning software suite (alphafold.ebi.ac.uk). Community-driven development of PDBx/mmCIF spans three decades, involving contributions from researchers, software and methods developers in structural sciences, data repository providers, scientific publishers, and professional societies. Having a semantically rich and extensible data framework for representing a wide range of structural biology experimental and computational results, combined with expertly curated 3D biostructure data sets in public repositories, accelerates the pace of scientific discovery. Herein, we describe the architecture of the PDBx/mmCIF data standard, tools used to maintain representations of the data standard, governance, and processes by which data content standards are extended, plus community tools/software libraries available for processing and checking the integrity of PDBx/mmCIF data. Use cases exemplify how the members of the Worldwide Protein Data Bank have used PDBx/mmCIF as the foundation for their pipeline for delivering Findable, Accessible, Interoperable, and Reusable (FAIR) data to many millions of users worldwide.
