Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, NJ 08903, USA.
Glycobiology. 2021 Sep 20;31(9):1204-1218. doi: 10.1093/glycob/cwab039.
Since 1971, the Protein Data Bank (PDB) has served as the single global archive for experimentally determined three-dimensional (3D) structures of biological macromolecules, which are made freely available to the global community in accordance with the FAIR principles of Findability, Accessibility, Interoperability and Reusability. Over the first 50 years of continuous PDB operations, data representation standards have evolved to better capture rich and complex biological phenomena. Carbohydrate molecules present in more than 14,000 PDB structures have recently been reviewed and remediated to conform to a new standardized format. This machine-readable representation of carbohydrates in PDB structures, together with the corresponding reference data, improves the findability, accessibility, interoperability and reusability of structural information about these molecules. The PDB Exchange Macromolecular Crystallographic Information File (PDBx/mmCIF) data dictionary now supports (i) standardized atom nomenclature conforming to the International Union of Pure and Applied Chemistry-International Union of Biochemistry and Molecular Biology (IUPAC-IUBMB) recommendations for carbohydrates, (ii) a uniform representation of branched entities for oligosaccharides, (iii) commonly used linear carbohydrate descriptors developed by the glycoscience community and (iv) annotation of glycosylation sites in proteins. For the first time, carbohydrates in PDB structures are consistently represented as collections of standardized monosaccharides that precisely describe oligosaccharide structures and enable improved carbohydrate visualization, structure validation, robust quantitative and qualitative analyses, searches for dendritic (branched) structures and classification. The uniform representation of carbohydrate molecules in the PDB described herein will facilitate broader use of the resource by the glycoscience community and by researchers studying glycoproteins.
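The idea of representing an oligosaccharide as a branched collection of monosaccharides can be illustrated with a small data structure. The sketch below is a minimal, hypothetical Python model, not the PDBx/mmCIF schema or any RCSB software. Each glycosidic bond is a `Link` record loosely inspired by the dictionary's branched-entity link records (e.g. the `pdbx_entity_branch_link` category). The class, its field names and the child/parent convention are assumptions made for illustration. The code assembles the links into a tree rooted at the reducing-end residue, reports branch points and prints a nested text form. NAG, BMA and MAN are PDB Chemical Component Dictionary codes for N-acetyl-β-D-glucosamine, β-D-mannose and α-D-mannose.

```python
# Hypothetical model of a branched oligosaccharide; field names and the
# child/parent convention are illustrative assumptions, not the mmCIF schema.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Simplified glycosidic link record (hypothetical)."""
    child: int        # list number of the residue donating its anomeric carbon
    child_comp: str   # chemical component code, e.g. "BMA"
    child_atom: str   # anomeric carbon, e.g. "C1"
    parent: int       # list number of the residue whose hydroxyl is substituted
    parent_comp: str
    parent_atom: str  # linked oxygen, e.g. "O4"


def build_tree(links):
    """Index links by parent residue; return (root, children, comps)."""
    children = defaultdict(list)
    comps = {}
    seen_children = set()
    for ln in links:
        # Minimal validation: in a tree each residue has at most one parent.
        if ln.child in seen_children:
            raise ValueError(f"residue {ln.child} has more than one parent")
        seen_children.add(ln.child)
        children[ln.parent].append(ln)
        comps[ln.child] = ln.child_comp
        comps[ln.parent] = ln.parent_comp
    # The reducing-end residue is the only one that never appears as a child.
    roots = [n for n in comps if n not in seen_children]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one reducing-end residue, found {roots}")
    return roots[0], dict(children), comps


def branch_points(children):
    """Residues carrying more than one substituent, i.e. branch points."""
    return sorted(n for n, kids in children.items() if len(kids) > 1)


def render(node, children, comps):
    """Nested text form: parent(linked_oxygen<-anomeric_carbon child, ...)."""
    kids = sorted(children.get(node, []), key=lambda ln: ln.parent_atom)
    if not kids:
        return comps[node]
    inner = ", ".join(
        f"{ln.parent_atom}<-{ln.child_atom} {render(ln.child, children, comps)}"
        for ln in kids
    )
    return f"{comps[node]}({inner})"


# Trimannosyl core of N-glycans: Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
CORE_LINKS = [
    Link(2, "NAG", "C1", 1, "NAG", "O4"),
    Link(3, "BMA", "C1", 2, "NAG", "O4"),
    Link(4, "MAN", "C1", 3, "BMA", "O3"),
    Link(5, "MAN", "C1", 3, "BMA", "O6"),
]

if __name__ == "__main__":
    root, children, comps = build_tree(CORE_LINKS)
    print(root, branch_points(children))
    print(render(root, children, comps))
```

Run as a script, this prints `1 [3]` (root residue 1, with the branch point at the β-mannose) followed by `NAG(O4<-C1 NAG(O4<-C1 BMA(O3<-C1 MAN, O6<-C1 MAN)))`. Keying the tree on the substituted residue reduces branch detection to counting substituents, which suggests why an explicitly branched representation can be more convenient than a purely linear, residue-by-residue one.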