BinaryCIF and CIFTools: Lightweight, efficient and extensible macromolecular data management.

Affiliations

CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic.

National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic.

Publication information

PLoS Comput Biol. 2020 Oct 19;16(10):e1008247. doi: 10.1371/journal.pcbi.1008247. eCollection 2020 Oct.

Abstract

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
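The abstract does not spell out the compression techniques. As a rough illustration only, the sketch below assumes a column-wise approach in which an integer column is first delta-encoded and then run-length-encoded, two encodings that the BinaryCIF specification lists among others. The function names and the sample column are hypothetical, and the layout is simplified; it is not the actual BinaryCIF byte format or the CIFTools API.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by taking a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Flatten runs of equal values into [value, count, value, count, ...]."""
    out: List[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out


def run_length_decode(pairs: List[int]) -> List[int]:
    """Invert run_length_encode."""
    out: List[int] = []
    for value, count in zip(pairs[0::2], pairs[1::2]):
        out.extend([value] * count)
    return out


# Hypothetical column: sequential atom identifiers 1..1000.
atom_ids = list(range(1, 1001))
encoded = run_length_encode(delta_encode(atom_ids))
decoded = delta_decode(run_length_decode(encoded))
```

For a strictly sequential column such as this one, the deltas are all equal, so the thousand integers collapse to a single value and count pair, and decoding restores the original column exactly. Real structure files contain many columns of this kind (identifiers, residue numbers), which is one plausible reason column-oriented encodings can outperform general-purpose compression of the text format.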

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0281/7595629/73777968e8bc/pcbi.1008247.g001.jpg
