Sánchez-Cruz Norberto, Pilón-Jiménez B Angélica, Medina-Franco José L
Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico City, 04510, Mexico.
F1000Res. 2019 Dec 10;8. doi: 10.12688/f1000research.21540.2. eCollection 2019.
Natural product databases are important in drug discovery and other research areas. An analysis of their structural content and functional group occurrence provides a useful overview and a means of comparison with related databases. BIOFACQUIM is an emerging database of natural products isolated and characterized in Mexico. Herein, we discuss the results of a first systematic analysis of the functional groups and global diversity of an updated version of BIOFACQUIM. BIOFACQUIM was augmented through a literature search and data curation, and a structural content analysis of the dataset was performed. This involved a functional group analysis with a novel algorithm that automatically identifies all functional groups in a molecule, and an assessment of the global diversity using consensus diversity plots. To this end, BIOFACQUIM was compared to two large reference databases: ChEMBL 25 and a collection of natural products assembled herein, comprising 169,839 unique compounds. The structural content analysis showed that 15.7% of the compounds and 11.6% of the scaffolds present in the current version of BIOFACQUIM have not been reported in the two large reference datasets. It also showed increased diversity in terms of scaffolds and molecular fingerprints relative to the previous version of the dataset, as well as a higher similarity to the assembled collection of natural products than to ChEMBL 25 in terms of diversity and frequent functional groups. A total of 148 natural products were added to BIOFACQUIM, which increased its diversity in terms of scaffolds and fingerprints. Despite its relatively small size, BIOFACQUIM contains a significant number of compounds and scaffolds that are not present in the reference datasets, showing that curated databases of natural products, such as BIOFACQUIM, can serve as a starting point to expand the biologically relevant chemical space.
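As a rough illustration of two quantities mentioned in the abstract, the sketch below computes the percentage of entries absent from reference sets and a fingerprint-based diversity measure (median pairwise Tanimoto similarity). It is a minimal sketch under stated assumptions, not the authors' pipeline: the identifiers and the sets of on-bits are hypothetical toy inputs, and the function names are invented here. In practice the identifiers, scaffolds, and fingerprints would come from a cheminformatics toolkit.

```python
from itertools import combinations
from statistics import median


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def median_pairwise_similarity(fingerprints: list) -> float:
    """Median Tanimoto similarity over all distinct pairs (lower = more diverse)."""
    return median(tanimoto(a, b) for a, b in combinations(fingerprints, 2))


def percent_novel(query: set, *references: set) -> float:
    """Percentage of query identifiers that appear in none of the reference sets."""
    seen = set().union(*references)
    return 100.0 * len(query - seen) / len(query)


# Hypothetical toy data: identifiers stand in for canonical structures or scaffolds.
query_ids = {"A", "B", "C", "D"}
reference_1 = {"A", "X"}
reference_2 = {"B", "Y"}

# Hypothetical fingerprints, each given as the set of bit positions that are on.
toy_fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({3, 4, 5}),
]

print(percent_novel(query_ids, reference_1, reference_2))
print(median_pairwise_similarity(toy_fingerprints))
```

The same `percent_novel` helper could be applied to compounds or to scaffolds, provided each is reduced to a canonical identifier first; that canonicalization step is assumed here and not shown.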