BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters.

Affiliations

Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands.

Publication information

Gigascience. 2021 Jan 13;10(1). doi: 10.1093/gigascience/giaa154.

Abstract

BACKGROUND

Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs).

RESULTS

Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration.
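The idea of grouping BGCs without pairwise comparison can be illustrated with a minimal sketch. The abstract does not specify the features or the clustering algorithm, so everything below is an assumption made for illustration: the feature vectors are made up, and the `CentroidClusterer` class, its `threshold` parameter, and its `add` and `query` methods are hypothetical names, not BiG-SLiCE's API. Each vector is compared only with the current centroids, so no all-against-all distance matrix is ever built, and a "query" simply looks up the nearest existing centroid.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidClusterer:
    """Toy one-pass, centroid-based clustering (hypothetical stand-in)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold            # maximum distance to join a group
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []           # members per centroid

    def query(self, v: Vector) -> Tuple[int, float]:
        """Return (index, distance) of the nearest centroid, or (-1, inf)."""
        best, best_d = -1, math.inf
        for i, c in enumerate(self.centroids):
            d = euclidean(v, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def add(self, v: Vector) -> int:
        """Assign v to the nearest centroid, or open a new group."""
        idx, d = self.query(v)
        if idx == -1 or d > self.threshold:
            self.centroids.append(list(v))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Update the running mean of the chosen centroid.
        n = self.counts[idx]
        self.centroids[idx] = [
            (c * n + x) / (n + 1) for c, x in zip(self.centroids[idx], v)
        ]
        self.counts[idx] = n + 1
        return idx


# Made-up feature vectors standing in for four BGCs.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.9, 1.1],
]
clusterer = CentroidClusterer(threshold=0.5)
labels = [clusterer.add(v) for v in features]

# "Query mode": place a new vector into the already computed groups.
family, distance = clusterer.query([0.1, 1.0, 1.0])
```

With these toy inputs the first two vectors share one group and the last two share another, and the queried vector lands in the second group. The work per vector depends on the number of centroids rather than on the number of vectors already seen, which is what keeps this kind of scheme close to linear when the number of groups stays much smaller than the number of inputs.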

CONCLUSIONS

BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce6/7804863/62332abf6059/giaa154fig1.jpg
