TOUCAN: a framework for fungal biosynthetic gene cluster discovery.

Author information

Almeida Hayda, Palys Sylvester, Tsang Adrian, Diallo Abdoulaye Baniré

Affiliations

Departement d'Informatique, UQAM, Montréal, QC, H2X 3Y7, Canada.

Centre for Structural and Functional Genomics, Concordia University, Montréal, QC, H4B 1R6, Canada.

Publication information

NAR Genom Bioinform. 2020 Nov 27;2(4):lqaa098. doi: 10.1093/nargab/lqaa098. eCollection 2020 Dec.

Abstract

Fungal secondary metabolites (SMs) are an important source of bioactive compounds widely applied in the pharmaceutical industry, for example in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying the biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially because of the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery have a limited scope, which can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; a combination of functional, evolutionary and compositional features coupled with high-performing classifiers; and robust post-processing methods. TOUCAN's best-performing model yields an F-measure of 0.982 on BGC regions in the genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to process other species or to include new features.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ad6/7694738/b6bbe28f31d6/lqaa098fig1.jpg