Vesth Tammi C, Brandl Julian, Andersen Mikael Rørdam
Department of Systems Biology, Technical University of Denmark, Søltofts Plads 223, Denmark.
Synth Syst Biotechnol. 2016 Feb 23;1(2):122-129. doi: 10.1016/j.synbio.2016.01.002. eCollection 2016 Jun.
Secondary metabolites of fungi are receiving increasing interest because of their prolific bioactivities and because fungal biosynthesis of secondary metabolites often occurs from co-regulated and co-located gene clusters. This makes the gene clusters attractive for synthetic biology and industrial biotechnology applications. We have previously published a method for accurate prediction of clusters from genome and transcriptome data, which could also suggest cross-chemistry. However, this method was limited both in the number of adjustable parameters and in user-friendliness. Furthermore, its sensitivity to the transcriptome data meant that the predictions required manual curation. In the present work, we have aimed to improve these features.
FunGeneClusterS is an improved implementation of our previous method with a graphical user interface for off- and on-line use. The new method adds options to adjust the size of the gene cluster(s) being sought, as well as an option that lets the algorithm tolerate genes within a cluster that do not appear to be co-regulated with the remainder of the cluster. We have benchmarked the method using data from the well-studied [species name missing] and found that it is an improvement over the previous one. In particular, it predicts clusters with more than 10 genes more accurately and allows identification of co-regulated gene clusters irrespective of the function of the genes. It also greatly reduces the need for manual curation of the prediction results. We furthermore applied the method to transcriptome data from [species name missing]. Using the identified best set of parameters, we were able to identify clusters for 31 out of 76 previously predicted secondary metabolite synthases/synthetases. Furthermore, we identified additional putative secondary metabolite gene clusters. In total, we predicted 432 co-transcribed gene clusters in [species name missing] (spanning 1,323 genes, 12% of the genome). Some of these had functions related to primary metabolism; for example, we identified a cluster for biosynthesis of biotin, as well as several for degradation of aromatic compounds. The data suggest that larger parts of the fungal genome than previously anticipated operate as gene clusters. This includes both primary and secondary metabolism as well as other cellular maintenance functions.
We have developed FunGeneClusterS in a graphical implementation and made the method adjustable to different datasets and target clusters. The method is versatile in that it can predict co-regulated clusters not limited to secondary metabolism. Our analysis of the data not only demonstrates the validity of the method but also strongly suggests that large parts of fungal primary metabolism and cellular functions are both co-regulated and co-located.
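To make the idea of co-regulated, co-located clusters with an adjustable size and a tolerance for non-co-regulated genes more concrete, the following is a minimal Python sketch. It is not the FunGeneClusterS algorithm: the function names, the use of Pearson correlation between neighbouring genes, and the parameters `min_corr`, `min_size` and `max_skips` are all assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(vx * vy)


def find_clusters(profiles, min_corr=0.8, min_size=3, max_skips=1):
    """Scan genes in chromosomal order for co-expressed runs.

    profiles  : list of expression vectors, one per gene, in genome order
    min_corr  : hypothetical correlation threshold for "co-regulated"
    min_size  : minimum number of co-regulated genes in a reported cluster
    max_skips : how many non-co-regulated genes a cluster may contain
    Returns a list of (start, end) inclusive index pairs.
    """
    clusters = []
    n = len(profiles)
    i = 0
    while i < n:
        last = i      # last gene confirmed as co-regulated
        members = 1   # number of co-regulated genes found so far
        skips = 0     # non-co-regulated genes tolerated so far
        j = i + 1
        while j < n:
            if pearson(profiles[last], profiles[j]) >= min_corr:
                last = j
                members += 1
            else:
                skips += 1
                if skips > max_skips:
                    break
            j += 1
        if members >= min_size:
            clusters.append((i, last))  # trailing skipped genes are excluded
            i = last + 1
        else:
            i += 1
    return clusters


# Toy data (invented): genes 0, 1 and 3 rise together; gene 2 runs opposite.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 3, 5, 7],
    [5, 1, 4, 2],
    [2, 5, 1, 4],
]
print(find_clusters(profiles, max_skips=1))
print(find_clusters(profiles, max_skips=0))
```

On this toy input, allowing one skipped gene yields a single cluster spanning genes 0 to 3, whereas with no skips allowed the anti-correlated gene 2 splits the run and no cluster reaches the minimum size. This mirrors, in a very simplified way, why a flexibility option can reduce the need for manual curation.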