Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.
Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy.
BMC Bioinformatics. 2018 Feb 6;19(1):36. doi: 10.1186/s12859-018-2049-x.
Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to predicting transcription promoter and terminator sites, which represent the essential "punctuation marks" of DNA transcription. Computational prediction of promoters in prokaryotes remains far from solved in computational genomics. Most published bacterial promoter prediction tools rely on consensus-sequence searches and were designed specifically for vegetative σ70 promoters. They are therefore not suitable for promoter prediction in bacteria that encode many σ factors, such as actinomycetes.
In this study we investigated the possibility of identifying putative promoters in prokaryotes based on evolutionarily conserved motifs, focusing on GC-rich bacteria, in which promoter prediction with conventional consensus-based algorithms is often incomplete. Here, we introduce G4PromFinder, a novel algorithm that predicts putative promoters based on AT-rich elements and G-quadruplex DNA motifs. We tested its performance using available genomic and transcriptomic data from the model microorganisms Streptomyces coelicolor A3(2) and Pseudomonas aeruginosa PA14. We compared our results with those obtained by three currently available promoter prediction algorithms: the σ70 consensus-based PePPER, the σ factor consensus-based bTSSfinder, and PromPredict, which is based on double-helix DNA stability. Our results show that G4PromFinder is more suitable than the three reference tools for both genomes: it achieved the highest accuracy (F-scores of 0.61 and 0.53 in the two genomes), compared with the next best tool, PromPredict (F-scores of 0.46 and 0.48). Consensus-based algorithms performed worse on the analyzed GC-rich genomes.
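The abstract names the two sequence features that G4PromFinder relies on but not how they are searched. As a rough illustration only, the Python sketch below scans a region for G-quadruplex motifs using the widely used "four runs of at least three G, loops of 1 to 7 nt" rule (an assumption, not necessarily the authors' pattern) and for AT-rich windows using an arbitrary threshold. The functions `find_g4`, `at_rich_windows` and `is_candidate_promoter`, and the window size and AT fraction, are hypothetical and are not the published algorithm's parameters. The `f_score` helper shows the balanced F-score (F1, the harmonic mean of precision and recall), assuming that is the F-score variant used in the comparison.

```python
import re

# Canonical intramolecular G-quadruplex rule (assumption, not taken from the paper):
# four runs of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# A G4 on the minus strand appears as C-runs when reading the plus strand.
G4_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4(seq):
    """Return sorted (start, strand, motif) tuples for non-overlapping G4 matches."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), "-", m.group()) for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


def at_rich_windows(seq, size=10, min_at_fraction=0.7):
    """Return start positions of windows whose A+T fraction meets the threshold.

    `size` and `min_at_fraction` are hypothetical defaults chosen for illustration.
    """
    seq = seq.upper()
    starts = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        at_count = window.count("A") + window.count("T")
        if at_count / size >= min_at_fraction:
            starts.append(start)
    return starts


def is_candidate_promoter(region):
    """Hypothetical rule: flag a region containing both an AT-rich window and a
    G4 motif on either strand. G4PromFinder's actual scoring and positional
    constraints are not described in the abstract."""
    return bool(at_rich_windows(region)) and bool(find_g4(region))


def f_score(tp, fp, fn):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `find_g4("TTGGGAGGGAGGGAGGGTT")` returns `[(2, "+", "GGGAGGGAGGGAGGG")]`. Checking C-runs on the plus strand avoids building a reverse complement, which keeps the sketch short; a real tool would also need to relate motif positions to annotated start codons or transcription start sites.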
Our analysis shows that G4PromFinder is a powerful tool for promoter search in GC-rich bacteria, especially in bacteria encoding many σ factors, such as the model microorganism S. coelicolor A3(2). Moreover, consensus-based tools and, more generally, tools based on specific features of bacterial σ factors appear to perform less well at promoter prediction in these types of bacterial genomes.