De novo computational prediction of non-coding RNA genes in prokaryotic genomes.

Affiliation

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

Publication information

Bioinformatics. 2009 Nov 15;25(22):2897-905. doi: 10.1093/bioinformatics/btp537. Epub 2009 Sep 10.

DOI:10.1093/bioinformatics/btp537

PMID:19744996

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2773258/

Abstract

MOTIVATION

The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

RESULTS

We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.
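The window-based scanning described in the abstract can be illustrated with a minimal sketch. It only shows the general idea of tiling a genome with windows of several sizes, scoring each window, and merging overlapping positive windows into candidate regions. Everything specific here is an assumption made for illustration: the function names, the window sizes and step, and the toy `gc_rich` scorer, which stands in for the trained neural network and its sequence and structure features. The positional filtering step is not reproduced because the abstract does not describe it. The abstract also does not define its specificity measure; the helper below uses TN / (TN + FP), which may differ from the definition used in the paper.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Window = Tuple[int, int]  # half-open [start, end) coordinates on the genome


def sliding_windows(genome_len: int, size: int, step: int) -> Iterator[Window]:
    """Yield windows of a fixed size that tile a genome of the given length."""
    for start in range(0, max(genome_len - size, 0) + 1, step):
        yield (start, min(start + size, genome_len))


def merge_windows(windows: Iterable[Window]) -> List[Window]:
    """Merge overlapping or touching windows into candidate regions."""
    merged: List[Window] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def predict_candidates(
    genome: str,
    sizes: Iterable[int],
    step: int,
    is_positive: Callable[[str], bool],
) -> List[Window]:
    """Score every window of every size and merge the positive ones."""
    hits = [
        (start, end)
        for size in sizes
        for start, end in sliding_windows(len(genome), size, step)
        if is_positive(genome[start:end])
    ]
    return merge_windows(hits)


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)); 0.0 when a denominator is zero."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def gc_rich(window: str) -> bool:
    """Toy stand-in classifier (NOT the paper's model): GC fraction >= 0.75."""
    gc = sum(base in "GC" for base in window)
    return len(window) > 0 and gc / len(window) >= 0.75


if __name__ == "__main__":
    toy_genome = "ATATAT" + "GCGCGCGC" + "ATATAT"
    print(predict_candidates(toy_genome, sizes=(4, 6), step=2, is_positive=gc_rich))
```

Merging is done on sorted half-open intervals, so windows of different sizes that cover the same locus collapse into a single candidate region rather than being counted several times.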

AVAILABILITY

The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/.

Similar articles

Chain-RNA: a comparative ncRNA search tool based on the two-dimensional chain algorithm.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):274-85. doi: 10.1109/TCBB.2012.137.

Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures.

PLoS One. 2015 Jun 15;10(6):e0130200. doi: 10.1371/journal.pone.0130200. eCollection 2015.

A comparative genome-wide study of ncRNAs in trypanosomatids.

BMC Genomics. 2010 Nov 4;11:615. doi: 10.1186/1471-2164-11-615.

Biocomputational prediction of non-coding RNAs in model cyanobacteria.

BMC Genomics. 2009 Mar 23;10:123. doi: 10.1186/1471-2164-10-123.

Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming.

Nucleic Acids Res. 2005 Jun 7;33(10):3263-70. doi: 10.1093/nar/gki644. Print 2005.

Non-coding RNA detection methods combined to improve usability, reproducibility and precision.

BMC Bioinformatics. 2010 Sep 29;11:491. doi: 10.1186/1471-2105-11-491.

Detecting uber-operons in prokaryotic genomes.

Nucleic Acids Res. 2006 May 8;34(8):2418-27. doi: 10.1093/nar/gkl294. Print 2006.

Computational RNomics: structure identification and functional prediction of non-coding RNAs in silico.

Sci China Life Sci. 2010 May;53(5):548-62. doi: 10.1007/s11427-010-0101-9. Epub 2010 May 23.

De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring.

Genome Res. 2008 Jun;18(6):888-99. doi: 10.1101/gr.069104.107. Epub 2008 Mar 17.

Cited by

Prevalence of small base-pairing RNAs derived from diverse genomic loci.

Biochim Biophys Acta Gene Regul Mech. 2020 Jul;1863(7):194524. doi: 10.1016/j.bbagrm.2020.194524. Epub 2020 Mar 5.

Pervasive Regulatory Functions of mRNA Structure Revealed by High-Resolution SHAPE Probing.

Cell. 2018 Mar 22;173(1):181-195.e18. doi: 10.1016/j.cell.2018.02.034. Epub 2018 Mar 15.

Transcriptional Variation of Diverse Enteropathogenic Isolates under Virulence-Inducing Conditions.

mSystems. 2017 Jul 25;2(4). doi: 10.1128/mSystems.00024-17. eCollection 2017 Jul-Aug.

A Review on Recent Computational Methods for Predicting Noncoding RNAs.

Biomed Res Int. 2017;2017:9139504. doi: 10.1155/2017/9139504. Epub 2017 May 3.

An improved method for identification of small non-coding RNAs in bacteria using support vector machine.

Sci Rep. 2017 Apr 6;7:46070. doi: 10.1038/srep46070.

A Review of Computational Methods for Finding Non-Coding RNA Genes.

Genes (Basel). 2016 Dec 3;7(12):113. doi: 10.3390/genes7120113.

LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature.

PLoS One. 2016 May 26;11(5):e0154567. doi: 10.1371/journal.pone.0154567. eCollection 2016.

Computational Detection of piRNA in Human Using Support Vector Machine.

Avicenna J Med Biotechnol. 2016 Jan-Mar;8(1):36-41.

Differential expression of small RNAs under chemical stress and fed-batch fermentation in E. coli.

BMC Genomics. 2015 Dec 10;16:1051. doi: 10.1186/s12864-015-2231-8.

Secondary structural entropy in RNA switch (Riboswitch) identification.

BMC Bioinformatics. 2015 Apr 28;16:133. doi: 10.1186/s12859-015-0523-2.

