De novo computational prediction of non-coding RNA genes in prokaryotic genomes.

Author information

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

Publication information

Bioinformatics. 2009 Nov 15;25(22):2897-905. doi: 10.1093/bioinformatics/btp537. Epub 2009 Sep 10.

Abstract

MOTIVATION

The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

RESULTS

We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.
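The window-based scan described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give window sizes, step length, features, or the details of the positional filtering, so every name and parameter here (`sliding_windows`, `merge_windows`, `predict_regions`, `sizes`, `step`, `threshold`) is hypothetical, and a simple GC-fraction score stands in for the trained neural network classifier. The sketch only shows the general idea of scoring windows of several sizes and merging overlapping positive windows into candidate regions.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int]  # half-open interval [start, end)


def sliding_windows(seq_len: int, size: int, step: int) -> List[Window]:
    """Return every full-length window of `size` bases, taken every `step` bases."""
    return [(start, start + size) for start in range(0, seq_len - size + 1, step)]


def gc_fraction(seq: str) -> float:
    """Placeholder score (assumption): fraction of G/C bases.

    The paper uses a neural network over sequence and structure features;
    this stand-in only makes the example runnable.
    """
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def merge_windows(windows: List[Window]) -> List[Window]:
    """Merge overlapping or touching windows into contiguous candidate regions."""
    merged: List[Window] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Extend the previous region when the new window overlaps it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def predict_regions(
    genome: str,
    sizes: Sequence[int],
    step: int,
    score: Callable[[str], float] = gc_fraction,
    threshold: float = 0.6,
) -> List[Window]:
    """Score windows of several sizes and merge those at or above `threshold`."""
    positives: List[Window] = []
    for size in sizes:
        for start, end in sliding_windows(len(genome), size, step):
            if score(genome[start:end]) >= threshold:
                positives.append((start, end))
    return merge_windows(positives)


if __name__ == "__main__":
    # Toy sequence: a GC-rich stretch flanked by AT-rich sequence.
    toy_genome = "AT" * 10 + "GC" * 10 + "AT" * 10
    print(predict_regions(toy_genome, sizes=(10, 20), step=5))
```

In a real pipeline the `score` argument would be replaced by a trained classifier, and the merged regions would then pass through whatever positional filters the method defines before being reported as candidates.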

AVAILABILITY

The source code and results are available at http://csbl.bmb.uga.edu/publications/materials/tran/.
