MicroRNA categorization using sequence motifs and k-mers.

Author information

Yousef Malik, Khalifa Waleed, Acar İlhan Erkin, Allmer Jens

Affiliations

Community Information Systems, Zefat Academic College, Zefat, 13206, Israel.

Computer Science, The College of Sakhnin, Sakhnin, 30810, Israel.

Publication information

BMC Bioinformatics. 2017 Mar 14;18(1):170. doi: 10.1186/s12859-017-1584-1.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5351198/

Abstract

BACKGROUND

Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in modulating translation efficiency. Known pre-miRNAs are listed in miRBase; they have been discovered in a wide range of organisms, from viruses and microbes to eukaryotes. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features describing pre-miRNAs have been proposed, and we have previously introduced sequence motifs and k-mers as useful features. There have been reports of xeno-miRNAs detected via next-generation sequencing. However, these may be contaminants, and to aid this important decision-making process, we aimed to establish a means of differentiating pre-miRNAs from different species.
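The k-mer features mentioned above can be illustrated with a minimal Python sketch. It assumes k-mers of length 1 to 3 over the RNA alphabet ACGU, counted with a sliding window and normalized per k-mer length; the function name `kmer_features` and these parameter choices are hypothetical and are not the authors' implementation or exact feature set.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_features(seq: str, k_max: int = 3) -> dict[str, float]:
    """Hypothetical helper: normalized k-mer frequencies for k = 1..k_max."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    features: dict[str, float] = {}
    for k in range(1, k_max + 1):
        # Enumerate all 4**k k-mers so every sequence yields the same feature keys
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        total = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing ambiguous bases such as N
                counts[kmer] += 1
                total += 1
        # Divide by the number of valid windows of this length
        for kmer, count in counts.items():
            features[kmer] = count / total if total else 0.0
    return features


if __name__ == "__main__":
    # Any RNA string works here; real input would be a pre-miRNA hairpin sequence
    profile = kmer_features("UGAGGUAGUAGGUUGUAUAGUU")
    print(len(profile))  # prints 84 (4 + 16 + 64 features)
```

Fixing the key set to all 4^k possible k-mers keeps feature vectors aligned across sequences of different lengths, which any downstream classifier needs.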

RESULTS

To distinguish between species, we used one species' pre-miRNAs as positive data and another species' pre-miRNAs as negative data to train and test machine-learned models that use sequence motifs and k-mers as features. This approach yielded higher accuracy for distantly related species and lower accuracy for more closely related species.
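The pairwise setup described above (one species' pre-miRNAs as the positive class, another species' as the negative class) can be sketched as follows. To keep the example short and dependency-free, it substitutes a simple nearest-centroid classifier over 3-mer frequencies for the authors' machine learning models and omits the motif features; the function names (`kmer_vector`, `split`, `pairwise_species_accuracy`) and the per-species 80/20 train/test split are assumptions.

```python
import random
from itertools import product

ALPHABET = "ACGU"


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Hypothetical helper: normalized frequencies of all 4**k k-mers in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    seq = seq.upper().replace("T", "U")
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # ignore windows with ambiguous bases such as N
            vec[j] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(items: list, test_fraction: float, rng: random.Random) -> tuple[list, list]:
    """Shuffle one species' sequences and return (train, test), each non-empty."""
    if len(items) < 2:
        raise ValueError("need at least two sequences per species")
    items = list(items)
    rng.shuffle(items)
    n_test = min(len(items) - 1, max(1, int(len(items) * test_fraction)))
    return items[n_test:], items[:n_test]


def pairwise_species_accuracy(positive, negative, k=3, test_fraction=0.2, seed=0) -> float:
    """Species A = positive class (1), species B = negative class (0); held-out accuracy."""
    rng = random.Random(seed)
    pos_train, pos_test = split(positive, test_fraction, rng)
    neg_train, neg_test = split(negative, test_fraction, rng)
    # Nearest-centroid stand-in: one mean k-mer profile per species
    c_pos = centroid([kmer_vector(s, k) for s in pos_train])
    c_neg = centroid([kmer_vector(s, k) for s in neg_train])
    test = [(s, 1) for s in pos_test] + [(s, 0) for s in neg_test]
    correct = 0
    for seq, label in test:
        v = kmer_vector(seq, k)
        pred = 1 if sq_dist(v, c_pos) <= sq_dist(v, c_neg) else 0
        correct += int(pred == label)
    return correct / len(test)


if __name__ == "__main__":
    # Toy input only; real use would load two species' pre-miRNAs, e.g. from miRBase
    species_a = ["AUAUAUAUAUAUAUAU"] * 5
    species_b = ["GCGCGCGCGCGCGCGC"] * 5
    print(pairwise_species_accuracy(species_a, species_b))  # prints 1.0
```

Running the same function over many species pairs and comparing the resulting accuracies with evolutionary distance reflects the kind of comparison the abstract reports; with real data, a stronger classifier and repeated or cross-validated splits would give more reliable estimates.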

CONCLUSIONS

We were able to differentiate among species with increasing success as evolutionary distance increased. This conclusion is supported by previous reports of rapid evolutionary change in miRNAs, since fairly good discrimination was possible even between relatively closely related species.

Figure 2 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8430/5351198/9232a959bb84/12859_2017_1584_Fig2_HTML.jpg