Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens.

Author information

Rensing Stefan A, Fritzowsky Dana, Lang Daniel, Reski Ralf

Affiliation

Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany.

Publication information

BMC Genomics. 2005 Mar 22;6:43. doi: 10.1186/1471-2164-6-43.

Abstract

BACKGROUND

The moss Physcomitrella patens is an emerging plant model system owing to its high rate of homologous recombination, haploidy, simple body plan, physiological properties and phylogenetic position. Available EST data were clustered and assembled, providing the basis for a genome-wide analysis of protein-encoding genes.

RESULTS

We have clustered and assembled Physcomitrella patens EST and CDS data in order to represent the transcriptome of this non-seed plant. Clustering of the publicly available data and subsequent prediction resulted in a total of 19,081 non-redundant ORFs. Of these putative transcripts, approximately 30% have a homolog in both the rice and Arabidopsis transcriptomes. More than 130 transcripts are not present in seed plants but can be found in other kingdoms. These potential "retained genes" might have been lost during seed plant evolution. Functional annotation of these genes reveals an unequal distribution among taxonomic groups and intriguing putative functions such as cytotoxicity and nucleic acid repair. Whereas introns in the moss are larger on average than in the seed plant Arabidopsis thaliana, the position and number of introns are approximately the same. In contrast to Arabidopsis, where CDS contain on average 44% G/C, in Physcomitrella the average G/C content is 50%. Interestingly, moss orthologs of Arabidopsis genes show a significant drift of codon fraction usage towards the seed plant. While averaged codon bias is the same in Physcomitrella and Arabidopsis, the distribution pattern is different, with 15% of moss genes being unbiased. Species-specific, sensitive and selective splice site prediction for Physcomitrella has been developed using a dataset of 368 donor and acceptor sites, utilizing a support vector machine. The prediction accuracy is better than that achieved with tools trained on Arabidopsis data.
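To make the G/C content and codon fraction measures mentioned above concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes that "codon fraction" means the share of each codon among the synonymous codons for the same amino acid, and it uses the standard genetic code. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
# Assumption: the standard nuclear code applies to the CDS being analysed.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(cds):
    """Return the fraction of G and C bases in a coding sequence."""
    seq = cds.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_fractions(cds):
    """Return, for each observed codon, its share among synonymous codons.

    Assumption: 'codon fraction' is count(codon) divided by the count of
    all codons encoding the same amino acid in this sequence.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]]
            for codon, n in counts.items()}


if __name__ == "__main__":
    toy_cds = "ATGGCTGCCGCTTAA"  # hypothetical toy sequence, not real data
    print(round(gc_content(toy_cds), 3))
    print(codon_fractions(toy_cds))
```

Averaging `gc_content` over all CDS of a species would give a figure comparable in kind to the 44% and 50% values quoted above. Comparing `codon_fractions` between ortholog pairs is one way such a drift could be examined.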

CONCLUSION

Analysis of the moss transcriptome reveals differences in gene structure, codon usage and splice site usage in comparison with the seed plant Arabidopsis. Putative retained genes exhibit possible functions that might explain the peculiar physiological properties of mosses. Both the transcriptome representation (including a BLAST and retrieval service) and splice site prediction have been made available at http://www.cosmoss.org, setting the basis for assembly and annotation of the Physcomitrella genome, of which draft shotgun sequences will become available in 2005.


