The GENCODE pseudogene resource.

Affiliation

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.

Publication information

Genome Biol. 2012 Sep 26;13(9):R51. doi: 10.1186/gb-2012-13-9-r51.

Abstract

BACKGROUND

Pseudogenes have long been considered nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data.

RESULTS

As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection.

CONCLUSIONS

At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77fb/3491395/819531016046/gb-2012-13-9-r51-1.jpg
