TOBFAC: the database of tobacco transcription factors.

Author information

Rushton Paul J, Bokowiec Marta T, Laudeman Thomas W, Brannock Jennifer F, Chen Xianfeng, Timko Michael P

Affiliation

Department of Biology, University of Virginia, Charlottesville, VA 22904, USA.

Publication information

BMC Bioinformatics. 2008 Jan 25;9:53. doi: 10.1186/1471-2105-9-53.

Abstract

BACKGROUND

Regulation of gene expression at the level of transcription is a major control point in many biological processes. Transcription factors (TFs) can activate and/or repress the transcriptional rate of target genes, and vascular plant genomes devote approximately 7% of their coding capacity to TFs. Global analysis of TFs has only been performed for three complete higher plant genomes: Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa) and rice (Oryza sativa). Presently, no large-scale analysis of TFs has been made from a member of the Solanaceae, one of the most important families of vascular plants. To fill this void, we have analysed tobacco (Nicotiana tabacum) TFs using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering of the tobacco genome. An analytical pipeline was developed to isolate TF sequences from the GSR dataset. This involved multiple (typically 10-15) independent searches with different versions of the TF family-defining domain(s) (normally the DNA-binding domain), followed by assembly into contigs and verification. Our analysis revealed that tobacco contains a minimum of 2,513 TFs representing all of the 64 well-characterised plant TF families. The number of TFs in tobacco is higher than previously reported for Arabidopsis and rice.
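The pooling step implied by the "multiple independent searches" can be pictured with a minimal sketch. It is not the authors' pipeline (which the abstract says was written in Perl): it only assumes that each search returns a set of read identifiers, and that the union of those sets is what goes on to assembly. The function name, query labels and read IDs are all hypothetical.

```python
from collections import defaultdict


def pool_hits(search_results):
    """Merge read IDs returned by several independent domain searches.

    search_results maps a query label (hypothetical, e.g. one version of a
    family-defining domain) to the set of read IDs that search returned.
    Returns a dict mapping each read ID to the sorted list of queries
    that found it.
    """
    found_by = defaultdict(set)
    for query, read_ids in search_results.items():
        for read_id in read_ids:
            found_by[read_id].add(query)
    return {read_id: sorted(queries) for read_id, queries in found_by.items()}


# Toy data (invented): three searches with different versions of one domain.
results = {
    "domain_v1": {"GSR_0001", "GSR_0002"},
    "domain_v2": {"GSR_0002", "GSR_0003"},
    "domain_v3": {"GSR_0003", "GSR_0004"},
}

pooled = pool_hits(results)
# Non-redundant candidate reads that would be passed on to contig assembly.
candidates = sorted(pooled)
```

Recording which queries found each read is a design choice of this sketch; it makes it easy to see which reads only a single domain version recovered.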

RESULTS

TOBFAC, the database of tobacco transcription factors, is an integrative database that provides a portal to sequence and phylogeny data for the identified TFs, together with a large quantity of other data concerning TFs in tobacco. The database contains an individual page dedicated to each of the 64 TF families. These pages contain background information, domain architecture via Pfam links, a list of all sequences and an assessment of the minimum number of TFs in the family in tobacco. Downloadable phylogenetic trees of the major families are provided, along with detailed information on the bioinformatic pipeline that was used to find all family members. TOBFAC also contains EST data, a list of published tobacco TFs and a list of papers concerning tobacco TFs. The sequences and annotation data are stored in relational tables using the PostgreSQL relational database management system. The data processing and analysis pipelines used the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The computationally intensive data processing and analysis pipelines were run on an Apple XServe cluster with more than 20 nodes.
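As a rough illustration of storing sequences and family annotation in relational tables, the sketch below builds a two-table layout and counts sequences per family. It is an assumption-laden toy: the actual TOBFAC schema is not given in the abstract, the table and column names are hypothetical, the rows are placeholders rather than real data, and Python's built-in sqlite3 module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# In-memory database; sqlite3 is a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tf_family (
        family_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tf_sequence (
        sequence_id INTEGER PRIMARY KEY,
        family_id   INTEGER NOT NULL REFERENCES tf_family(family_id),
        label       TEXT NOT NULL,
        residues    TEXT NOT NULL
    );
    """
)

# Placeholder rows (invented, not taken from TOBFAC).
conn.executemany(
    "INSERT INTO tf_family (family_id, name) VALUES (?, ?)",
    [(1, "FAMILY_A"), (2, "FAMILY_B")],
)
conn.executemany(
    "INSERT INTO tf_sequence (family_id, label, residues) VALUES (?, ?, ?)",
    [(1, "seq_a1", "XXXX"), (1, "seq_a2", "XXXX"), (2, "seq_b1", "XXXX")],
)


def family_counts(connection):
    """Return {family name: number of stored sequences} for every family."""
    rows = connection.execute(
        """
        SELECT f.name, COUNT(s.sequence_id)
        FROM tf_family AS f
        LEFT JOIN tf_sequence AS s ON s.family_id = f.family_id
        GROUP BY f.name
        ORDER BY f.name
        """
    )
    return dict(rows.fetchall())


counts = family_counts(conn)
```

A per-family count of this kind is the sort of query that could back the "minimum number of TFs in the family" figure on each family page, although how TOBFAC computes that figure is not stated here.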

CONCLUSION

TOBFAC is an expandable knowledgebase of tobacco TFs with data currently available for over 2,513 TFs from 64 gene families. TOBFAC integrates available sequence information, phylogenetic analysis, and EST data with published reports on tobacco TF function. The database provides a major resource for the study of gene expression in tobacco and the Solanaceae and helps to fill a current gap in studies of TF families across the plant kingdom. TOBFAC is publicly accessible at http://compsysbio.achs.virginia.edu/tobfac/.
