Rubinstein Ran, Simon Itamar
Department of Molecular Biology, Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel.
BMC Bioinformatics. 2005 Jan 20;6:12. doi: 10.1186/1471-2105-6-12.
High-throughput genomic research tools are becoming standard in the biologist's toolbox. After genomic data have been processed with one of the many available statistical algorithms to identify statistically significant genes, these genes need to be further analyzed for biological significance in light of all existing knowledge. Literature mining, the process of representing literature data in a form that is easy to relate to genomic data, is one solution to this problem.
We present a web-based tool, MILANO (Microarray Literature-based Annotation), that annotates lists of genes derived from microarray results with user-defined terms. Our annotation strategy is based on counting the number of literature co-occurrences of each gene on the list with a user-defined term. Because the terms are chosen by the user, the annotation procedure can be customized, which overcomes one of the major limitations of the functional annotations usually provided with microarray results. MILANO expands gene names to include all their informative synonyms while filtering out gene symbols that are likely to be less informative as literature search terms. MILANO supports searching two literature databases, GeneRIF and Medline (through PubMed), allowing retrieval of both quick and comprehensive results. We demonstrate MILANO's ability to improve microarray analysis by analyzing a list of 150 genes that were affected by p53 overproduction. This analysis shows that MILANO enables immediate identification of known p53 target genes on this list and assists in sorting the list into genes known to be involved in p53-related pathways, apoptosis, and cell cycle arrest.
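The co-occurrence strategy described above can be illustrated with a minimal sketch. It is not MILANO's implementation. It assumes a small in-memory corpus (`DOCUMENTS`, hypothetical sentences standing in for GeneRIF summaries or Medline abstracts) and a gene list whose synonyms have already been filtered (`GENES`, illustrative only). For each gene, it counts the documents that mention both the user-defined term and at least one of the gene's names. It then ranks the genes by that count. The helper names `mentions`, `cooccurrence_counts` and `rank` are hypothetical.

```python
import re

# Hypothetical toy corpus standing in for GeneRIF summaries or Medline abstracts.
DOCUMENTS = [
    "CDKN1A (p21) is transcriptionally activated by p53 and mediates cell cycle arrest.",
    "WAF1 levels rise after p53 activation in irradiated cells.",
    "MDM2 binds p53 and promotes its degradation.",
    "BAX expression is induced during p53-dependent apoptosis.",
    "GAPDH is used as a loading control.",
]

# Illustrative gene list with already-filtered synonyms (not taken from a gene database).
GENES = {
    "CDKN1A": ["p21", "WAF1", "CIP1"],
    "MDM2": [],
    "BAX": [],
    "GAPDH": [],
}


def mentions(text: str, name: str) -> bool:
    """Return True if `name` appears in `text` as a whole token (case-insensitive)."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def cooccurrence_counts(genes: dict[str, list[str]], term: str,
                        documents: list[str]) -> dict[str, int]:
    """For each gene, count documents mentioning the term and any of the gene's names."""
    counts = {}
    for gene, synonyms in genes.items():
        names = [gene, *synonyms]
        counts[gene] = sum(
            1
            for doc in documents
            if mentions(doc, term) and any(mentions(doc, name) for name in names)
        )
    return counts


def rank(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Sort genes by descending co-occurrence count, then alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for gene, n in rank(cooccurrence_counts(GENES, "p53", DOCUMENTS)):
        print(f"{gene}\t{n}")
```

With the term `p53`, CDKN1A ranks first with two co-occurring documents. One of these is found only through its synonym WAF1, which shows why synonym expansion matters. GAPDH scores zero. Each document is counted at most once per gene, even when it mentions several of that gene's names.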
MILANO provides a useful tool for the automatic, custom annotation of microarray results based on all the available literature. MILANO offers two major advantages over similar tools: (1) the ability to expand gene names to include all their informative synonyms while removing synonyms that are not informative, and (2) access to the GeneRIF database, which provides short summaries of curated articles relevant to known genes. MILANO is available at http://milano.md.huji.ac.il.
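Synonym expansion with removal of uninformative names, followed by a PubMed search, can be sketched roughly as follows. This is not MILANO's method. The filtering rules (a minimum length, a hypothetical set of symbols that double as English words in `AMBIGUOUS_SYMBOLS`, and rejection of bare numbers) are assumptions for illustration. The helper names `informative_synonyms`, `pubmed_query` and `esearch_count_url` are also hypothetical. The URL targets NCBI's public E-utilities ESearch endpoint, but the sketch only builds it and does not fetch it.

```python
from urllib.parse import urlencode

# Hypothetical set of gene symbols that double as common English words.
# MILANO's actual filtering criteria may differ.
AMBIGUOUS_SYMBOLS = {"ACE", "CAN", "CAT", "IMPACT", "MAX", "SET", "WAS"}

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def informative_synonyms(candidates: list[str], min_length: int = 3) -> list[str]:
    """Drop synonyms likely to give spurious hits (hypothetical rules), keeping order."""
    kept: list[str] = []
    for name in (c.strip() for c in candidates):
        if len(name) < min_length:             # very short symbols match unrelated text
            continue
        if name.upper() in AMBIGUOUS_SYMBOLS:  # symbols that are also ordinary words
            continue
        if name.isdigit():                     # bare numbers are not useful search terms
            continue
        if name not in kept:                   # remove exact duplicates
            kept.append(name)
    return kept


def pubmed_query(synonyms: list[str], term: str) -> str:
    """OR together a gene's synonyms and require the user-defined term with AND."""
    names = " OR ".join(f'"{s}"' for s in synonyms)
    return f'({names}) AND "{term}"'


def esearch_count_url(query: str) -> str:
    """Build (but do not fetch) an ESearch URL that asks PubMed for the hit count only."""
    params = {"db": "pubmed", "term": query, "rettype": "count"}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative candidate names, not taken from a gene database.
    synonyms = informative_synonyms(["CDKN1A", "p21", "WAF1", "CIP1", "P1", "WAF1"])
    query = pubmed_query(synonyms, "apoptosis")
    print(query)
    print(esearch_count_url(query))
```

Running the example keeps `CDKN1A`, `p21`, `WAF1` and `CIP1`, drops the two-character `P1` and the duplicate `WAF1`, and prints the query `("CDKN1A" OR "p21" OR "WAF1" OR "CIP1") AND "apoptosis"`. Any real client sending such requests to NCBI should follow the E-utilities usage guidelines.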