Osborne John D, Flatow Jared, Holko Michelle, Lin Simon M, Kibbe Warren A, Zhu Lihua Julie, Danila Maria I, Feng Gang, Chisholm Rex L
Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2164-10-S1-S6.
The human genome has been extensively annotated with Gene Ontology terms for biological functions, but computational annotation for diseases remains minimal.
We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations.
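The filtering step described above can be illustrated with a minimal sketch. It assumes a toy disease ontology stored as a child-to-parents mapping and a small synonym table; the names `PARENTS`, `SYNONYMS`, `normalize`, `ancestors`, and `filter_disease_terms` are hypothetical, and the terms are made up for illustration. It does not call MMTx or use the actual Disease Ontology.

```python
from collections import deque

# Hypothetical toy ontology: term -> list of parent terms (a directed acyclic graph).
PARENTS = {
    "disease": [],
    "cancer": ["disease"],
    "breast cancer": ["cancer"],
    "neurodegenerative disease": ["disease"],
    "Alzheimer's disease": ["neurodegenerative disease"],
}

# Hypothetical synonym table: lowercase surface form -> preferred ontology term.
SYNONYMS = {
    "breast carcinoma": "breast cancer",
    "alzheimer disease": "Alzheimer's disease",
}


def normalize(term):
    """Map a candidate phrase to a preferred ontology term, or None if unknown."""
    key = term.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    for name in PARENTS:
        if name.lower() == key:
            return name
    return None


def ancestors(term):
    """Collect every ancestor of a term by walking parent links breadth-first."""
    seen = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def filter_disease_terms(candidates):
    """Keep only candidates that resolve to an ontology term, without duplicates."""
    kept = []
    for candidate in candidates:
        term = normalize(candidate)
        if term is not None and term not in kept:
            kept.append(term)
    return kept
```

In this sketch, synonym matching collapses "Breast carcinoma" and "breast cancer" into one annotation, while a phrase such as "apoptosis" is dropped because it is not in the ontology. The `ancestors` helper shows how the graph structure lets a specific term also be read as an instance of its broader classes.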
The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons to be made between disease-containing databases and increases accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome.
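The recall and precision figures above follow the standard definitions, which can be sketched as below. The function name `precision_recall` and the gene-disease pairs are hypothetical and made up for illustration; they are not the validation data used in the study.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted annotations against a gold standard.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold-standard annotations
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up (gene, disease) pairs for illustration only.
gold = {
    ("GENE_A", "disease 1"),
    ("GENE_A", "disease 2"),
    ("GENE_B", "disease 1"),
    ("GENE_C", "disease 3"),
}
predicted = {
    ("GENE_A", "disease 1"),
    ("GENE_B", "disease 1"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The toy prediction set is fully correct but covers only half of the gold standard, so precision is high while recall is low. This is the same pattern the abstract reports for OMIM relative to GeneRIF.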