Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration.

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark.

Department of Internal Medicine, Division of Translational Informatics, University of New Mexico Health Sciences Center, Albuquerque, NM, USA.

Publication information

Database (Oxford). 2022 Mar 28;2022. doi: 10.1093/database/baac019.

Abstract

Scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in the number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.
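The statement that text-mined associations "increased by at least 9-fold at all confidence cutoffs" can be read as a per-cutoff ratio of association counts between two releases. The following is a minimal sketch of that arithmetic, not the authors' code: the function names, the cutoff values and the score lists are all hypothetical toy values chosen for illustration, and the real DISEASES score scale and file format are not assumed here.

```python
from typing import Dict, Iterable, List


def count_at_cutoffs(scores: Iterable[float], cutoffs: List[float]) -> Dict[float, int]:
    """Count associations whose confidence score is at or above each cutoff."""
    score_list = list(scores)
    return {c: sum(1 for s in score_list if s >= c) for c in cutoffs}


def fold_increase(
    old_scores: Iterable[float],
    new_scores: Iterable[float],
    cutoffs: List[float],
) -> Dict[float, float]:
    """Ratio of new to old association counts at each cutoff.

    Cutoffs with no associations in the old release are skipped,
    because the ratio is undefined there.
    """
    old_counts = count_at_cutoffs(old_scores, cutoffs)
    new_counts = count_at_cutoffs(new_scores, cutoffs)
    return {c: new_counts[c] / old_counts[c] for c in cutoffs if old_counts[c] > 0}


# Hypothetical toy data: one confidence score per disease-gene association.
old_release = [1.2, 2.5, 3.1, 4.0]
new_release = [1.5] * 12 + [2.5] * 10 + [3.5] * 20
cutoffs = [1.0, 2.0, 3.0]

ratios = fold_increase(old_release, new_release, cutoffs)
for cutoff, ratio in ratios.items():
    print(f"cutoff >= {cutoff}: {ratio:.1f}-fold")

# "At least N-fold at all cutoffs" corresponds to the smallest ratio.
print("minimum fold increase:", min(ratios.values()))
```

Reporting the minimum over cutoffs is what makes a claim of the form "at least N-fold at all confidence cutoffs" hold; individual cutoffs may show larger increases.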
