Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources.

Authors

Diamant Ido, Clarke Daniel J B, Evangelista John Erol, Lingam Nathania, Ma'ayan Avi

Affiliation

Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603 New York, NY, USA.

Publication information

Nucleic Acids Res. 2025 Jan 6;53(D1):D1016-D1028. doi: 10.1093/nar/gkae1080.

Abstract

By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene-attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
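
To make the dataset crossing idea concrete, the sketch below compares gene sets from two datasets and keeps the attribute pairs whose overlap is unexpectedly large. The abstract does not say which statistic Harmonizome 3.0 uses, so the hypergeometric upper-tail test here is an assumption, as are the function names, the toy gene sets, the background size of 20,000 genes, and the 0.05 cutoff. It is not the Harmonizome API.

```python
from math import comb


def overlap_pvalue(set_a, set_b, background_size):
    """Return (overlap size, hypergeometric upper-tail p-value).

    The p-value is the chance of seeing at least the observed overlap
    when len(set_b) genes are drawn at random from a background of
    background_size genes, of which len(set_a) belong to set_a.
    """
    a, b = len(set_a), len(set_b)
    k = len(set_a & set_b)
    total = comb(background_size, b)
    tail = sum(
        comb(a, i) * comb(background_size - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return k, tail / total


def cross_datasets(dataset_x, dataset_y, background_size, alpha=0.05):
    """Compare every attribute pair across two datasets.

    Each dataset maps an attribute name to a set of gene symbols.
    Returns (attr_x, attr_y, overlap, p_value) tuples with p_value below
    alpha, sorted from most to least significant.
    """
    hits = []
    for attr_x, genes_x in dataset_x.items():
        for attr_y, genes_y in dataset_y.items():
            k, p = overlap_pvalue(genes_x, genes_y, background_size)
            if p < alpha:
                hits.append((attr_x, attr_y, k, p))
    return sorted(hits, key=lambda hit: hit[3])


# Toy inputs for illustration only; these are not Harmonizome datasets.
tissue_sets = {
    "liver": {"ALB", "APOA1", "CYP3A4", "TTR"},
    "heart": {"MYH6", "MYH7", "TNNT2", "ACTC1"},
}
pathway_sets = {
    "lipid_transport": {"ALB", "APOA1", "TTR", "APOB"},
    "muscle_contraction": {"MYH7", "TNNT2", "ACTC1", "MYL2"},
}

# Assumed background: roughly 20,000 human protein-coding genes.
hits = cross_datasets(tissue_sets, pathway_sets, background_size=20000)
for attr_x, attr_y, k, p in hits:
    print(f"{attr_x} x {attr_y}: {k} shared genes, p = {p:.3g}")
```

A real analysis over many attribute pairs would also need a multiple-testing correction, which this sketch omits.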

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7b9/11701526/0a1e78b47fd0/gkae1080figgra1.jpg
