Diamant Ido, Clarke Daniel J B, Evangelista John Erol, Lingam Nathania, Ma'ayan Avi
Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY, USA.
Nucleic Acids Res. 2025 Jan 6;53(D1):D1016-D1028. doi: 10.1093/nar/gkae1080.
By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types, such as cells and tissues, diseases, and pathways. The upgrade also includes a dataset-crossing feature that identifies gene modules shared across datasets. To help interpret significantly high gene set overlap between pairs of datasets, a large language model (LLM) composes a paragraph that speculates about possible reasons for the overlap. The upgrade further adds more data formats and visualization options. Datasets can be downloaded as knowledge graph (KG) assertions and are visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored through a user interface that displays gene-attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
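The abstract does not say which statistic is used to decide that the gene set overlap between two datasets is "significantly high." A common choice for this kind of comparison is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) against a shared gene background. The sketch below assumes that test plus a Bonferroni correction, and treats the shared genes of each significant attribute pair as a candidate "gene module." The dict-of-sets dataset layout and the names `overlap_pvalue` and `cross_datasets` are hypothetical illustrations, not part of Harmonizome's API or its documented method.

```python
"""Hypothetical sketch: scoring gene set overlap when crossing two datasets.

Assumes a one-sided hypergeometric (Fisher's exact) test and a Bonferroni
correction; neither is confirmed as the method Harmonizome 3.0 uses.
"""
from math import comb


def overlap_pvalue(set_a: set[str], set_b: set[str], background_size: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, |A|, |B|), where k = |A & B|."""
    if background_size < len(set_a | set_b):
        raise ValueError("background must include every gene in both sets")
    k = len(set_a & set_b)
    big_k, n = len(set_a), len(set_b)
    # Sum the probability of drawing k or more shared genes by chance.
    tail = sum(
        comb(big_k, i) * comb(background_size - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return tail / comb(background_size, n)


def cross_datasets(
    ds_a: dict[str, set[str]],
    ds_b: dict[str, set[str]],
    background_size: int,
    alpha: float = 0.05,
) -> list[tuple[str, str, list[str], float]]:
    """Return (attribute_a, attribute_b, shared_genes, adjusted_p) for significant pairs."""
    n_tests = len(ds_a) * len(ds_b)
    hits = []
    for name_a, genes_a in ds_a.items():
        for name_b, genes_b in ds_b.items():
            p = overlap_pvalue(genes_a, genes_b, background_size)
            p_adj = min(1.0, p * n_tests)  # Bonferroni correction over all pairs
            if p_adj < alpha:
                hits.append((name_a, name_b, sorted(genes_a & genes_b), p_adj))
    # Most significant pairs first.
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    # Toy gene sets; attribute and gene names are made up for illustration.
    pathways = {"pathway_x": {"G1", "G2", "G3", "G4", "G5"}}
    tissues = {
        "tissue_y": {"G1", "G2", "G3", "G4", "G6"},
        "tissue_z": {"G10", "G11", "G12"},
    }
    for hit in cross_datasets(pathways, tissues, background_size=20000):
        print(hit)
```

Bonferroni is used here only because it is simple; when crossing two large datasets produces many attribute pairs, a false discovery rate procedure such as Benjamini-Hochberg is usually less conservative.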