School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China.
School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac361.
Long noncoding RNAs (lncRNAs) play an important role in the occurrence and development of diseases. Predicting disease-related lncRNAs helps to deepen our understanding of disease pathogenesis. Existing methods for predicting lncRNA-disease associations rely mainly on multi-source data related to lncRNAs and diseases. In a heterogeneous graph composed of all lncRNAs, diseases and microRNAs, there are interdependencies among node attributes, and the meta-paths formed by the various connections between these nodes also contain rich semantic information. However, existing methods neglect to integrate the attribute information of the intermediate nodes in meta-paths.
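To make the notion of a meta-path and its instances concrete, the following minimal sketch enumerates the instances of an lncRNA-miRNA-disease meta-path in a toy heterogeneous graph. The node names, the edge lists and the helper `metapath_instances` are all hypothetical and are not taken from the paper's data or code; the point is only that every instance passes through an intermediate node (here a miRNA) whose attributes could be used.

```python
from typing import Dict, List, Tuple

# Toy heterogeneous graph; every node name and edge here is made up.
# Node types: "L" = lncRNA, "M" = miRNA, "D" = disease.
EDGES: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("L", "M"): {"lnc1": ["mir1", "mir2"], "lnc2": ["mir2"]},
    ("M", "D"): {"mir1": ["dis1"], "mir2": ["dis1", "dis2"]},
}


def metapath_instances(start: str, metapath: List[str]) -> List[List[str]]:
    """Enumerate every node sequence that starts at `start` and follows
    the node-type sequence given by `metapath` (hypothetical helper)."""
    paths = [[start]]
    for src_type, dst_type in zip(metapath, metapath[1:]):
        neighbours = EDGES[(src_type, dst_type)]
        # Extend each partial path by every neighbour of its last node.
        paths = [p + [n] for p in paths for n in neighbours.get(p[-1], [])]
    return paths


instances = metapath_instances("lnc1", ["L", "M", "D"])
for inst in instances:
    print(" -> ".join(inst))
```

In this toy graph, `lnc1` reaches `dis1` through two different miRNAs, so the same node pair is linked by two distinct instances of one meta-path.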
We propose a novel association prediction model, GSMV, to learn and deeply integrate the global dependencies, the semantic information of meta-paths and the node-pair multi-view features related to lncRNAs and diseases. First, we formulate global representations of the lncRNA and disease nodes by establishing a self-attention mechanism that captures the global dependencies among node attributes. Second, starting from the lncRNA and disease nodes, respectively, we establish multiple meta-paths to reveal different semantic information. Each meta-path carries specific semantics and has multiple meta-path instances, which contribute differently to revealing those semantics. We therefore design a graph neural network-based module that consists of a meta-path instance encoding strategy and two novel attention mechanisms. The meta-path instance encoding strategy learns the contextual connections between the nodes within a meta-path instance. The first attention mechanism operates at the meta-path instance level and highlights the rich and informative meta-path instances. The second attention mechanism integrates the semantic information from multiple meta-paths to learn the semantic representations of the lncRNA and disease nodes. Finally, we propose a dilated convolution-based learning module with adjustable receptive fields to learn multi-view features of lncRNA-disease node pairs. The experimental results show that our method outperforms seven state-of-the-art methods for lncRNA-disease association prediction. Ablation experiments demonstrate the contributions of the proposed global representation learning, semantic information learning, pairwise multi-view feature learning and meta-path instance encoding strategy. Case studies on three cancers further demonstrate our method's ability to discover potential disease-related lncRNA candidates.
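As a rough illustration of two of the ingredients named above, the sketch below shows (i) encoding each meta-path instance from all of its nodes and pooling the instances with attention weights, and (ii) a one-dimensional dilated convolution whose receptive field grows with the dilation rate. It is not the GSMV implementation: the mean-pooling encoder, the dot-product scoring against a `query` vector and all function names are assumptions chosen for brevity, and the abstract does not specify the actual encoder or scoring function.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_instance(node_features: List[Vector]) -> Vector:
    """Assumed encoder: mean of the features of ALL nodes on the
    instance, so intermediate nodes contribute as well."""
    dim = len(node_features[0])
    n = len(node_features)
    return [sum(f[i] for f in node_features) / n for i in range(dim)]


def attend(instance_vecs: List[Vector], query: Vector) -> Vector:
    """Instance-level attention (assumed dot-product scoring): weight
    each encoded instance by softmax(query . instance) and sum."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in instance_vecs]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * v[i] for w, v in zip(weights, instance_vecs))
            for i in range(dim)]


def dilated_conv1d(signal: Vector, kernel: Vector, dilation: int) -> Vector:
    """Valid 1D dilated convolution (cross-correlation form). Kernel
    taps are `dilation` positions apart, so a larger dilation widens
    the receptive field without adding weights."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[t + j * dilation] for j, k in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Two toy instances of one meta-path, each with three 2-d node features.
inst_a = encode_instance([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
inst_b = encode_instance([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
pooled = attend([inst_a, inst_b], query=[2.0, 0.0])  # favours inst_a

narrow = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], dilation=1)
wide = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], dilation=2)
```

With the same two-tap kernel, `dilation=1` combines adjacent positions while `dilation=2` combines positions two apart, which is the sense in which the receptive field is adjustable. The same attention pattern could be reused one level up to combine the representations obtained from different meta-paths.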
zhang@hlju.edu.cn or peiliangwu@ysu.edu.cn.
Supplementary data are available at Briefings in Bioinformatics online.