Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut, United States of America.
Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut, United States of America.
PLoS Comput Biol. 2023 Jul 6;19(7):e1011222. doi: 10.1371/journal.pcbi.1011222. eCollection 2023 Jul.
The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with a range of manifestations that differ in severity and long-term outcomes. Previous efforts to uncover the mechanism of viral infection have contributed to the development of effective strategies for treatment and prevention. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Such a full interactome could help in developing new drugs to treat COVID-19, distinguishing the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated with SARS-CoV-2 abundance, respectively, a finding corroborated by analysis of single-cell sequencing data.
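The abstract states that the initial topic-based linkages are refined with network propagation. As a hedged illustration of what that step can look like, the following is a minimal random-walk-with-restart sketch, one common form of network propagation. The function name `propagate`, the restart probability of 0.5, the convergence settings, and the toy four-gene network (`GENE_A` to `GENE_D`) are all hypothetical assumptions for illustration, not MLCrosstalk's actual implementation, parameters, or data.

```python
# Hypothetical sketch: random walk with restart (RWR), one common form of
# network propagation. Not MLCrosstalk's actual code or parameters.

def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Spread seed scores over an undirected, weighted network.

    adjacency: dict mapping node -> {neighbor: weight}; assumed symmetric.
    seeds: dict mapping node -> nonnegative initial score (for example, a
           topic-based linkage strength to SARS-CoV-2).
    restart: probability of jumping back to the seed distribution each step.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Normalize seeds into a probability distribution over all nodes.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    # Weighted degree, used to turn edge weights into transition probabilities.
    degree = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {}
        for v in nodes:
            # Mass flowing into v from each neighbor u, split by u's degree.
            walk = sum(p[u] * w / degree[u]
                       for u, w in adjacency[v].items() if degree[u] > 0)
            new[v] = (1 - restart) * walk + restart * p0[v]
        diff = sum(abs(new[n] - p[n]) for n in nodes)
        p = new
        if diff < tol:  # stop once the scores have converged
            break
    return p


# Toy path network GENE_A - GENE_B - GENE_C - GENE_D (hypothetical genes),
# seeded only at GENE_A.
toy = {
    "GENE_A": {"GENE_B": 1.0},
    "GENE_B": {"GENE_A": 1.0, "GENE_C": 1.0},
    "GENE_C": {"GENE_B": 1.0, "GENE_D": 1.0},
    "GENE_D": {"GENE_C": 1.0},
}
scores = propagate(toy, {"GENE_A": 1.0})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{score:.4f}")
```

On this toy path the scores decay with distance from the seed (26/45, 14/45, 4/45, and 1/45 for GENE_A through GENE_D). A higher restart probability keeps the refined scores closer to the initial linkages, while a lower one lets the network structure dominate.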