Wei Kai, Qian Fang, Li Yixue, Zeng Tao, Huang Tao
Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
Guoke Ningbo Life Science and Health Industry Research Institute, Ningbo 315000, China.
Fundam Res. 2024 Apr 2;4(4):738-751. doi: 10.1016/j.fmre.2024.03.022. eCollection 2024 Jul.
Childhood asthma is one of the most common respiratory diseases, and its mortality and morbidity are rising. Multi-omics data offer a new opportunity to identify collaborative biomarkers and build corresponding diagnostic models for childhood asthma. To capture the nonlinear associations in multi-omics data and improve the interpretability of the diagnostic model, we propose a novel deep association model (DAM) and a corresponding efficient analysis framework. First, Deep Subspace Reconstruction fuses the omics data with diagnostic information, thereby correcting the distribution of the original omics data and reducing the influence of noise. Second, Joint Deep Semi-Negative Matrix Factorization identifies latent sample patterns and extracts biomarkers from the different omics data levels. Third, our newly proposed Deep Orthogonal Canonical Correlation Analysis ranks the features in the collaborative module, and the ranked features are used to construct a diagnostic model that accounts for the nonlinear correlation between omics data levels. Using DAM, we analyzed in depth the transcriptome and methylation data of childhood asthma. We verified the effectiveness of DAM on an independent test dataset in terms of both algorithm performance and biological significance, through ablation experiments and comparison with many baseline methods from clinical and biological studies. The DAM-induced diagnostic model achieves a prediction AUC of 0.912, higher than that of many alternative methods. In addition, the relevant pathways and biomarkers of childhood asthma were found to be collectively altered at the gene expression and methylation levels.
As an interpretable machine learning approach, DAM simultaneously considers the nonlinear associations among samples and among biological features, which should help identify interpretable biomarker candidates and efficient diagnostic models from multi-omics data for complex human diseases.
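For readers unfamiliar with the reported metric, the following is a minimal sketch of how a prediction AUC such as the 0.912 above can be computed from binary labels and model scores. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the sketch uses the standard pairwise definition of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half).

```python
import numpy as np


def roc_auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]       # scores of positive samples (e.g. asthma cases)
    neg = scores[~labels]      # scores of negative samples (e.g. controls)
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)


# Hypothetical toy data for illustration only; not taken from the paper.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(y_true, y_score))
```

In this toy case eight of the nine positive/negative pairs are ordered correctly, so the function returns 8/9. The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the size typical in omics studies; a rank-based implementation would be preferable for much larger datasets.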