A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data.

Author information

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Department of Computer Science & Engineering, Aliah University, Newtown WB-700160, India.

Publication information

Genes (Basel). 2020 Aug 12;11(8):931. doi: 10.3390/genes11080931.

Abstract

DNA methylation changes have been useful for cancer biomarker discovery, classification, and potential treatment development. Existing methods use either differentially methylated CpG sites or combined CpG sites, namely differentially methylated regions, that can be mapped to genes. However, such methylation signal mapping has limitations. To address these limitations, we introduce a combinatorial framework that uses linear regression, differential expression analysis, and a deep learning method to interpret DNA methylation accurately by integrating DNA methylation data with corresponding TCGA gene expression data. We demonstrated it on uterine cervical cancer. First, we pre-filtered outliers from the data set and then predicted gene expression values from the pre-filtered methylation data through linear regression. We identified differentially expressed genes (DEGs) with the empirical Bayes test in Limma. We then applied a deep learning method, "", to classify cervical cancer labels using those DEGs and determined classification metrics, including accuracy and area under the curve (AUC), through 10-fold cross-validation. We applied our approach to a uterine cervical cancer DNA methylation dataset (NCBI accession ID: GSE30760; 27,578 features covering 63 tumor and 152 matched normal samples). After linear regression and differential expression analysis, we obtained 6287 DEGs with a false discovery rate (FDR) < 0.001. After deep learning analysis, we obtained an average classification accuracy of 90.69% (±1.97%) for the uterine cervical cancer labels. This performance is better than that of other peer methods. We performed in-degree and out-degree hub gene network analysis using Cytoscape and report the top five in-degree genes (PAIP2, GRWD1, VPS4B, CRADD and LLPH) and the top five out-degree genes (MRPL35, FAM177A1, STAT4, ASPSCR1 and FABP7). We then performed KEGG pathway and Gene Ontology enrichment analysis of the DEGs using tool . In summary, our proposed framework, which integrates linear regression, differential expression analysis, and deep learning, provides a robust approach to better interpret DNA methylation and gene expression data in disease studies.
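Two steps of the pipeline described above, predicting expression from methylation by linear regression and evaluating by 10-fold cross-validation, can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the regression model, so the single-predictor model here is an assumption. It also assumes that paired methylation and expression values are available for fitting. The function names (`fit_simple_ols`, `predict_expression`, `kfold_indices`, `summarize_accuracy`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one gene."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Constant methylation carries no information; fall back to the mean.
        return 0.0, mean_y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_expression(slope, intercept, methylation):
    """Predicted expression values for new methylation measurements."""
    return [intercept + slope * m for m in methylation]


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def summarize_accuracy(fold_accuracies):
    """Mean and sample standard deviation of per-fold accuracies."""
    return statistics.fmean(fold_accuracies), statistics.stdev(fold_accuracies)


# Toy data (made up): methylation beta values and paired expression for one gene.
beta = [0.10, 0.20, 0.30, 0.40, 0.50]
expr = [9.0, 8.0, 7.0, 6.0, 5.0]
slope, intercept = fit_simple_ols(beta, expr)
predicted = predict_expression(slope, intercept, [0.25, 0.60])

# 215 samples = 63 tumor + 152 normal, the sample counts reported for GSE30760.
folds = kfold_indices(215, k=10)
```

In a full run, each fold would serve once as the held-out test set while a classifier is trained on the other nine. Passing the ten resulting accuracies to `summarize_accuracy` gives a mean and spread of the kind quoted in the abstract, although the abstract does not say how its ± value was computed.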

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae65/7465138/e51a21f31793/genes-11-00931-g001.jpg
