Establishment of a SVM classifier to predict recurrence of ovarian cancer.

Affiliation

Department of Obstetrics and Gynecology, Xiangyang Central Hospital Affiliated to The Hubei University of Arts and Science, Xiangyang, Hubei 441021, P.R. China.

Publication information

Mol Med Rep. 2018 Oct;18(4):3589-3598. doi: 10.3892/mmr.2018.9362. Epub 2018 Aug 8.

Abstract

Gene expression data from ovarian cancer (OC) samples were used to identify genes of interest, and a support vector machine (SVM) classifier was subsequently established to predict the recurrence of OC. Three datasets (GSE17260, GSE44104 and GSE51088) investigating OC gene expression were downloaded from the Gene Expression Omnibus. Differentially expressed genes (DEGs) between samples from patients with non‑recurrent and recurrent OC were identified via a homogeneity test and quality control analysis. A protein‑protein interaction (PPI) network was subsequently established for the DEGs using data from the Biological General Repository for Interaction Datasets, the Human Protein Reference Database and the Database of Interacting Proteins. Degrees of interaction and betweenness centrality (BC) scores were calculated for each node in the PPI network. The top 100 genes ranked by BC score were used as candidates, from which feature genes were identified via recursive feature elimination using the GSE17260 dataset. Following this, an SVM classifier was constructed and further validated using the GSE44104 and GSE51088 datasets and independent gene expression data obtained from The Cancer Genome Atlas (TCGA). A total of 639 DEGs were identified from the three gene expression datasets, and a PPI network including 249 nodes and 354 edges was constructed. An SVM classifier consisting of 39 feature genes (including cullin 3, mouse double minute 2 homolog, aurora kinase A, WW domain containing oxidoreductase, large tumor suppressor kinase 2, sirtuin 6, staphylococcal nuclease and tudor domain containing 1, leucine rich repeats and immunoglobulin like domains 1 and aurora kinase 1 interacting protein 1) was subsequently constructed. The prediction accuracies of the SVM classifier for the GSE17260, GSE44104 and GSE51088 datasets and the data downloaded from TCGA were 92.7, 93.3, 96.6 and 90.4%, respectively. Furthermore, patients with predicted non‑recurrent OC survived significantly longer than patients with predicted recurrent OC (P=6.598×10⁻⁶). In summary, an SVM classifier consisting of 39 feature genes was established for predicting the recurrence and prognosis of OC. The results of the present study suggest that the 39 feature genes may serve important roles in the development of OC and may represent therapeutic biomarkers of OC.
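The network-ranking step described in the abstract (scoring each PPI node by betweenness centrality and keeping the top-ranked genes) can be illustrated with a minimal sketch. The abstract does not state which software or normalization the authors used, so everything below is an assumption: the code implements Brandes' algorithm for an unweighted, undirected graph in plain Python, returns unnormalized scores, and runs on a toy edge list whose node names (`A` to `E`) are placeholders rather than real genes or interactions. The function names are hypothetical.

```python
from collections import deque


def betweenness_centrality(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and
        # record each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]

    # Every unordered pair was counted from both endpoints, so halve.
    return {v: score / 2 for v, score in bc.items()}


def top_k_by_bc(edges, k):
    """Return the k nodes with the highest BC score (ties broken by name)."""
    bc = betweenness_centrality(edges)
    return sorted(bc, key=lambda node: (-bc[node], node))[:k]


# Toy network (placeholder nodes, not real interactions): A-B-C-D with E on C.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]
scores = betweenness_centrality(toy_edges)
ranked = top_k_by_bc(toy_edges, 2)
```

In the study's setting, `edges` would be the 354 PPI edges among the DEGs and `k` would be 100. The resulting gene list would then be passed to recursive feature elimination, which this sketch does not cover.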

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5708/6131358/1a61ba5ba4dc/MMR-18-04-3589-g01.jpg
