Predicting gene signature in breast cancer patients with multiple machine learning models.

Author information

Zhu Fangfang, Xu Dafang

Affiliation

First Affiliated Hospital of Huzhou University, No.158, Guangchang Hou Road, Huzhou, 313000, Zhejiang, People's Republic of China.

Publication information

Discov Oncol. 2024 Oct 1;15(1):516. doi: 10.1007/s12672-024-01386-2.

Abstract

AIMS

The aim of this study was to predict gene signatures in breast cancer patients using multiple machine learning models.

METHODS

We first collated and merged the datasets GSE54002 and GSE22820, obtaining a gene expression matrix of 16,820 genes across 593 breast cancer (BC) samples and 26 normal control (NC) samples. We then performed enrichment analyses using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Disease Ontology (DO).
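The abstract does not say which tool or statistic was used for the GO, KEGG, and DO enrichment. Analyses of this kind are commonly based on a hypergeometric over-representation test, so the following is only a minimal sketch under that assumption. The function name `hypergeom_enrichment_p`, the term size, and the overlap count are hypothetical. Only the 16,820-gene background and the 177 DEGs come from the abstract.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, deg_count, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    deg_count genes are drawn without replacement from a universe of
    universe_size genes, of which term_size are annotated to the term.
    """
    max_overlap = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# 16,820 background genes and 177 DEGs are taken from the abstract.
# The term size (300) and the overlap (15) are made-up illustration values.
p_value = hypergeom_enrichment_p(16820, 300, 177, 15)
print(f"over-representation p-value: {p_value:.3g}")
```

In practice the p-values for all tested terms would also be corrected for multiple testing. That step is omitted here.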

RESULTS

We identified 177 differentially expressed genes (DEGs), including 40 up-regulated and 137 down-regulated genes, through differential expression analysis. The GO enrichment results indicated that these genes are primarily involved in extracellular matrix organization, positive regulation of nervous system development, collagen-containing extracellular matrix, heparin binding, glycosaminoglycan binding, and Wnt protein binding, among others. KEGG enrichment analysis revealed that the DEGs were primarily associated with pathways such as focal adhesion, the PI3K-Akt signaling pathway, and human papillomavirus infection. DO enrichment analysis showed that the DEGs play a significant role in regulating diseases such as intestinal disorders, nephritis, and dermatitis. Further, through LASSO regression analysis and SVM-RFE algorithm analysis, we identified 9 key feature DEGs (CF-DEGs): ANGPTL7, TSHZ2, SDPR, CLCA4, PAMR1, MME, CXCL2, ADAMTS5, and KIT. Additionally, ROC curve analysis demonstrated that these CF-DEGs serve as a reliable diagnostic index. Finally, using the CIBERSORT algorithm, we analyzed the infiltration of immune cells and the associations between CF-DEGs and immune cell infiltration across all samples.
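As a rough illustration of the last two selection steps, the sketch below intersects the gene sets returned by two feature-selection methods and computes a ROC AUC for a single gene. The abstract does not state how the LASSO and SVM-RFE results were combined. Keeping the genes chosen by both is a common convention and is assumed here. The gene names, expression values, and the helper `roc_auc` are placeholders, not data or code from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outputs of the two selection methods (placeholder gene names).
lasso_selected = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
svm_rfe_selected = {"GENE_B", "GENE_C", "GENE_E"}

# Assumed combination rule: keep genes chosen by both methods.
shared = sorted(lasso_selected & svm_rfe_selected)

# Toy expression values for one down-regulated gene:
# label 1 = tumor sample, label 0 = normal control.
expression = [2.1, 2.4, 1.9, 3.0, 5.2, 4.8, 5.5]
labels = [1, 1, 1, 1, 0, 0, 0]

# Negate expression so that a higher score means "more tumor-like".
auc = roc_auc([-x for x in expression], labels)
print(shared, auc)
```

With only 26 normal controls against 593 tumor samples, an AUC computed on the same data used for selection can be optimistic, so a held-out or cross-validated estimate would be the safer choice.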

CONCLUSIONS

Our findings provide new insights into the molecular functions and metabolic pathways involved in breast cancer, potentially aiding in the discovery of new diagnostic and immunotherapeutic biomarkers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a69/11445210/6b0bfec8bd43/12672_2024_1386_Fig1_HTML.jpg
