Cao Wanchen, Gao Kai, Zhao Yi
School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China.
Front Genet. 2025 May 30;16:1536198. doi: 10.3389/fgene.2025.1536198. eCollection 2025.
Papillary thyroid carcinoma (PTC) has a high recurrence rate and lacks reliable diagnostic biomarkers. This study aims to identify robust transcriptomic biomarkers for PTC diagnosis through integrative bioinformatics approaches and elucidate the cellular mechanisms underlying PTC pathogenesis at single-cell resolution.
We downloaded five PTC-related RNA-seq datasets (GSE3467, GSE3678, GSE33630, GSE65144, and GSE82208) and one scRNA-seq dataset (GSE191288) from the Gene Expression Omnibus (GEO) database. GSE3467 served as the training dataset for differential gene expression analysis, GO and KEGG enrichment analyses, weighted gene co-expression network analysis (WGCNA), machine learning, ROC analysis, nomogram analysis, and GSEA to mine potential biomarkers. The remaining four RNA-seq datasets (GSE3678, GSE33630, GSE65144, and GSE82208) served as validation datasets for these candidates. Guided by the biomarker results, we then used the scRNA-seq dataset (GSE191288) to identify the key cell types involved in the occurrence and development of PTC and to explore their mechanisms.
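The ROC step listed above can be illustrated with a minimal sketch. It computes the area under the ROC curve (AUC) for a single candidate gene using the rank-based (Mann-Whitney) formulation. The function name `auc_from_scores` and the expression values are hypothetical and are not taken from the study, whose actual software and settings are not stated here.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a randomly chosen tumor sample (label 1)
    has a higher score than a randomly chosen normal sample (label 0).
    Ties count as 0.5. This equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one candidate gene (illustration only).
expression = [7.9, 8.4, 6.1, 9.0, 5.2, 5.8, 6.5, 4.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumor, 0 = normal

auc = auc_from_scores(expression, labels)
print(f"AUC = {auc:.4f}")
```

An AUC near 1 means the gene separates tumor from normal samples well in that dataset, and a value near 0.5 means it does no better than chance. Repeating the calculation on each validation dataset is one simple way to check whether a candidate generalizes.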
Bioinformatics analysis of the GEO datasets identified three biomarkers (ENTPD1, SERPINA1, and TACSTD2). GSEA suggested that these biomarkers may contribute to the occurrence and development of PTC by jointly regulating the cytokine-cytokine receptor interaction pathway. scRNA-seq analysis identified tissue stem cells, epithelial cells, and smooth muscle cells as key cell types in PTC. Cell-cell communication analysis showed that epithelial cells interact with tissue stem cells and smooth muscle cells primarily through two ligand-receptor pairs, COL4A1-CD4 and COL4A2-CD4. The collagen signaling pathway was the most dominant pathway, and violin plots showed that the ligands COL4A1 and COL4A2 were highly expressed in epithelial cells, whereas the receptor CD4 showed elevated expression in both tissue stem cells and smooth muscle cells. Pseudotime analysis showed that these three cell types passed through three distinct differentiation stages, during which the expression of ENTPD1, SERPINA1, and TACSTD2 followed stage-specific trends.
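The idea behind a ligand-receptor interaction score can be sketched as follows. This is a deliberately simplified illustration: it scores a sender-receiver pair as the product of the mean ligand expression in the sender cell type and the mean receptor expression in the receiver cell type. It is not the method used in the study, whose communication tool and scoring model are not described here. The function `lr_score`, the cell-type keys, and all expression values are hypothetical.

```python
from statistics import mean


def lr_score(expr, sender, receiver, ligand, receptor):
    """Simplified interaction score: mean ligand expression in the sender
    cell type multiplied by mean receptor expression in the receiver."""
    ligand_level = mean(expr[sender][ligand])
    receptor_level = mean(expr[receiver][receptor])
    return ligand_level * receptor_level


# Hypothetical per-cell expression values, grouped by cell type and gene.
expr = {
    "epithelial": {"COL4A1": [2.0, 3.0], "CD4": [0.0, 0.2]},
    "tissue_stem": {"COL4A1": [0.5, 0.5], "CD4": [1.0, 2.0]},
}

forward = lr_score(expr, "epithelial", "tissue_stem", "COL4A1", "CD4")
reverse = lr_score(expr, "tissue_stem", "epithelial", "COL4A1", "CD4")
print(f"epithelial -> tissue_stem: {forward:.2f}")
print(f"tissue_stem -> epithelial: {reverse:.2f}")
```

Under this toy scoring, a pair ranks highly only when the ligand is abundant in the sender and the receptor is abundant in the receiver, which matches the expression pattern the violin plots describe. Dedicated tools typically add steps such as permutation testing that this sketch omits.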
In summary, by combining RNA-seq and scRNA-seq analyses, this study identifies ENTPD1, SERPINA1, and TACSTD2 as potential biomarkers for PTC at the transcriptomic level, and tissue stem cells, epithelial cells, and smooth muscle cells as key cell types at the cellular level. These findings provide a basis for future experimental research and for the clinical diagnosis and treatment of PTC.